* tests/vm infrastructure fails to notice that QEMU dying is a failure
@ 2020-07-17 13:08 Peter Maydell
2020-07-17 13:22 ` Philippe Mathieu-Daudé
0 siblings, 1 reply; 7+ messages in thread
From: Peter Maydell @ 2020-07-17 13:08 UTC (permalink / raw)
To: QEMU Developers; +Cc: Alex Bennée
If you run vm-build-openbsd, our makefile/scripting infrastructure
seems to fail to notice that death of the QEMU process that's
running the VM should be a failure, and ends up allowing make
to return a success condition.
I have a script which runs a VM build which basically does:
#!/bin/sh -e
make -C "build" "vm-build-openbsd" J=8 V=1
echo "OK DONE openbsd"
It just gave me this output (tail end of logfile). We're executing tests,
and then the qemu-system-x86_64 that's running the OpenBSD VM gets
a signal 9 (sigkill), for unclear reasons (oom killer??). The python
scripting gets an exception, but doesn't exit with a failure status to
make, which then thinks all is fine, exits success itself and allows
the set -e script to proceed to print the "OK DONE" line...
PASS 30 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/pca9552/pca9552-tests/tx-rx
PASS 31 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/pca9552/pca9552-tests/rx-autoinc
PASS 32 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/ds1338/ds1338-tests/tx-rx
DEBUG:QMP.qemu-26462:>>> {'execute': 'quit'}
DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594984057,
'microseconds': 485197}, 'event': 'NIC_RX_FILTER_CHANGED', 'data':
{'path': '/machine/peripheral-anon/device[0]/virtio-backend'}}
DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594985855,
'microseconds': 169552}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}}
DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594987655,
'microseconds': 169187}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}}
DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594989456,
'microseconds': 88866}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}}
DEBUG:QMP.qemu-26462:<<< {'return': {}}
WARNING:qemu.machine:qemu received signal 9; command:
"qemu-system-x86_64 -display none -vga none -chardev
socket,id=mon,path=/var/tmp/qemu-26462-monitor.sock -mon
chardev=mon,mode=control -machine pc -chardev
socket,id=console,path=/var/tmp/qemu-26462-console.sock,server,nowait
-serial chardev:console -nodefaults -m 4G -cpu max -netdev
user,id=vnet,hostfwd=:127.0.0.1:0-:22 -device
virtio-net-pci,netdev=vnet -vnc 127.0.0.1:0,to=20 -smp 8 -enable-kvm
-drive file=/home/peter.maydell/.cache/qemu-vm/images/openbsd.img,snapshot=on,if=none,id=drive0,cache=writeback
-device virtio-blk,drive=drive0,bootindex=0 -drive
file=/home/peter.maydell/qemu-openbsd/build/vm-test-yzwn6xdc.tmp/data-993a1.tar,if=none,id=data-993a1,cache=writeback,format=raw
-device virtio-blk,drive=data-993a1,serial=data-993a1,bootindex=1"
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
line 436, in _do_shutdown
self._soft_shutdown(has_quit, timeout)
File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
line 419, in _soft_shutdown
self._popen.wait(timeout=timeout)
File "/usr/lib/python3.6/subprocess.py", line 1469, in wait
raise TimeoutExpired(self.args, timeout)
subprocess.TimeoutExpired: Command '['qemu-system-x86_64', '-display',
'none', '-vga', 'none', '-chardev',
'socket,id=mon,path=/var/tmp/qemu-26462-monitor.sock', '-mon',
'chardev=mon,mode=control', '-machine', 'pc', '-chardev',
'socket,id=console,path=/var/tmp/qemu-26462-console.sock,server,nowait',
'-serial', 'chardev:console', '-nodefaults', '-m', '4G', '-cpu',
'max', '-netdev', 'user,id=vnet,hostfwd=:127.0.0.1:0-:22', '-device',
'virtio-net-pci,netdev=vnet', '-vnc', '127.0.0.1:0,to=20', '-smp',
'8', '-enable-kvm', '-drive',
'file=/home/peter.maydell/.cache/qemu-vm/images/openbsd.img,snapshot=on,if=none,id=drive0,cache=writeback',
'-device', 'virtio-blk,drive=drive0,bootindex=0', '-drive',
'file=/home/peter.maydell/qemu-openbsd/build/vm-test-yzwn6xdc.tmp/data-993a1.tar,if=none,id=data-993a1,cache=writeback,format=raw',
'-device', 'virtio-blk,drive=data-993a1,serial=data-993a1,bootindex=1']'
timed out after 3 seconds
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
line 466, in shutdown
self._do_shutdown(has_quit, timeout=timeout)
File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
line 440, in _do_shutdown
from exc
qemu.machine.AbnormalShutdown: Could not perform graceful shutdown
make: Leaving directory '/home/peter.maydell/qemu-openbsd/build'
OK DONE openbsd
thanks
-- PMM
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: tests/vm infrastructure fails to notice that QEMU dying is a failure
2020-07-17 13:08 tests/vm infrastructure fails to notice that QEMU dying is a failure Peter Maydell
@ 2020-07-17 13:22 ` Philippe Mathieu-Daudé
2020-07-17 13:25 ` Philippe Mathieu-Daudé
0 siblings, 1 reply; 7+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-07-17 13:22 UTC (permalink / raw)
To: Peter Maydell, QEMU Developers; +Cc: John Snow, Alex Bennée, Cleber Rosa
Cc'ing John & Cleber.
On 7/17/20 3:08 PM, Peter Maydell wrote:
> If you run vm-build-openbsd, our makefile/scripting infrastructure
> seems to fail to notice that death of the QEMU process that's
> running the VM should be a failure, and ends up allowing make
> to return a success condition.
>
> I have a script which runs a VM build which basically does:
>
> #!/bin/sh -e
> make -C "build" "vm-build-openbsd" J=8 V=1
> echo "OK DONE openbsd"
>
> It just gave me this output (tail end of logfile). We're executing tests,
> and then the qemu-system-x86_64 that's running the OpenBSD VM gets
> a signal 9 (sigkill), for unclear reasons (oom killer??). The python
> scripting gets an exception, but doesn't exit with a failure status to
> make, which then thinks all is fine, exits success itself and allows
> the set -e script to proceed to print the "OK DONE" line...
>
>
> PASS 30 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/pca9552/pca9552-tests/tx-rx
> PASS 31 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/pca9552/pca9552-tests/rx-autoinc
> PASS 32 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/ds1338/ds1338-tests/tx-rx
> DEBUG:QMP.qemu-26462:>>> {'execute': 'quit'}
> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594984057,
> 'microseconds': 485197}, 'event': 'NIC_RX_FILTER_CHANGED', 'data':
> {'path': '/machine/peripheral-anon/device[0]/virtio-backend'}}
> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594985855,
> 'microseconds': 169552}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}}
> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594987655,
> 'microseconds': 169187}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}}
> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594989456,
> 'microseconds': 88866}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}}
> DEBUG:QMP.qemu-26462:<<< {'return': {}}
> WARNING:qemu.machine:qemu received signal 9; command:
> "qemu-system-x86_64 -display none -vga none -chardev
> socket,id=mon,path=/var/tmp/qemu-26462-monitor.sock -mon
> chardev=mon,mode=control -machine pc -chardev
> socket,id=console,path=/var/tmp/qemu-26462-console.sock,server,nowait
> -serial chardev:console -nodefaults -m 4G -cpu max -netdev
> user,id=vnet,hostfwd=:127.0.0.1:0-:22 -device
> virtio-net-pci,netdev=vnet -vnc 127.0.0.1:0,to=20 -smp 8 -enable-kvm
> -drive file=/home/peter.maydell/.cache/qemu-vm/images/openbsd.img,snapshot=on,if=none,id=drive0,cache=writeback
> -device virtio-blk,drive=drive0,bootindex=0 -drive
> file=/home/peter.maydell/qemu-openbsd/build/vm-test-yzwn6xdc.tmp/data-993a1.tar,if=none,id=data-993a1,cache=writeback,format=raw
> -device virtio-blk,drive=data-993a1,serial=data-993a1,bootindex=1"
> Error in atexit._run_exitfuncs:
> Traceback (most recent call last):
> File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
> line 436, in _do_shutdown
> self._soft_shutdown(has_quit, timeout)
> File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
> line 419, in _soft_shutdown
> self._popen.wait(timeout=timeout)
> File "/usr/lib/python3.6/subprocess.py", line 1469, in wait
> raise TimeoutExpired(self.args, timeout)
> subprocess.TimeoutExpired: Command '['qemu-system-x86_64', '-display',
> 'none', '-vga', 'none', '-chardev',
> 'socket,id=mon,path=/var/tmp/qemu-26462-monitor.sock', '-mon',
> 'chardev=mon,mode=control', '-machine', 'pc', '-chardev',
> 'socket,id=console,path=/var/tmp/qemu-26462-console.sock,server,nowait',
> '-serial', 'chardev:console', '-nodefaults', '-m', '4G', '-cpu',
> 'max', '-netdev', 'user,id=vnet,hostfwd=:127.0.0.1:0-:22', '-device',
> 'virtio-net-pci,netdev=vnet', '-vnc', '127.0.0.1:0,to=20', '-smp',
> '8', '-enable-kvm', '-drive',
> 'file=/home/peter.maydell/.cache/qemu-vm/images/openbsd.img,snapshot=on,if=none,id=drive0,cache=writeback',
> '-device', 'virtio-blk,drive=drive0,bootindex=0', '-drive',
> 'file=/home/peter.maydell/qemu-openbsd/build/vm-test-yzwn6xdc.tmp/data-993a1.tar,if=none,id=data-993a1,cache=writeback,format=raw',
> '-device', 'virtio-blk,drive=data-993a1,serial=data-993a1,bootindex=1']'
> timed out after 3 seconds
>
> The above exception was the direct cause of the following exception:
>
> Traceback (most recent call last):
> File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
> line 466, in shutdown
> self._do_shutdown(has_quit, timeout=timeout)
> File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
> line 440, in _do_shutdown
> from exc
> qemu.machine.AbnormalShutdown: Could not perform graceful shutdown
> make: Leaving directory '/home/peter.maydell/qemu-openbsd/build'
> OK DONE openbsd
>
> thanks
> -- PMM
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: tests/vm infrastructure fails to notice that QEMU dying is a failure
2020-07-17 13:22 ` Philippe Mathieu-Daudé
@ 2020-07-17 13:25 ` Philippe Mathieu-Daudé
2020-07-17 22:24 ` John Snow
0 siblings, 1 reply; 7+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-07-17 13:25 UTC (permalink / raw)
To: Peter Maydell, QEMU Developers; +Cc: John Snow, Alex Bennée, Cleber Rosa
On 7/17/20 3:22 PM, Philippe Mathieu-Daudé wrote:
> Cc'ing John & Cleber.
>
> On 7/17/20 3:08 PM, Peter Maydell wrote:
>> If you run vm-build-openbsd, our makefile/scripting infrastructure
>> seems to fail to notice that death of the QEMU process that's
>> running the VM should be a failure, and ends up allowing make
>> to return a success condition.
>>
>> I have a script which runs a VM build which basically does:
>>
>> #!/bin/sh -e
>> make -C "build" "vm-build-openbsd" J=8 V=1
>> echo "OK DONE openbsd"
>>
>> It just gave me this output (tail end of logfile). We're executing tests,
>> and then the qemu-system-x86_64 that's running the OpenBSD VM gets
>> a signal 9 (sigkill), for unclear reasons (oom killer??). The python
>> scripting gets an exception, but doesn't exit with a failure status to
>> make, which then thinks all is fine, exits success itself and allows
>> the set -e script to proceed to print the "OK DONE" line...
>>
>>
>> PASS 30 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/pca9552/pca9552-tests/tx-rx
>> PASS 31 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/pca9552/pca9552-tests/rx-autoinc
>> PASS 32 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/ds1338/ds1338-tests/tx-rx
>> DEBUG:QMP.qemu-26462:>>> {'execute': 'quit'}
>> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594984057,
>> 'microseconds': 485197}, 'event': 'NIC_RX_FILTER_CHANGED', 'data':
>> {'path': '/machine/peripheral-anon/device[0]/virtio-backend'}}
>> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594985855,
>> 'microseconds': 169552}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}}
>> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594987655,
>> 'microseconds': 169187}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}}
>> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594989456,
>> 'microseconds': 88866}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}}
>> DEBUG:QMP.qemu-26462:<<< {'return': {}}
>> WARNING:qemu.machine:qemu received signal 9; command:
>> "qemu-system-x86_64 -display none -vga none -chardev
>> socket,id=mon,path=/var/tmp/qemu-26462-monitor.sock -mon
>> chardev=mon,mode=control -machine pc -chardev
>> socket,id=console,path=/var/tmp/qemu-26462-console.sock,server,nowait
>> -serial chardev:console -nodefaults -m 4G -cpu max -netdev
>> user,id=vnet,hostfwd=:127.0.0.1:0-:22 -device
>> virtio-net-pci,netdev=vnet -vnc 127.0.0.1:0,to=20 -smp 8 -enable-kvm
>> -drive file=/home/peter.maydell/.cache/qemu-vm/images/openbsd.img,snapshot=on,if=none,id=drive0,cache=writeback
>> -device virtio-blk,drive=drive0,bootindex=0 -drive
>> file=/home/peter.maydell/qemu-openbsd/build/vm-test-yzwn6xdc.tmp/data-993a1.tar,if=none,id=data-993a1,cache=writeback,format=raw
>> -device virtio-blk,drive=data-993a1,serial=data-993a1,bootindex=1"
>> Error in atexit._run_exitfuncs:
>> Traceback (most recent call last):
>> File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
>> line 436, in _do_shutdown
>> self._soft_shutdown(has_quit, timeout)
>> File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
>> line 419, in _soft_shutdown
>> self._popen.wait(timeout=timeout)
>> File "/usr/lib/python3.6/subprocess.py", line 1469, in wait
>> raise TimeoutExpired(self.args, timeout)
>> subprocess.TimeoutExpired: Command '['qemu-system-x86_64', '-display',
>> 'none', '-vga', 'none', '-chardev',
>> 'socket,id=mon,path=/var/tmp/qemu-26462-monitor.sock', '-mon',
>> 'chardev=mon,mode=control', '-machine', 'pc', '-chardev',
>> 'socket,id=console,path=/var/tmp/qemu-26462-console.sock,server,nowait',
>> '-serial', 'chardev:console', '-nodefaults', '-m', '4G', '-cpu',
>> 'max', '-netdev', 'user,id=vnet,hostfwd=:127.0.0.1:0-:22', '-device',
>> 'virtio-net-pci,netdev=vnet', '-vnc', '127.0.0.1:0,to=20', '-smp',
>> '8', '-enable-kvm', '-drive',
>> 'file=/home/peter.maydell/.cache/qemu-vm/images/openbsd.img,snapshot=on,if=none,id=drive0,cache=writeback',
>> '-device', 'virtio-blk,drive=drive0,bootindex=0', '-drive',
>> 'file=/home/peter.maydell/qemu-openbsd/build/vm-test-yzwn6xdc.tmp/data-993a1.tar,if=none,id=data-993a1,cache=writeback,format=raw',
>> '-device', 'virtio-blk,drive=data-993a1,serial=data-993a1,bootindex=1']'
>> timed out after 3 seconds
>>
>> The above exception was the direct cause of the following exception:
>>
>> Traceback (most recent call last):
>> File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
>> line 466, in shutdown
>> self._do_shutdown(has_quit, timeout=timeout)
>> File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
>> line 440, in _do_shutdown
>> from exc
>> qemu.machine.AbnormalShutdown: Could not perform graceful shutdown
>> make: Leaving directory '/home/peter.maydell/qemu-openbsd/build'
>> OK DONE openbsd
John it seems due to the addition of the 'finally' clause in
193bf1c061 ("python/machine.py: split shutdown into hard and
soft flavors").
>>
>> thanks
>> -- PMM
>>
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: tests/vm infrastructure fails to notice that QEMU dying is a failure
2020-07-17 13:25 ` Philippe Mathieu-Daudé
@ 2020-07-17 22:24 ` John Snow
2020-07-18 9:30 ` Peter Maydell
2020-07-18 13:34 ` Robert Foley
0 siblings, 2 replies; 7+ messages in thread
From: John Snow @ 2020-07-17 22:24 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, Peter Maydell, QEMU Developers
Cc: Alex Bennée, robert.foley, Cleber Rosa
On 7/17/20 9:25 AM, Philippe Mathieu-Daudé wrote:
> On 7/17/20 3:22 PM, Philippe Mathieu-Daudé wrote:
>> Cc'ing John & Cleber.
>>
>> On 7/17/20 3:08 PM, Peter Maydell wrote:
>>> If you run vm-build-openbsd, our makefile/scripting infrastructure
>>> seems to fail to notice that death of the QEMU process that's
>>> running the VM should be a failure, and ends up allowing make
>>> to return a success condition.
>>>
>>> I have a script which runs a VM build which basically does:
>>>
>>> #!/bin/sh -e
>>> make -C "build" "vm-build-openbsd" J=8 V=1
>>> echo "OK DONE openbsd"
>>>
>>> It just gave me this output (tail end of logfile). We're executing tests,
>>> and then the qemu-system-x86_64 that's running the OpenBSD VM gets
>>> a signal 9 (sigkill), for unclear reasons (oom killer??). The python
>>> scripting gets an exception, but doesn't exit with a failure status to
>>> make, which then thinks all is fine, exits success itself and allows
>>> the set -e script to proceed to print the "OK DONE" line...
The signal 9 is likely due to QEMU freezing on exit. The machine.py code
waits about 3 seconds after trying to shut QEMU down gracefully before
sending SIGKILL and cleaning up forcefully.
>>>
>>>
>>> PASS 30 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/pca9552/pca9552-tests/tx-rx
>>> PASS 31 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/pca9552/pca9552-tests/rx-autoinc
>>> PASS 32 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/ds1338/ds1338-tests/tx-rx
>>> DEBUG:QMP.qemu-26462:>>> {'execute': 'quit'}
>>> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594984057,
>>> 'microseconds': 485197}, 'event': 'NIC_RX_FILTER_CHANGED', 'data':
>>> {'path': '/machine/peripheral-anon/device[0]/virtio-backend'}}
>>> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594985855,
>>> 'microseconds': 169552}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}}
>>> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594987655,
>>> 'microseconds': 169187}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}}
>>> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594989456,
>>> 'microseconds': 88866}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}}
>>> DEBUG:QMP.qemu-26462:<<< {'return': {}}
>>> WARNING:qemu.machine:qemu received signal 9; command:
>>> "qemu-system-x86_64 -display none -vga none -chardev
>>> socket,id=mon,path=/var/tmp/qemu-26462-monitor.sock -mon
>>> chardev=mon,mode=control -machine pc -chardev
>>> socket,id=console,path=/var/tmp/qemu-26462-console.sock,server,nowait
>>> -serial chardev:console -nodefaults -m 4G -cpu max -netdev
>>> user,id=vnet,hostfwd=:127.0.0.1:0-:22 -device
>>> virtio-net-pci,netdev=vnet -vnc 127.0.0.1:0,to=20 -smp 8 -enable-kvm
>>> -drive file=/home/peter.maydell/.cache/qemu-vm/images/openbsd.img,snapshot=on,if=none,id=drive0,cache=writeback
>>> -device virtio-blk,drive=drive0,bootindex=0 -drive
>>> file=/home/peter.maydell/qemu-openbsd/build/vm-test-yzwn6xdc.tmp/data-993a1.tar,if=none,id=data-993a1,cache=writeback,format=raw
>>> -device virtio-blk,drive=data-993a1,serial=data-993a1,bootindex=1"
>>> Error in atexit._run_exitfuncs:
>>> Traceback (most recent call last):
>>> File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
>>> line 436, in _do_shutdown
>>> self._soft_shutdown(has_quit, timeout)
>>> File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
>>> line 419, in _soft_shutdown
>>> self._popen.wait(timeout=timeout)
>>> File "/usr/lib/python3.6/subprocess.py", line 1469, in wait
>>> raise TimeoutExpired(self.args, timeout)
This is where it times out waiting for the process to exit of its own
volition. This is the "Inner" exception ...
>>> subprocess.TimeoutExpired: Command '['qemu-system-x86_64', '-display',
>>> 'none', '-vga', 'none', '-chardev',
>>> 'socket,id=mon,path=/var/tmp/qemu-26462-monitor.sock', '-mon',
>>> 'chardev=mon,mode=control', '-machine', 'pc', '-chardev',
>>> 'socket,id=console,path=/var/tmp/qemu-26462-console.sock,server,nowait',
>>> '-serial', 'chardev:console', '-nodefaults', '-m', '4G', '-cpu',
>>> 'max', '-netdev', 'user,id=vnet,hostfwd=:127.0.0.1:0-:22', '-device',
>>> 'virtio-net-pci,netdev=vnet', '-vnc', '127.0.0.1:0,to=20', '-smp',
>>> '8', '-enable-kvm', '-drive',
>>> 'file=/home/peter.maydell/.cache/qemu-vm/images/openbsd.img,snapshot=on,if=none,id=drive0,cache=writeback',
>>> '-device', 'virtio-blk,drive=drive0,bootindex=0', '-drive',
>>> 'file=/home/peter.maydell/qemu-openbsd/build/vm-test-yzwn6xdc.tmp/data-993a1.tar,if=none,id=data-993a1,cache=writeback,format=raw',
>>> '-device', 'virtio-blk,drive=data-993a1,serial=data-993a1,bootindex=1']'
>>> timed out after 3 seconds
>>>
>>> The above exception was the direct cause of the following exception:
>>>
>>> Traceback (most recent call last):
>>> File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
>>> line 466, in shutdown
>>> self._do_shutdown(has_quit, timeout=timeout)
>>> File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
>>> line 440, in _do_shutdown
>>> from exc
>>> qemu.machine.AbnormalShutdown: Could not perform graceful shutdown
And this is the "Outer" exception, the machine.py cleanup handler
letting you know that while it did perform cleanup, QEMU was not shut
down properly.
>>> make: Leaving directory '/home/peter.maydell/qemu-openbsd/build'
>>> OK DONE openbsd
>
> John it seems due to the addition of the 'finally' clause in
> 193bf1c061 ("python/machine.py: split shutdown into hard and
> soft flavors").
>
That's surprising. I'll look into this Monday, but the "Finally: "
should only perform some actions without suppressing the exception. You
can see here it DIDN'T suppress it.
...but maybe we don't return non-zero if an exception gets raised during
atexit()? Maybe that's what's happening here.
... ... Ah, guess what, that's exactly it. atexit suppresses exceptions.
You just never noticed this problem before because machine.py used to
suppress problems during shutdown instead of reporting them.
```
#!/usr/bin/env python3
import atexit
from qemu import machine
def handler():
raise machine.AbnormalShutdown("oh no")
atexit.register(handler)
```
Oops, this returns a zero code.
So:
- basevm.py should likely be converted to a context manager object just
like machine.py's QEMUMachine class was so cleanup can happen explicitly
when the VM object goes out of scope.
(Callers would use: "with baseVM() as foo:" pattern, or, where that's
not possible, call an explicit .shutdown().)
- The real problem, though: Why is QEMU hanging? It might need a longer
timeout, or it might be having problems with the console socket again.
(CC Robert Foley who has been working on the Console socket draining
problems. Maybe he has some insight here?)
--js
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: tests/vm infrastructure fails to notice that QEMU dying is a failure
2020-07-17 22:24 ` John Snow
@ 2020-07-18 9:30 ` Peter Maydell
2020-07-20 14:16 ` John Snow
2020-07-18 13:34 ` Robert Foley
1 sibling, 1 reply; 7+ messages in thread
From: Peter Maydell @ 2020-07-18 9:30 UTC (permalink / raw)
To: John Snow
Cc: Alex Bennée, Robert Foley, Philippe Mathieu-Daudé,
QEMU Developers, Cleber Rosa
On Fri, 17 Jul 2020 at 23:24, John Snow <jsnow@redhat.com> wrote:
> - The real problem, though: Why is QEMU hanging? It might need a longer
> timeout, or it might be having problems with the console socket again.
The host machine seemed to be under really heavy I/O load at the time.
thanks
-- PMM
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: tests/vm infrastructure fails to notice that QEMU dying is a failure
2020-07-17 22:24 ` John Snow
2020-07-18 9:30 ` Peter Maydell
@ 2020-07-18 13:34 ` Robert Foley
1 sibling, 0 replies; 7+ messages in thread
From: Robert Foley @ 2020-07-18 13:34 UTC (permalink / raw)
To: John Snow
Cc: Alex Bennée, Peter Maydell, Philippe Mathieu-Daudé,
QEMU Developers, Cleber Rosa
On Fri, 17 Jul 2020 at 18:24, John Snow <jsnow@redhat.com> wrote:
>
> - The real problem, though: Why is QEMU hanging? It might need a longer
> timeout, or it might be having problems with the console socket again.
>
> (CC Robert Foley who has been working on the Console socket draining
> problems. Maybe he has some insight here?)
When we did see the console issues we would see a hung stack like this:
#0 0x0000aaaad43d141c in qemu_chr_write_buffer
#1 0x0000aaaad43d194c in qemu_chr_write
#2 0x0000aaaad43d3968 in qemu_chr_fe_write_all
#3 0x0000aaaad417cf80 in pl011_write
#4 0x0000aaaad3f3c7b0 in memory_region_write_accessor
#5 0x0000aaaad3f3a1fc in access_with_adjusted_size
#6 0x0000aaaad3f3e828 in memory_region_dispatch_write
#7 0x0000aaaad3f517b0 in io_writex
#8 0x0000ffff574a1d34 in code_gen_buffer ()
#9 0x0000aaaad3f67228 in cpu_tb_exec
#10 0x0000aaaad3f67228 in cpu_loop_exec_tb
#11 0x0000aaaad3f67228 in cpu_exec
#12 0x0000aaaad3f2dbe4 in tcg_cpu_exec
#13 0x0000aaaad3f305e8 in qemu_tcg_cpu_thread_fn
#14 0x0000aaaad4441d88 in qemu_thread_start
#15 0x0000ffff85bec088 in start_thread
#16 0x0000ffff85b5c4ec in thread_start
However, since we added console socket draining thread, it seems to have
fixed this and presently basevm.py should be using this console draining
for the vm-build-openbsd.
When QEMU is hanging and exceeding our shutdown timeout, could we
(optionally) send something like a SIGABRT to QEMU to force a core dump
so we can get the stack and see where QEMU is hung?
I suppose that presumes it is reproducible, but it might help to remove doubt
in cases where QEMU hangs.
-Rob
>
> --js
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: tests/vm infrastructure fails to notice that QEMU dying is a failure
2020-07-18 9:30 ` Peter Maydell
@ 2020-07-20 14:16 ` John Snow
0 siblings, 0 replies; 7+ messages in thread
From: John Snow @ 2020-07-20 14:16 UTC (permalink / raw)
To: Peter Maydell
Cc: Alex Bennée, Robert Foley, Philippe Mathieu-Daudé,
QEMU Developers, Cleber Rosa
On 7/18/20 5:30 AM, Peter Maydell wrote:
> On Fri, 17 Jul 2020 at 23:24, John Snow <jsnow@redhat.com> wrote:
>> - The real problem, though: Why is QEMU hanging? It might need a longer
>> timeout, or it might be having problems with the console socket again.
>
> The host machine seemed to be under really heavy I/O load at the time.
>
> thanks
> -- PMM
>
Kevin's reporting problems too, with a 3 second wait being too
optimistic. Let's extend the timeout a bit, which should help in that case.
(Avoiding exceptions in the atexit handler will take a little longer
perhaps, but I will start working on that too.)
--js
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2020-07-20 14:21 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-07-17 13:08 tests/vm infrastructure fails to notice that QEMU dying is a failure Peter Maydell
2020-07-17 13:22 ` Philippe Mathieu-Daudé
2020-07-17 13:25 ` Philippe Mathieu-Daudé
2020-07-17 22:24 ` John Snow
2020-07-18 9:30 ` Peter Maydell
2020-07-20 14:16 ` John Snow
2020-07-18 13:34 ` Robert Foley
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).