qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* tests/vm infrastructure fails to notice that QEMU dying is a failure
@ 2020-07-17 13:08 Peter Maydell
  2020-07-17 13:22 ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Maydell @ 2020-07-17 13:08 UTC (permalink / raw)
  To: QEMU Developers; +Cc: Alex Bennée

If you run vm-build-openbsd, our makefile/scripting infrastructure
seems to fail to notice that death of the QEMU process that's
running the VM should be a failure, and ends up allowing make
to return a success condition.

I have a script which runs a VM build which basically does:

  #!/bin/sh -e
  make -C "build" "vm-build-openbsd" J=8 V=1
  echo "OK DONE openbsd"

It just gave me this output (tail end of logfile). We're executing tests,
and then the qemu-system-x86_64 that's running the OpenBSD VM gets
a signal 9 (sigkill), for unclear reasons (oom killer??). The python
scripting gets an exception, but doesn't exit with a failure status to
make, which then thinks all is fine, exits success itself and allows
the set -e script to proceed to print the "OK DONE" line...


PASS 30 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/pca9552/pca9552-tests/tx-rx
PASS 31 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/pca9552/pca9552-tests/rx-autoinc
PASS 32 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/ds1338/ds1338-tests/tx-rx
DEBUG:QMP.qemu-26462:>>> {'execute': 'quit'}
DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594984057,
'microseconds': 485197}, 'event': 'NIC_RX_FILTER_CHANGED', 'data':
{'path': '/machine/peripheral-anon/device[0]/virtio-backend'}}
DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594985855,
'microseconds': 169552}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}}
DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594987655,
'microseconds': 169187}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}}
DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594989456,
'microseconds': 88866}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}}
DEBUG:QMP.qemu-26462:<<< {'return': {}}
WARNING:qemu.machine:qemu received signal 9; command:
"qemu-system-x86_64 -display none -vga none -chardev
socket,id=mon,path=/var/tmp/qemu-26462-monitor.sock -mon
chardev=mon,mode=control -machine pc -chardev
socket,id=console,path=/var/tmp/qemu-26462-console.sock,server,nowait
-serial chardev:console -nodefaults -m 4G -cpu max -netdev
user,id=vnet,hostfwd=:127.0.0.1:0-:22 -device
virtio-net-pci,netdev=vnet -vnc 127.0.0.1:0,to=20 -smp 8 -enable-kvm
-drive file=/home/peter.maydell/.cache/qemu-vm/images/openbsd.img,snapshot=on,if=none,id=drive0,cache=writeback
-device virtio-blk,drive=drive0,bootindex=0 -drive
file=/home/peter.maydell/qemu-openbsd/build/vm-test-yzwn6xdc.tmp/data-993a1.tar,if=none,id=data-993a1,cache=writeback,format=raw
-device virtio-blk,drive=data-993a1,serial=data-993a1,bootindex=1"
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
line 436, in _do_shutdown
    self._soft_shutdown(has_quit, timeout)
  File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
line 419, in _soft_shutdown
    self._popen.wait(timeout=timeout)
  File "/usr/lib/python3.6/subprocess.py", line 1469, in wait
    raise TimeoutExpired(self.args, timeout)
subprocess.TimeoutExpired: Command '['qemu-system-x86_64', '-display',
'none', '-vga', 'none', '-chardev',
'socket,id=mon,path=/var/tmp/qemu-26462-monitor.sock', '-mon',
'chardev=mon,mode=control', '-machine', 'pc', '-chardev',
'socket,id=console,path=/var/tmp/qemu-26462-console.sock,server,nowait',
'-serial', 'chardev:console', '-nodefaults', '-m', '4G', '-cpu',
'max', '-netdev', 'user,id=vnet,hostfwd=:127.0.0.1:0-:22', '-device',
'virtio-net-pci,netdev=vnet', '-vnc', '127.0.0.1:0,to=20', '-smp',
'8', '-enable-kvm', '-drive',
'file=/home/peter.maydell/.cache/qemu-vm/images/openbsd.img,snapshot=on,if=none,id=drive0,cache=writeback',
'-device', 'virtio-blk,drive=drive0,bootindex=0', '-drive',
'file=/home/peter.maydell/qemu-openbsd/build/vm-test-yzwn6xdc.tmp/data-993a1.tar,if=none,id=data-993a1,cache=writeback,format=raw',
'-device', 'virtio-blk,drive=data-993a1,serial=data-993a1,bootindex=1']'
timed out after 3 seconds

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
line 466, in shutdown
    self._do_shutdown(has_quit, timeout=timeout)
  File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
line 440, in _do_shutdown
    from exc
qemu.machine.AbnormalShutdown: Could not perform graceful shutdown
make: Leaving directory '/home/peter.maydell/qemu-openbsd/build'
OK DONE openbsd

thanks
-- PMM


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: tests/vm infrastructure fails to notice that QEMU dying is a failure
  2020-07-17 13:08 tests/vm infrastructure fails to notice that QEMU dying is a failure Peter Maydell
@ 2020-07-17 13:22 ` Philippe Mathieu-Daudé
  2020-07-17 13:25   ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 7+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-07-17 13:22 UTC (permalink / raw)
  To: Peter Maydell, QEMU Developers; +Cc: John Snow, Alex Bennée, Cleber Rosa

Cc'ing John & Cleber.

On 7/17/20 3:08 PM, Peter Maydell wrote:
> If you run vm-build-openbsd, our makefile/scripting infrastructure
> seems to fail to notice that death of the QEMU process that's
> running the VM should be a failure, and ends up allowing make
> to return a success condition.
> 
> I have a script which runs a VM build which basically does:
> 
>   #!/bin/sh -e
>   make -C "build" "vm-build-openbsd" J=8 V=1
>   echo "OK DONE openbsd"
> 
> It just gave me this output (tail end of logfile). We're executing tests,
> and then the qemu-system-x86_64 that's running the OpenBSD VM gets
> a signal 9 (sigkill), for unclear reasons (oom killer??). The python
> scripting gets an exception, but doesn't exit with a failure status to
> make, which then thinks all is fine, exits success itself and allows
> the set -e script to proceed to print the "OK DONE" line...
> 
> 
> PASS 30 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/pca9552/pca9552-tests/tx-rx
> PASS 31 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/pca9552/pca9552-tests/rx-autoinc
> PASS 32 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/ds1338/ds1338-tests/tx-rx
> DEBUG:QMP.qemu-26462:>>> {'execute': 'quit'}
> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594984057,
> 'microseconds': 485197}, 'event': 'NIC_RX_FILTER_CHANGED', 'data':
> {'path': '/machine/peripheral-anon/device[0]/virtio-backend'}}
> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594985855,
> 'microseconds': 169552}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}}
> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594987655,
> 'microseconds': 169187}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}}
> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594989456,
> 'microseconds': 88866}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}}
> DEBUG:QMP.qemu-26462:<<< {'return': {}}
> WARNING:qemu.machine:qemu received signal 9; command:
> "qemu-system-x86_64 -display none -vga none -chardev
> socket,id=mon,path=/var/tmp/qemu-26462-monitor.sock -mon
> chardev=mon,mode=control -machine pc -chardev
> socket,id=console,path=/var/tmp/qemu-26462-console.sock,server,nowait
> -serial chardev:console -nodefaults -m 4G -cpu max -netdev
> user,id=vnet,hostfwd=:127.0.0.1:0-:22 -device
> virtio-net-pci,netdev=vnet -vnc 127.0.0.1:0,to=20 -smp 8 -enable-kvm
> -drive file=/home/peter.maydell/.cache/qemu-vm/images/openbsd.img,snapshot=on,if=none,id=drive0,cache=writeback
> -device virtio-blk,drive=drive0,bootindex=0 -drive
> file=/home/peter.maydell/qemu-openbsd/build/vm-test-yzwn6xdc.tmp/data-993a1.tar,if=none,id=data-993a1,cache=writeback,format=raw
> -device virtio-blk,drive=data-993a1,serial=data-993a1,bootindex=1"
> Error in atexit._run_exitfuncs:
> Traceback (most recent call last):
>   File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
> line 436, in _do_shutdown
>     self._soft_shutdown(has_quit, timeout)
>   File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
> line 419, in _soft_shutdown
>     self._popen.wait(timeout=timeout)
>   File "/usr/lib/python3.6/subprocess.py", line 1469, in wait
>     raise TimeoutExpired(self.args, timeout)
> subprocess.TimeoutExpired: Command '['qemu-system-x86_64', '-display',
> 'none', '-vga', 'none', '-chardev',
> 'socket,id=mon,path=/var/tmp/qemu-26462-monitor.sock', '-mon',
> 'chardev=mon,mode=control', '-machine', 'pc', '-chardev',
> 'socket,id=console,path=/var/tmp/qemu-26462-console.sock,server,nowait',
> '-serial', 'chardev:console', '-nodefaults', '-m', '4G', '-cpu',
> 'max', '-netdev', 'user,id=vnet,hostfwd=:127.0.0.1:0-:22', '-device',
> 'virtio-net-pci,netdev=vnet', '-vnc', '127.0.0.1:0,to=20', '-smp',
> '8', '-enable-kvm', '-drive',
> 'file=/home/peter.maydell/.cache/qemu-vm/images/openbsd.img,snapshot=on,if=none,id=drive0,cache=writeback',
> '-device', 'virtio-blk,drive=drive0,bootindex=0', '-drive',
> 'file=/home/peter.maydell/qemu-openbsd/build/vm-test-yzwn6xdc.tmp/data-993a1.tar,if=none,id=data-993a1,cache=writeback,format=raw',
> '-device', 'virtio-blk,drive=data-993a1,serial=data-993a1,bootindex=1']'
> timed out after 3 seconds
> 
> The above exception was the direct cause of the following exception:
> 
> Traceback (most recent call last):
>   File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
> line 466, in shutdown
>     self._do_shutdown(has_quit, timeout=timeout)
>   File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
> line 440, in _do_shutdown
>     from exc
> qemu.machine.AbnormalShutdown: Could not perform graceful shutdown
> make: Leaving directory '/home/peter.maydell/qemu-openbsd/build'
> OK DONE openbsd
> 
> thanks
> -- PMM
> 



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: tests/vm infrastructure fails to notice that QEMU dying is a failure
  2020-07-17 13:22 ` Philippe Mathieu-Daudé
@ 2020-07-17 13:25   ` Philippe Mathieu-Daudé
  2020-07-17 22:24     ` John Snow
  0 siblings, 1 reply; 7+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-07-17 13:25 UTC (permalink / raw)
  To: Peter Maydell, QEMU Developers; +Cc: John Snow, Alex Bennée, Cleber Rosa

On 7/17/20 3:22 PM, Philippe Mathieu-Daudé wrote:
> Cc'ing John & Cleber.
> 
> On 7/17/20 3:08 PM, Peter Maydell wrote:
>> If you run vm-build-openbsd, our makefile/scripting infrastructure
>> seems to fail to notice that death of the QEMU process that's
>> running the VM should be a failure, and ends up allowing make
>> to return a success condition.
>>
>> I have a script which runs a VM build which basically does:
>>
>>   #!/bin/sh -e
>>   make -C "build" "vm-build-openbsd" J=8 V=1
>>   echo "OK DONE openbsd"
>>
>> It just gave me this output (tail end of logfile). We're executing tests,
>> and then the qemu-system-x86_64 that's running the OpenBSD VM gets
>> a signal 9 (sigkill), for unclear reasons (oom killer??). The python
>> scripting gets an exception, but doesn't exit with a failure status to
>> make, which then thinks all is fine, exits success itself and allows
>> the set -e script to proceed to print the "OK DONE" line...
>>
>>
>> PASS 30 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/pca9552/pca9552-tests/tx-rx
>> PASS 31 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/pca9552/pca9552-tests/rx-autoinc
>> PASS 32 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/ds1338/ds1338-tests/tx-rx
>> DEBUG:QMP.qemu-26462:>>> {'execute': 'quit'}
>> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594984057,
>> 'microseconds': 485197}, 'event': 'NIC_RX_FILTER_CHANGED', 'data':
>> {'path': '/machine/peripheral-anon/device[0]/virtio-backend'}}
>> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594985855,
>> 'microseconds': 169552}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}}
>> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594987655,
>> 'microseconds': 169187}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}}
>> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594989456,
>> 'microseconds': 88866}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}}
>> DEBUG:QMP.qemu-26462:<<< {'return': {}}
>> WARNING:qemu.machine:qemu received signal 9; command:
>> "qemu-system-x86_64 -display none -vga none -chardev
>> socket,id=mon,path=/var/tmp/qemu-26462-monitor.sock -mon
>> chardev=mon,mode=control -machine pc -chardev
>> socket,id=console,path=/var/tmp/qemu-26462-console.sock,server,nowait
>> -serial chardev:console -nodefaults -m 4G -cpu max -netdev
>> user,id=vnet,hostfwd=:127.0.0.1:0-:22 -device
>> virtio-net-pci,netdev=vnet -vnc 127.0.0.1:0,to=20 -smp 8 -enable-kvm
>> -drive file=/home/peter.maydell/.cache/qemu-vm/images/openbsd.img,snapshot=on,if=none,id=drive0,cache=writeback
>> -device virtio-blk,drive=drive0,bootindex=0 -drive
>> file=/home/peter.maydell/qemu-openbsd/build/vm-test-yzwn6xdc.tmp/data-993a1.tar,if=none,id=data-993a1,cache=writeback,format=raw
>> -device virtio-blk,drive=data-993a1,serial=data-993a1,bootindex=1"
>> Error in atexit._run_exitfuncs:
>> Traceback (most recent call last):
>>   File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
>> line 436, in _do_shutdown
>>     self._soft_shutdown(has_quit, timeout)
>>   File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
>> line 419, in _soft_shutdown
>>     self._popen.wait(timeout=timeout)
>>   File "/usr/lib/python3.6/subprocess.py", line 1469, in wait
>>     raise TimeoutExpired(self.args, timeout)
>> subprocess.TimeoutExpired: Command '['qemu-system-x86_64', '-display',
>> 'none', '-vga', 'none', '-chardev',
>> 'socket,id=mon,path=/var/tmp/qemu-26462-monitor.sock', '-mon',
>> 'chardev=mon,mode=control', '-machine', 'pc', '-chardev',
>> 'socket,id=console,path=/var/tmp/qemu-26462-console.sock,server,nowait',
>> '-serial', 'chardev:console', '-nodefaults', '-m', '4G', '-cpu',
>> 'max', '-netdev', 'user,id=vnet,hostfwd=:127.0.0.1:0-:22', '-device',
>> 'virtio-net-pci,netdev=vnet', '-vnc', '127.0.0.1:0,to=20', '-smp',
>> '8', '-enable-kvm', '-drive',
>> 'file=/home/peter.maydell/.cache/qemu-vm/images/openbsd.img,snapshot=on,if=none,id=drive0,cache=writeback',
>> '-device', 'virtio-blk,drive=drive0,bootindex=0', '-drive',
>> 'file=/home/peter.maydell/qemu-openbsd/build/vm-test-yzwn6xdc.tmp/data-993a1.tar,if=none,id=data-993a1,cache=writeback,format=raw',
>> '-device', 'virtio-blk,drive=data-993a1,serial=data-993a1,bootindex=1']'
>> timed out after 3 seconds
>>
>> The above exception was the direct cause of the following exception:
>>
>> Traceback (most recent call last):
>>   File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
>> line 466, in shutdown
>>     self._do_shutdown(has_quit, timeout=timeout)
>>   File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
>> line 440, in _do_shutdown
>>     from exc
>> qemu.machine.AbnormalShutdown: Could not perform graceful shutdown
>> make: Leaving directory '/home/peter.maydell/qemu-openbsd/build'
>> OK DONE openbsd

John it seems due to the addition of the 'finally' clause in
193bf1c061 ("python/machine.py: split shutdown into hard and
soft flavors").

>>
>> thanks
>> -- PMM
>>
> 



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: tests/vm infrastructure fails to notice that QEMU dying is a failure
  2020-07-17 13:25   ` Philippe Mathieu-Daudé
@ 2020-07-17 22:24     ` John Snow
  2020-07-18  9:30       ` Peter Maydell
  2020-07-18 13:34       ` Robert Foley
  0 siblings, 2 replies; 7+ messages in thread
From: John Snow @ 2020-07-17 22:24 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Peter Maydell, QEMU Developers
  Cc: Alex Bennée, robert.foley, Cleber Rosa

On 7/17/20 9:25 AM, Philippe Mathieu-Daudé wrote:
> On 7/17/20 3:22 PM, Philippe Mathieu-Daudé wrote:
>> Cc'ing John & Cleber.
>>
>> On 7/17/20 3:08 PM, Peter Maydell wrote:
>>> If you run vm-build-openbsd, our makefile/scripting infrastructure
>>> seems to fail to notice that death of the QEMU process that's
>>> running the VM should be a failure, and ends up allowing make
>>> to return a success condition.
>>>
>>> I have a script which runs a VM build which basically does:
>>>
>>>    #!/bin/sh -e
>>>    make -C "build" "vm-build-openbsd" J=8 V=1
>>>    echo "OK DONE openbsd"
>>>
>>> It just gave me this output (tail end of logfile). We're executing tests,
>>> and then the qemu-system-x86_64 that's running the OpenBSD VM gets
>>> a signal 9 (sigkill), for unclear reasons (oom killer??). The python
>>> scripting gets an exception, but doesn't exit with a failure status to
>>> make, which then thinks all is fine, exits success itself and allows
>>> the set -e script to proceed to print the "OK DONE" line...

The signal 9 is likely due to QEMU freezing on exit. The machine.py code 
waits about 3 seconds after trying to shut QEMU down gracefully before 
sending SIGKILL and cleaning up forcefully.

>>>
>>>
>>> PASS 30 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/pca9552/pca9552-tests/tx-rx
>>> PASS 31 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/pca9552/pca9552-tests/rx-autoinc
>>> PASS 32 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/ds1338/ds1338-tests/tx-rx
>>> DEBUG:QMP.qemu-26462:>>> {'execute': 'quit'}
>>> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594984057,
>>> 'microseconds': 485197}, 'event': 'NIC_RX_FILTER_CHANGED', 'data':
>>> {'path': '/machine/peripheral-anon/device[0]/virtio-backend'}}
>>> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594985855,
>>> 'microseconds': 169552}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}}
>>> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594987655,
>>> 'microseconds': 169187}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}}
>>> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594989456,
>>> 'microseconds': 88866}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}}
>>> DEBUG:QMP.qemu-26462:<<< {'return': {}}
>>> WARNING:qemu.machine:qemu received signal 9; command:
>>> "qemu-system-x86_64 -display none -vga none -chardev
>>> socket,id=mon,path=/var/tmp/qemu-26462-monitor.sock -mon
>>> chardev=mon,mode=control -machine pc -chardev
>>> socket,id=console,path=/var/tmp/qemu-26462-console.sock,server,nowait
>>> -serial chardev:console -nodefaults -m 4G -cpu max -netdev
>>> user,id=vnet,hostfwd=:127.0.0.1:0-:22 -device
>>> virtio-net-pci,netdev=vnet -vnc 127.0.0.1:0,to=20 -smp 8 -enable-kvm
>>> -drive file=/home/peter.maydell/.cache/qemu-vm/images/openbsd.img,snapshot=on,if=none,id=drive0,cache=writeback
>>> -device virtio-blk,drive=drive0,bootindex=0 -drive
>>> file=/home/peter.maydell/qemu-openbsd/build/vm-test-yzwn6xdc.tmp/data-993a1.tar,if=none,id=data-993a1,cache=writeback,format=raw
>>> -device virtio-blk,drive=data-993a1,serial=data-993a1,bootindex=1"
>>> Error in atexit._run_exitfuncs:
>>> Traceback (most recent call last):
>>>    File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
>>> line 436, in _do_shutdown
>>>      self._soft_shutdown(has_quit, timeout)
>>>    File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
>>> line 419, in _soft_shutdown
>>>      self._popen.wait(timeout=timeout)
>>>    File "/usr/lib/python3.6/subprocess.py", line 1469, in wait
>>>      raise TimeoutExpired(self.args, timeout)

This is where it times out waiting for the process to exit of its own 
volition. This is the "Inner" exception ...

>>> subprocess.TimeoutExpired: Command '['qemu-system-x86_64', '-display',
>>> 'none', '-vga', 'none', '-chardev',
>>> 'socket,id=mon,path=/var/tmp/qemu-26462-monitor.sock', '-mon',
>>> 'chardev=mon,mode=control', '-machine', 'pc', '-chardev',
>>> 'socket,id=console,path=/var/tmp/qemu-26462-console.sock,server,nowait',
>>> '-serial', 'chardev:console', '-nodefaults', '-m', '4G', '-cpu',
>>> 'max', '-netdev', 'user,id=vnet,hostfwd=:127.0.0.1:0-:22', '-device',
>>> 'virtio-net-pci,netdev=vnet', '-vnc', '127.0.0.1:0,to=20', '-smp',
>>> '8', '-enable-kvm', '-drive',
>>> 'file=/home/peter.maydell/.cache/qemu-vm/images/openbsd.img,snapshot=on,if=none,id=drive0,cache=writeback',
>>> '-device', 'virtio-blk,drive=drive0,bootindex=0', '-drive',
>>> 'file=/home/peter.maydell/qemu-openbsd/build/vm-test-yzwn6xdc.tmp/data-993a1.tar,if=none,id=data-993a1,cache=writeback,format=raw',
>>> '-device', 'virtio-blk,drive=data-993a1,serial=data-993a1,bootindex=1']'
>>> timed out after 3 seconds
>>>
>>> The above exception was the direct cause of the following exception:
>>>
>>> Traceback (most recent call last):
>>>    File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
>>> line 466, in shutdown
>>>      self._do_shutdown(has_quit, timeout=timeout)
>>>    File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py",
>>> line 440, in _do_shutdown
>>>      from exc
>>> qemu.machine.AbnormalShutdown: Could not perform graceful shutdown

And this is the "Outer" exception, the machine.py cleanup handler 
letting you know that while it did perform cleanup, QEMU was not shut 
down properly.

>>> make: Leaving directory '/home/peter.maydell/qemu-openbsd/build'
>>> OK DONE openbsd
> 
> John it seems due to the addition of the 'finally' clause in
> 193bf1c061 ("python/machine.py: split shutdown into hard and
> soft flavors").
> 

That's surprising. I'll look into this Monday, but the "Finally: " 
should only perform some actions without suppressing the exception. You 
can see here it DIDN'T suppress it.

...but maybe we don't return non-zero if an exception gets raised during 
atexit()? Maybe that's what's happening here.

... ... Ah, guess what, that's exactly it. atexit suppresses exceptions. 
You just never noticed this problem before because machine.py used to 
suppress problems during shutdown instead of reporting them.

```
#!/usr/bin/env python3

import atexit
from qemu import machine

def handler():
     raise machine.AbnormalShutdown("oh no")

atexit.register(handler)
```

Oops, this returns a zero code.


So:

- basevm.py should likely be converted to a context manager object just 
like machine.py's QEMUMachine class was so cleanup can happen explicitly 
when the VM object goes out of scope.

(Callers would use: "with baseVM() as foo:" pattern, or, where that's 
not possible, call an explicit .shutdown().)

- The real problem, though: Why is QEMU hanging? It might need a longer 
timeout, or it might be having problems with the console socket again.

(CC Robert Foley who has been working on the Console socket draining 
problems. Maybe he has some insight here?)

--js



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: tests/vm infrastructure fails to notice that QEMU dying is a failure
  2020-07-17 22:24     ` John Snow
@ 2020-07-18  9:30       ` Peter Maydell
  2020-07-20 14:16         ` John Snow
  2020-07-18 13:34       ` Robert Foley
  1 sibling, 1 reply; 7+ messages in thread
From: Peter Maydell @ 2020-07-18  9:30 UTC (permalink / raw)
  To: John Snow
  Cc: Alex Bennée, Robert Foley, Philippe Mathieu-Daudé,
	QEMU Developers, Cleber Rosa

On Fri, 17 Jul 2020 at 23:24, John Snow <jsnow@redhat.com> wrote:
> - The real problem, though: Why is QEMU hanging? It might need a longer
> timeout, or it might be having problems with the console socket again.

The host machine seemed to be under really heavy I/O load at the time.

thanks
-- PMM


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: tests/vm infrastructure fails to notice that QEMU dying is a failure
  2020-07-17 22:24     ` John Snow
  2020-07-18  9:30       ` Peter Maydell
@ 2020-07-18 13:34       ` Robert Foley
  1 sibling, 0 replies; 7+ messages in thread
From: Robert Foley @ 2020-07-18 13:34 UTC (permalink / raw)
  To: John Snow
  Cc: Alex Bennée, Peter Maydell, Philippe Mathieu-Daudé,
	QEMU Developers, Cleber Rosa

On Fri, 17 Jul 2020 at 18:24, John Snow <jsnow@redhat.com> wrote:
>
> - The real problem, though: Why is QEMU hanging? It might need a longer
> timeout, or it might be having problems with the console socket again.
>
> (CC Robert Foley who has been working on the Console socket draining
> problems. Maybe he has some insight here?)

When we did see the console issues we would see a hung stack like this:

#0  0x0000aaaad43d141c in qemu_chr_write_buffer
#1  0x0000aaaad43d194c in qemu_chr_write
#2  0x0000aaaad43d3968 in qemu_chr_fe_write_all
#3  0x0000aaaad417cf80 in pl011_write
#4  0x0000aaaad3f3c7b0 in memory_region_write_accessor
#5  0x0000aaaad3f3a1fc in access_with_adjusted_size
#6  0x0000aaaad3f3e828 in memory_region_dispatch_write
#7  0x0000aaaad3f517b0 in io_writex
#8  0x0000ffff574a1d34 in code_gen_buffer ()
#9  0x0000aaaad3f67228 in cpu_tb_exec
#10 0x0000aaaad3f67228 in cpu_loop_exec_tb
#11 0x0000aaaad3f67228 in cpu_exec
#12 0x0000aaaad3f2dbe4 in tcg_cpu_exec
#13 0x0000aaaad3f305e8 in qemu_tcg_cpu_thread_fn
#14 0x0000aaaad4441d88 in qemu_thread_start
#15 0x0000ffff85bec088 in start_thread
#16 0x0000ffff85b5c4ec in thread_start

However, since we added console socket draining thread, it seems to have
fixed this and presently basevm.py should be using this console draining
for the vm-build-openbsd.

When QEMU is hanging and exceeding our shutdown timeout, could we
(optionally) send something like a SIGABRT to QEMU to force a core dump
so we can get the stack and see where QEMU is hung?
I suppose that presumes it is reproducible, but it might help to remove doubt
in cases where QEMU hangs.

-Rob
>
> --js
>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: tests/vm infrastructure fails to notice that QEMU dying is a failure
  2020-07-18  9:30       ` Peter Maydell
@ 2020-07-20 14:16         ` John Snow
  0 siblings, 0 replies; 7+ messages in thread
From: John Snow @ 2020-07-20 14:16 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Alex Bennée, Robert Foley, Philippe Mathieu-Daudé,
	QEMU Developers, Cleber Rosa

On 7/18/20 5:30 AM, Peter Maydell wrote:
> On Fri, 17 Jul 2020 at 23:24, John Snow <jsnow@redhat.com> wrote:
>> - The real problem, though: Why is QEMU hanging? It might need a longer
>> timeout, or it might be having problems with the console socket again.
> 
> The host machine seemed to be under really heavy I/O load at the time.
> 
> thanks
> -- PMM
> 

Kevin's reporting problems too, with a 3 second wait being too 
optimistic. Let's extend the timeout a bit, which should help in that case.

(Avoiding exceptions in the atexit handler will take a little longer 
perhaps, but I will start working on that too.)

--js



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-07-20 14:21 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-07-17 13:08 tests/vm infrastructure fails to notice that QEMU dying is a failure Peter Maydell
2020-07-17 13:22 ` Philippe Mathieu-Daudé
2020-07-17 13:25   ` Philippe Mathieu-Daudé
2020-07-17 22:24     ` John Snow
2020-07-18  9:30       ` Peter Maydell
2020-07-20 14:16         ` John Snow
2020-07-18 13:34       ` Robert Foley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).