* tests/vm infrastructure fails to notice that QEMU dying is a failure @ 2020-07-17 13:08 Peter Maydell 2020-07-17 13:22 ` Philippe Mathieu-Daudé 0 siblings, 1 reply; 7+ messages in thread From: Peter Maydell @ 2020-07-17 13:08 UTC (permalink / raw) To: QEMU Developers; +Cc: Alex Bennée If you run vm-build-openbsd, our makefile/scripting infrastructure seems to fail to notice that death of the QEMU process that's running the VM should be a failure, and ends up allowing make to return a success condition. I have a script which runs a VM build which basically does: #!/bin/sh -e make -C "build" "vm-build-openbsd" J=8 V=1 echo "OK DONE openbsd" It just gave me this output (tail end of logfile). We're executing tests, and then the qemu-system-x86_64 that's running the OpenBSD VM gets a signal 9 (sigkill), for unclear reasons (oom killer??). The python scripting gets an exception, but doesn't exit with a failure status to make, which then thinks all is fine, exits success itself and allows the set -e script to proceed to print the "OK DONE" line... PASS 30 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/pca9552/pca9552-tests/tx-rx PASS 31 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/pca9552/pca9552-tests/rx-autoinc PASS 32 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/ds1338/ds1338-tests/tx-rx DEBUG:QMP.qemu-26462:>>> {'execute': 'quit'} DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594984057, 'microseconds': 485197}, 'event': 'NIC_RX_FILTER_CHANGED', 'data': {'path': '/machine/peripheral-anon/device[0]/virtio-backend'}} DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594985855, 'microseconds': 169552}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}} DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594987655, 'microseconds': 169187}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}} DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594989456, 'microseconds': 88866}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}} DEBUG:QMP.qemu-26462:<<< {'return': {}} WARNING:qemu.machine:qemu received signal 9; command: "qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/var/tmp/qemu-26462-monitor.sock -mon chardev=mon,mode=control -machine pc -chardev socket,id=console,path=/var/tmp/qemu-26462-console.sock,server,nowait -serial chardev:console -nodefaults -m 4G -cpu max -netdev user,id=vnet,hostfwd=:127.0.0.1:0-:22 -device virtio-net-pci,netdev=vnet -vnc 127.0.0.1:0,to=20 -smp 8 -enable-kvm -drive file=/home/peter.maydell/.cache/qemu-vm/images/openbsd.img,snapshot=on,if=none,id=drive0,cache=writeback -device virtio-blk,drive=drive0,bootindex=0 -drive file=/home/peter.maydell/qemu-openbsd/build/vm-test-yzwn6xdc.tmp/data-993a1.tar,if=none,id=data-993a1,cache=writeback,format=raw -device virtio-blk,drive=data-993a1,serial=data-993a1,bootindex=1" Error in atexit._run_exitfuncs: Traceback (most recent call last): File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py", line 436, in _do_shutdown self._soft_shutdown(has_quit, timeout) File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py", line 419, in _soft_shutdown self._popen.wait(timeout=timeout) File "/usr/lib/python3.6/subprocess.py", line 1469, in wait raise TimeoutExpired(self.args, timeout) subprocess.TimeoutExpired: Command '['qemu-system-x86_64', '-display', 'none', '-vga', 'none', '-chardev', 'socket,id=mon,path=/var/tmp/qemu-26462-monitor.sock', '-mon', 'chardev=mon,mode=control', '-machine', 'pc', '-chardev', 'socket,id=console,path=/var/tmp/qemu-26462-console.sock,server,nowait', '-serial', 'chardev:console', '-nodefaults', '-m', '4G', '-cpu', 'max', '-netdev', 'user,id=vnet,hostfwd=:127.0.0.1:0-:22', '-device', 'virtio-net-pci,netdev=vnet', '-vnc', '127.0.0.1:0,to=20', '-smp', '8', '-enable-kvm', '-drive', 'file=/home/peter.maydell/.cache/qemu-vm/images/openbsd.img,snapshot=on,if=none,id=drive0,cache=writeback', '-device', 'virtio-blk,drive=drive0,bootindex=0', '-drive', 'file=/home/peter.maydell/qemu-openbsd/build/vm-test-yzwn6xdc.tmp/data-993a1.tar,if=none,id=data-993a1,cache=writeback,format=raw', '-device', 'virtio-blk,drive=data-993a1,serial=data-993a1,bootindex=1']' timed out after 3 seconds The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py", line 466, in shutdown self._do_shutdown(has_quit, timeout=timeout) File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py", line 440, in _do_shutdown from exc qemu.machine.AbnormalShutdown: Could not perform graceful shutdown make: Leaving directory '/home/peter.maydell/qemu-openbsd/build' OK DONE openbsd thanks -- PMM ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: tests/vm infrastructure fails to notice that QEMU dying is a failure 2020-07-17 13:08 tests/vm infrastructure fails to notice that QEMU dying is a failure Peter Maydell @ 2020-07-17 13:22 ` Philippe Mathieu-Daudé 2020-07-17 13:25 ` Philippe Mathieu-Daudé 0 siblings, 1 reply; 7+ messages in thread From: Philippe Mathieu-Daudé @ 2020-07-17 13:22 UTC (permalink / raw) To: Peter Maydell, QEMU Developers; +Cc: John Snow, Alex Bennée, Cleber Rosa Cc'ing John & Cleber. On 7/17/20 3:08 PM, Peter Maydell wrote: > If you run vm-build-openbsd, our makefile/scripting infrastructure > seems to fail to notice that death of the QEMU process that's > running the VM should be a failure, and ends up allowing make > to return a success condition. > > I have a script which runs a VM build which basically does: > > #!/bin/sh -e > make -C "build" "vm-build-openbsd" J=8 V=1 > echo "OK DONE openbsd" > > It just gave me this output (tail end of logfile). We're executing tests, > and then the qemu-system-x86_64 that's running the OpenBSD VM gets > a signal 9 (sigkill), for unclear reasons (oom killer??). The python > scripting gets an exception, but doesn't exit with a failure status to > make, which then thinks all is fine, exits success itself and allows > the set -e script to proceed to print the "OK DONE" line... > > > PASS 30 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/pca9552/pca9552-tests/tx-rx > PASS 31 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/pca9552/pca9552-tests/rx-autoinc > PASS 32 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/ds1338/ds1338-tests/tx-rx > DEBUG:QMP.qemu-26462:>>> {'execute': 'quit'} > DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594984057, > 'microseconds': 485197}, 'event': 'NIC_RX_FILTER_CHANGED', 'data': > {'path': '/machine/peripheral-anon/device[0]/virtio-backend'}} > DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594985855, > 'microseconds': 169552}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}} > DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594987655, > 'microseconds': 169187}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}} > DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594989456, > 'microseconds': 88866}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}} > DEBUG:QMP.qemu-26462:<<< {'return': {}} > WARNING:qemu.machine:qemu received signal 9; command: > "qemu-system-x86_64 -display none -vga none -chardev > socket,id=mon,path=/var/tmp/qemu-26462-monitor.sock -mon > chardev=mon,mode=control -machine pc -chardev > socket,id=console,path=/var/tmp/qemu-26462-console.sock,server,nowait > -serial chardev:console -nodefaults -m 4G -cpu max -netdev > user,id=vnet,hostfwd=:127.0.0.1:0-:22 -device > virtio-net-pci,netdev=vnet -vnc 127.0.0.1:0,to=20 -smp 8 -enable-kvm > -drive file=/home/peter.maydell/.cache/qemu-vm/images/openbsd.img,snapshot=on,if=none,id=drive0,cache=writeback > -device virtio-blk,drive=drive0,bootindex=0 -drive > file=/home/peter.maydell/qemu-openbsd/build/vm-test-yzwn6xdc.tmp/data-993a1.tar,if=none,id=data-993a1,cache=writeback,format=raw > -device virtio-blk,drive=data-993a1,serial=data-993a1,bootindex=1" > Error in atexit._run_exitfuncs: > Traceback (most recent call last): > File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py", > line 436, in _do_shutdown > self._soft_shutdown(has_quit, timeout) > File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py", > line 419, in _soft_shutdown > self._popen.wait(timeout=timeout) > File "/usr/lib/python3.6/subprocess.py", line 1469, in wait > raise TimeoutExpired(self.args, timeout) > subprocess.TimeoutExpired: Command '['qemu-system-x86_64', '-display', > 'none', '-vga', 'none', '-chardev', > 'socket,id=mon,path=/var/tmp/qemu-26462-monitor.sock', '-mon', > 'chardev=mon,mode=control', '-machine', 'pc', '-chardev', > 'socket,id=console,path=/var/tmp/qemu-26462-console.sock,server,nowait', > '-serial', 'chardev:console', '-nodefaults', '-m', '4G', '-cpu', > 'max', '-netdev', 'user,id=vnet,hostfwd=:127.0.0.1:0-:22', '-device', > 'virtio-net-pci,netdev=vnet', '-vnc', '127.0.0.1:0,to=20', '-smp', > '8', '-enable-kvm', '-drive', > 'file=/home/peter.maydell/.cache/qemu-vm/images/openbsd.img,snapshot=on,if=none,id=drive0,cache=writeback', > '-device', 'virtio-blk,drive=drive0,bootindex=0', '-drive', > 'file=/home/peter.maydell/qemu-openbsd/build/vm-test-yzwn6xdc.tmp/data-993a1.tar,if=none,id=data-993a1,cache=writeback,format=raw', > '-device', 'virtio-blk,drive=data-993a1,serial=data-993a1,bootindex=1']' > timed out after 3 seconds > > The above exception was the direct cause of the following exception: > > Traceback (most recent call last): > File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py", > line 466, in shutdown > self._do_shutdown(has_quit, timeout=timeout) > File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py", > line 440, in _do_shutdown > from exc > qemu.machine.AbnormalShutdown: Could not perform graceful shutdown > make: Leaving directory '/home/peter.maydell/qemu-openbsd/build' > OK DONE openbsd > > thanks > -- PMM > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: tests/vm infrastructure fails to notice that QEMU dying is a failure 2020-07-17 13:22 ` Philippe Mathieu-Daudé @ 2020-07-17 13:25 ` Philippe Mathieu-Daudé 2020-07-17 22:24 ` John Snow 0 siblings, 1 reply; 7+ messages in thread From: Philippe Mathieu-Daudé @ 2020-07-17 13:25 UTC (permalink / raw) To: Peter Maydell, QEMU Developers; +Cc: John Snow, Alex Bennée, Cleber Rosa On 7/17/20 3:22 PM, Philippe Mathieu-Daudé wrote: > Cc'ing John & Cleber. > > On 7/17/20 3:08 PM, Peter Maydell wrote: >> If you run vm-build-openbsd, our makefile/scripting infrastructure >> seems to fail to notice that death of the QEMU process that's >> running the VM should be a failure, and ends up allowing make >> to return a success condition. >> >> I have a script which runs a VM build which basically does: >> >> #!/bin/sh -e >> make -C "build" "vm-build-openbsd" J=8 V=1 >> echo "OK DONE openbsd" >> >> It just gave me this output (tail end of logfile). We're executing tests, >> and then the qemu-system-x86_64 that's running the OpenBSD VM gets >> a signal 9 (sigkill), for unclear reasons (oom killer??). The python >> scripting gets an exception, but doesn't exit with a failure status to >> make, which then thinks all is fine, exits success itself and allows >> the set -e script to proceed to print the "OK DONE" line... >> >> >> PASS 30 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/pca9552/pca9552-tests/tx-rx >> PASS 31 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/pca9552/pca9552-tests/rx-autoinc >> PASS 32 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/ds1338/ds1338-tests/tx-rx >> DEBUG:QMP.qemu-26462:>>> {'execute': 'quit'} >> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594984057, >> 'microseconds': 485197}, 'event': 'NIC_RX_FILTER_CHANGED', 'data': >> {'path': '/machine/peripheral-anon/device[0]/virtio-backend'}} >> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594985855, >> 'microseconds': 169552}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}} >> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594987655, >> 'microseconds': 169187}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}} >> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594989456, >> 'microseconds': 88866}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}} >> DEBUG:QMP.qemu-26462:<<< {'return': {}} >> WARNING:qemu.machine:qemu received signal 9; command: >> "qemu-system-x86_64 -display none -vga none -chardev >> socket,id=mon,path=/var/tmp/qemu-26462-monitor.sock -mon >> chardev=mon,mode=control -machine pc -chardev >> socket,id=console,path=/var/tmp/qemu-26462-console.sock,server,nowait >> -serial chardev:console -nodefaults -m 4G -cpu max -netdev >> user,id=vnet,hostfwd=:127.0.0.1:0-:22 -device >> virtio-net-pci,netdev=vnet -vnc 127.0.0.1:0,to=20 -smp 8 -enable-kvm >> -drive file=/home/peter.maydell/.cache/qemu-vm/images/openbsd.img,snapshot=on,if=none,id=drive0,cache=writeback >> -device virtio-blk,drive=drive0,bootindex=0 -drive >> file=/home/peter.maydell/qemu-openbsd/build/vm-test-yzwn6xdc.tmp/data-993a1.tar,if=none,id=data-993a1,cache=writeback,format=raw >> -device virtio-blk,drive=data-993a1,serial=data-993a1,bootindex=1" >> Error in atexit._run_exitfuncs: >> Traceback (most recent call last): >> File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py", >> line 436, in _do_shutdown >> self._soft_shutdown(has_quit, timeout) >> File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py", >> line 419, in _soft_shutdown >> self._popen.wait(timeout=timeout) >> File "/usr/lib/python3.6/subprocess.py", line 1469, in wait >> raise TimeoutExpired(self.args, timeout) >> subprocess.TimeoutExpired: Command '['qemu-system-x86_64', '-display', >> 'none', '-vga', 'none', '-chardev', >> 'socket,id=mon,path=/var/tmp/qemu-26462-monitor.sock', '-mon', >> 'chardev=mon,mode=control', '-machine', 'pc', '-chardev', >> 'socket,id=console,path=/var/tmp/qemu-26462-console.sock,server,nowait', >> '-serial', 'chardev:console', '-nodefaults', '-m', '4G', '-cpu', >> 'max', '-netdev', 'user,id=vnet,hostfwd=:127.0.0.1:0-:22', '-device', >> 'virtio-net-pci,netdev=vnet', '-vnc', '127.0.0.1:0,to=20', '-smp', >> '8', '-enable-kvm', '-drive', >> 'file=/home/peter.maydell/.cache/qemu-vm/images/openbsd.img,snapshot=on,if=none,id=drive0,cache=writeback', >> '-device', 'virtio-blk,drive=drive0,bootindex=0', '-drive', >> 'file=/home/peter.maydell/qemu-openbsd/build/vm-test-yzwn6xdc.tmp/data-993a1.tar,if=none,id=data-993a1,cache=writeback,format=raw', >> '-device', 'virtio-blk,drive=data-993a1,serial=data-993a1,bootindex=1']' >> timed out after 3 seconds >> >> The above exception was the direct cause of the following exception: >> >> Traceback (most recent call last): >> File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py", >> line 466, in shutdown >> self._do_shutdown(has_quit, timeout=timeout) >> File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py", >> line 440, in _do_shutdown >> from exc >> qemu.machine.AbnormalShutdown: Could not perform graceful shutdown >> make: Leaving directory '/home/peter.maydell/qemu-openbsd/build' >> OK DONE openbsd John it seems due to the addition of the 'finally' clause in 193bf1c061 ("python/machine.py: split shutdown into hard and soft flavors"). >> >> thanks >> -- PMM >> > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: tests/vm infrastructure fails to notice that QEMU dying is a failure 2020-07-17 13:25 ` Philippe Mathieu-Daudé @ 2020-07-17 22:24 ` John Snow 2020-07-18 9:30 ` Peter Maydell 2020-07-18 13:34 ` Robert Foley 0 siblings, 2 replies; 7+ messages in thread From: John Snow @ 2020-07-17 22:24 UTC (permalink / raw) To: Philippe Mathieu-Daudé, Peter Maydell, QEMU Developers Cc: Alex Bennée, robert.foley, Cleber Rosa On 7/17/20 9:25 AM, Philippe Mathieu-Daudé wrote: > On 7/17/20 3:22 PM, Philippe Mathieu-Daudé wrote: >> Cc'ing John & Cleber. >> >> On 7/17/20 3:08 PM, Peter Maydell wrote: >>> If you run vm-build-openbsd, our makefile/scripting infrastructure >>> seems to fail to notice that death of the QEMU process that's >>> running the VM should be a failure, and ends up allowing make >>> to return a success condition. >>> >>> I have a script which runs a VM build which basically does: >>> >>> #!/bin/sh -e >>> make -C "build" "vm-build-openbsd" J=8 V=1 >>> echo "OK DONE openbsd" >>> >>> It just gave me this output (tail end of logfile). We're executing tests, >>> and then the qemu-system-x86_64 that's running the OpenBSD VM gets >>> a signal 9 (sigkill), for unclear reasons (oom killer??). The python >>> scripting gets an exception, but doesn't exit with a failure status to >>> make, which then thinks all is fine, exits success itself and allows >>> the set -e script to proceed to print the "OK DONE" line... The signal 9 is likely due to QEMU freezing on exit. The machine.py code waits about 3 seconds after trying to shut QEMU down gracefully before sending SIGKILL and cleaning up forcefully. >>> >>> >>> PASS 30 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/pca9552/pca9552-tests/tx-rx >>> PASS 31 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/pca9552/pca9552-tests/rx-autoinc >>> PASS 32 qos-test /arm/imx25-pdk/imx.i2c/i2c-bus/ds1338/ds1338-tests/tx-rx >>> DEBUG:QMP.qemu-26462:>>> {'execute': 'quit'} >>> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594984057, >>> 'microseconds': 485197}, 'event': 'NIC_RX_FILTER_CHANGED', 'data': >>> {'path': '/machine/peripheral-anon/device[0]/virtio-backend'}} >>> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594985855, >>> 'microseconds': 169552}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}} >>> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594987655, >>> 'microseconds': 169187}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}} >>> DEBUG:QMP.qemu-26462:<<< {'timestamp': {'seconds': 1594989456, >>> 'microseconds': 88866}, 'event': 'RTC_CHANGE', 'data': {'offset': 0}} >>> DEBUG:QMP.qemu-26462:<<< {'return': {}} >>> WARNING:qemu.machine:qemu received signal 9; command: >>> "qemu-system-x86_64 -display none -vga none -chardev >>> socket,id=mon,path=/var/tmp/qemu-26462-monitor.sock -mon >>> chardev=mon,mode=control -machine pc -chardev >>> socket,id=console,path=/var/tmp/qemu-26462-console.sock,server,nowait >>> -serial chardev:console -nodefaults -m 4G -cpu max -netdev >>> user,id=vnet,hostfwd=:127.0.0.1:0-:22 -device >>> virtio-net-pci,netdev=vnet -vnc 127.0.0.1:0,to=20 -smp 8 -enable-kvm >>> -drive file=/home/peter.maydell/.cache/qemu-vm/images/openbsd.img,snapshot=on,if=none,id=drive0,cache=writeback >>> -device virtio-blk,drive=drive0,bootindex=0 -drive >>> file=/home/peter.maydell/qemu-openbsd/build/vm-test-yzwn6xdc.tmp/data-993a1.tar,if=none,id=data-993a1,cache=writeback,format=raw >>> -device virtio-blk,drive=data-993a1,serial=data-993a1,bootindex=1" >>> Error in atexit._run_exitfuncs: >>> Traceback (most recent call last): >>> File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py", >>> line 436, in _do_shutdown >>> self._soft_shutdown(has_quit, timeout) >>> File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py", >>> line 419, in _soft_shutdown >>> self._popen.wait(timeout=timeout) >>> File "/usr/lib/python3.6/subprocess.py", line 1469, in wait >>> raise TimeoutExpired(self.args, timeout) This is where it times out waiting for the process to exit of its own volition. This is the "Inner" exception ... >>> subprocess.TimeoutExpired: Command '['qemu-system-x86_64', '-display', >>> 'none', '-vga', 'none', '-chardev', >>> 'socket,id=mon,path=/var/tmp/qemu-26462-monitor.sock', '-mon', >>> 'chardev=mon,mode=control', '-machine', 'pc', '-chardev', >>> 'socket,id=console,path=/var/tmp/qemu-26462-console.sock,server,nowait', >>> '-serial', 'chardev:console', '-nodefaults', '-m', '4G', '-cpu', >>> 'max', '-netdev', 'user,id=vnet,hostfwd=:127.0.0.1:0-:22', '-device', >>> 'virtio-net-pci,netdev=vnet', '-vnc', '127.0.0.1:0,to=20', '-smp', >>> '8', '-enable-kvm', '-drive', >>> 'file=/home/peter.maydell/.cache/qemu-vm/images/openbsd.img,snapshot=on,if=none,id=drive0,cache=writeback', >>> '-device', 'virtio-blk,drive=drive0,bootindex=0', '-drive', >>> 'file=/home/peter.maydell/qemu-openbsd/build/vm-test-yzwn6xdc.tmp/data-993a1.tar,if=none,id=data-993a1,cache=writeback,format=raw', >>> '-device', 'virtio-blk,drive=data-993a1,serial=data-993a1,bootindex=1']' >>> timed out after 3 seconds >>> >>> The above exception was the direct cause of the following exception: >>> >>> Traceback (most recent call last): >>> File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py", >>> line 466, in shutdown >>> self._do_shutdown(has_quit, timeout=timeout) >>> File "/home/peter.maydell/qemu-openbsd/tests/vm/../../python/qemu/machine.py", >>> line 440, in _do_shutdown >>> from exc >>> qemu.machine.AbnormalShutdown: Could not perform graceful shutdown And this is the "Outer" exception, the machine.py cleanup handler letting you know that while it did perform cleanup, QEMU was not shut down properly. >>> make: Leaving directory '/home/peter.maydell/qemu-openbsd/build' >>> OK DONE openbsd > > John it seems due to the addition of the 'finally' clause in > 193bf1c061 ("python/machine.py: split shutdown into hard and > soft flavors"). > That's surprising. I'll look into this Monday, but the "Finally: " should only perform some actions without suppressing the exception. You can see here it DIDN'T suppress it. ...but maybe we don't return non-zero if an exception gets raised during atexit()? Maybe that's what's happening here. ... ... Ah, guess what, that's exactly it. atexit suppresses exceptions. You just never noticed this problem before because machine.py used to suppress problems during shutdown instead of reporting them. ``` #!/usr/bin/env python3 import atexit from qemu import machine def handler(): raise machine.AbnormalShutdown("oh no") atexit.register(handler) ``` Oops, this returns a zero code. So: - basevm.py should likely be converted to a context manager object just like machine.py's QEMUMachine class was so cleanup can happen explicitly when the VM object goes out of scope. (Callers would use: "with baseVM() as foo:" pattern, or, where that's not possible, call an explicit .shutdown().) - The real problem, though: Why is QEMU hanging? It might need a longer timeout, or it might be having problems with the console socket again. (CC Robert Foley who has been working on the Console socket draining problems. Maybe he has some insight here?) --js ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: tests/vm infrastructure fails to notice that QEMU dying is a failure 2020-07-17 22:24 ` John Snow @ 2020-07-18 9:30 ` Peter Maydell 2020-07-20 14:16 ` John Snow 2020-07-18 13:34 ` Robert Foley 1 sibling, 1 reply; 7+ messages in thread From: Peter Maydell @ 2020-07-18 9:30 UTC (permalink / raw) To: John Snow Cc: Alex Bennée, Robert Foley, Philippe Mathieu-Daudé, QEMU Developers, Cleber Rosa On Fri, 17 Jul 2020 at 23:24, John Snow <jsnow@redhat.com> wrote: > - The real problem, though: Why is QEMU hanging? It might need a longer > timeout, or it might be having problems with the console socket again. The host machine seemed to be under really heavy I/O load at the time. thanks -- PMM ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: tests/vm infrastructure fails to notice that QEMU dying is a failure 2020-07-18 9:30 ` Peter Maydell @ 2020-07-20 14:16 ` John Snow 0 siblings, 0 replies; 7+ messages in thread From: John Snow @ 2020-07-20 14:16 UTC (permalink / raw) To: Peter Maydell Cc: Alex Bennée, Robert Foley, Philippe Mathieu-Daudé, QEMU Developers, Cleber Rosa On 7/18/20 5:30 AM, Peter Maydell wrote: > On Fri, 17 Jul 2020 at 23:24, John Snow <jsnow@redhat.com> wrote: >> - The real problem, though: Why is QEMU hanging? It might need a longer >> timeout, or it might be having problems with the console socket again. > > The host machine seemed to be under really heavy I/O load at the time. > > thanks > -- PMM > Kevin's reporting problems too, with a 3 second wait being too optimistic. Let's extend the timeout a bit, which should help in that case. (Avoiding exceptions in the atexit handler will take a little longer perhaps, but I will start working on that too.) --js ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: tests/vm infrastructure fails to notice that QEMU dying is a failure 2020-07-17 22:24 ` John Snow 2020-07-18 9:30 ` Peter Maydell @ 2020-07-18 13:34 ` Robert Foley 1 sibling, 0 replies; 7+ messages in thread From: Robert Foley @ 2020-07-18 13:34 UTC (permalink / raw) To: John Snow Cc: Alex Bennée, Peter Maydell, Philippe Mathieu-Daudé, QEMU Developers, Cleber Rosa On Fri, 17 Jul 2020 at 18:24, John Snow <jsnow@redhat.com> wrote: > > - The real problem, though: Why is QEMU hanging? It might need a longer > timeout, or it might be having problems with the console socket again. > > (CC Robert Foley who has been working on the Console socket draining > problems. Maybe he has some insight here?) When we did see the console issues we would see a hung stack like this: #0 0x0000aaaad43d141c in qemu_chr_write_buffer #1 0x0000aaaad43d194c in qemu_chr_write #2 0x0000aaaad43d3968 in qemu_chr_fe_write_all #3 0x0000aaaad417cf80 in pl011_write #4 0x0000aaaad3f3c7b0 in memory_region_write_accessor #5 0x0000aaaad3f3a1fc in access_with_adjusted_size #6 0x0000aaaad3f3e828 in memory_region_dispatch_write #7 0x0000aaaad3f517b0 in io_writex #8 0x0000ffff574a1d34 in code_gen_buffer () #9 0x0000aaaad3f67228 in cpu_tb_exec #10 0x0000aaaad3f67228 in cpu_loop_exec_tb #11 0x0000aaaad3f67228 in cpu_exec #12 0x0000aaaad3f2dbe4 in tcg_cpu_exec #13 0x0000aaaad3f305e8 in qemu_tcg_cpu_thread_fn #14 0x0000aaaad4441d88 in qemu_thread_start #15 0x0000ffff85bec088 in start_thread #16 0x0000ffff85b5c4ec in thread_start However, since we added console socket draining thread, it seems to have fixed this and presently basevm.py should be using this console draining for the vm-build-openbsd. When QEMU is hanging and exceeding our shutdown timeout, could we (optionally) send something like a SIGABRT to QEMU to force a core dump so we can get the stack and see where QEMU is hung? I suppose that presumes it is reproducible, but it might help to remove doubt in cases where QEMU hangs. -Rob > > --js > ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2020-07-20 14:21 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-07-17 13:08 tests/vm infrastructure fails to notice that QEMU dying is a failure Peter Maydell 2020-07-17 13:22 ` Philippe Mathieu-Daudé 2020-07-17 13:25 ` Philippe Mathieu-Daudé 2020-07-17 22:24 ` John Snow 2020-07-18 9:30 ` Peter Maydell 2020-07-20 14:16 ` John Snow 2020-07-18 13:34 ` Robert Foley
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).