Subject: Re: [Qemu-devel] [PATCH 14/18] Boot Linux Console Test: add a test for ppc64 + pseries
From: Cleber Rosa
Date: Wed, 30 Jan 2019 21:37:01 -0500
To: Alex Bennée
Cc: qemu-devel@nongnu.org, Philippe Mathieu-Daudé, Stefan Markovic, Aleksandar Markovic, Eduardo Habkost, Caio Carrara, qemu-s390x@nongnu.org, Aurelien Jarno, Cornelia Huck, Fam Zheng, Wainer dos Santos Moschetta, Aleksandar Rikalo
In-Reply-To: <8736pk1xs8.fsf@linaro.org>
References: <20190117185628.21862-1-crosa@redhat.com> <20190117185628.21862-15-crosa@redhat.com> <8736pk1xs8.fsf@linaro.org>

On 1/22/19 11:07 AM, Alex Bennée wrote:
>
> Cleber Rosa writes:
>
>> Just like the previous tests, boots a Linux kernel on a ppc64 target
>> using the pseries machine.
>
> So running this on my rather slow SynQuacer I get:
>
>  (04/16) /home/alex/lsrc/qemu.git/tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_ppc64_pseries: INTERRUPTED: Test reported status but did not finish\nRunner error occurred: Timeout reached\nOriginal status: ERROR\n{'name': '04-/home/alex/lsrc/qemu.git/tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_ppc64_pseries', 'logdir': '/home/alex/lsrc/qemu.git/te... (60.93 s)
>
> which I'm guessing is a timeout occurring.
>

Yes, that's what's happening.
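The INTERRUPTED status in the log above is the runner hitting its time budget while the guest is still booting under TCG emulation. Purely as an illustration of the trade-off being discussed (the helper names, the compatibility table, and the numbers below are hypothetical, not qemu.py or Avocado API), a harness could scale a test's timeout when hardware acceleration is not usable for the target:

```python
import os
import platform

# Sketch only: how a harness might scale its timeout when the host
# cannot hardware-accelerate the target architecture.  All names and
# values here are illustrative assumptions.

BASE_TIMEOUT = 60  # seconds; roughly enough when KVM is available

# Host machine strings that can accelerate each QEMU target
# (illustrative, not exhaustive).
KVM_COMPATIBLE = {
    "x86_64": ("x86_64",),
    "i386": ("i386", "x86_64"),
    "ppc64": ("ppc64", "ppc64le"),
    "aarch64": ("aarch64",),
}

def kvm_usable(target_arch):
    """KVM only helps when /dev/kvm is accessible and the host matches the target."""
    return (os.access("/dev/kvm", os.R_OK | os.W_OK)
            and platform.machine() in KVM_COMPATIBLE.get(target_arch, ()))

def pick_timeout(target_arch, tcg_factor=10):
    # Under TCG a guest boot can easily take an order of magnitude
    # longer than under KVM, so multiply the budget accordingly.
    if kvm_usable(target_arch):
        return BASE_TIMEOUT
    return BASE_TIMEOUT * tcg_factor
```

The drawback of this kind of probing, as discussed below, is that the effective test configuration then depends on the host the test happens to run on.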
It's hard to pinpoint, and control, the sources of sluggishness when such a test runs on a different environment. For this one execution, I do trust your assessment, and it's most certainly caused by your "slow SynQuacer" spending too much time running emulation code. But I'd like to mention that there are other possibilities. One is that you're hitting an "asset fetcher bug" that I recently fixed in Avocado[1] (the fix will be available in 68.0, to be released next Monday, Feb 4th). Even with that bug fixed, I feel it's unfair for test code to spend its time waiting to download a file when it's not testing *the file download itself*. Because of that, there are plans to add an (optional) job pre-processing step that will make sure the needed assets are in the cache ahead of time[2][3].

> I wonder if that means we should:
>
> a) set timeouts longer for when running on TCG
> or
> b) split tests into TCG and KVM tests and select KVM tests on appropriate HW
>

I wonder the same, and I believe this falls into a scenario similar to the one we've seen with the setup of console devices in the QEMUMachine class. I started by setting the device types at the framework level, and then reverted to the machine's default devices (using '-serial'), because the "default" behavior of QEMU is usually what a test writer wants when not setting something else explicitly.

> The qemu.py code has (slightly flawed) logic for detecting KVM and
> passing --enable-kvm. Maybe we should be doing that here?
>

I'm not sure. IMO, the underlying question is: should we magically (at the framework level) configure tests based on probed host environment characteristics? I feel we should minimize that, for the sake of tests being more obvious and more easily reproducible. Because of that, I'd go, *initially*, with an approach closer to your option "b".
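For a picture of what option "b" could look like in practice: Avocado can filter tests by docstring tags (via `avocado run --filter-by-tags`), so KVM-capable hosts could opt into a tagged subset. The snippet below is only a self-contained sketch of that selection logic; the tag vocabulary (`accel:kvm`, `accel:tcg`) and test names are assumptions of mine, not an established convention in the QEMU tree:

```python
# Self-contained sketch of tag-based test selection, mimicking what
# "avocado run --filter-by-tags accel:kvm" does conceptually.
# Test names and the accel:* tag vocabulary are illustrative only.

TESTS = {
    "BootLinuxConsole.test_ppc64_pseries": {"arch:ppc64", "accel:tcg"},
    "BootLinuxConsole.test_x86_64_pc": {"arch:x86_64", "accel:kvm"},
}

def select(tests, required_tags):
    """Keep only the tests whose tag set contains every required tag."""
    return [name for name, tags in tests.items() if required_tags <= tags]

# A host with working KVM would ask for the accel:kvm subset; other
# hosts would fall back to the (slower) accel:tcg set, presumably with
# a larger timeout.
print(select(TESTS, {"accel:kvm"}))  # -> ['BootLinuxConsole.test_x86_64_pc']
```

The selection is then an explicit choice made on the command line, rather than something probed and decided behind the test writer's back.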
Having said that, we don't want to rewrite most tests just to be able to run them with either KVM or TCG, when the tests are not explicitly testing KVM or TCG. At that point, using KVM or TCG is test/framework *configuration*, and in Avocado we hope to solve this by making the executed tests easily identifiable and reproducible (a test ID will contain information about the options passed, and a replay of the job will apply the same configuration).

For now, I think the best approach is to increase the timeout, because having to deal with false negatives (longer execution times that don't really indicate a failure) is much worse than having a test possibly take some more time to finish.

And sorry for the extremely long answer!

- Cleber.

[1] - https://github.com/avocado-framework/avocado/pull/2996
[2] - https://trello.com/c/WPd4FrIy/1479-add-support-to-specify-assets-in-test-docstring
[3] - https://trello.com/c/CKP7YS6G/1481-on-cache-check-for-asset-fetcher