From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Ian Jackson <ian.jackson@eu.citrix.com>, Wei Liu <wei.liu2@citrix.com>
Cc: Xen-devel <xen-devel@lists.xenproject.org>
Subject: Re: [XTF PATCH] xtf-runner: fix two synchronisation issues
Date: Fri, 29 Jul 2016 16:05:15 +0100
Message-ID: <4f21dee1-7a7a-cf7c-70a8-f6cff1a9d65c@citrix.com>
In-Reply-To: <22427.26818.981068.78463@mariner.uk.xensource.com>
On 29/07/16 15:31, Ian Jackson wrote:
> Wei Liu writes ("Re: [XTF PATCH] xtf-runner: fix two synchronisation issues"):
>> On Fri, Jul 29, 2016 at 01:43:42PM +0100, Andrew Cooper wrote:
>>> The runner exiting before xl has torn down the guest is very
>>> deliberate, because some part of hvm guests is terribly slow to tear
>>> down; waiting synchronously for teardown tripled the wallclock time to
>>> run a load of tests back-to-back.
>> Then you won't know if a guest is leaked or it is being slowly destroyed
>> when a dead guest shows up in the snapshot of 'xl list'.
>>
>> Also consider that this would make a back-to-back test fail when its
>> guest happens to have the same name as the one in the previous test.
>>
>> I don't think getting blocked for a few more seconds is a big issue.
>> It is important to eliminate such race conditions so that osstest can
>> work properly.
> IMO the biggest reason for waiting for teardown is that that will make
> it possible to accurately identify the xtf test which was responsible
> for the failure if a test reveals a bug which causes problems for the
> whole host.
That is perfectly reasonable.
>
> Suppose there is a test T1 which, in buggy hypervisors, creates an
> anomalous data structure, such that the hypervisor crashes when T1's
> guest is finally torn down.
>
> If we start to run the next test T2 immediately after we see success output
> from T1, we will observe the host crashing "due to T2", and T1 would
> be regarded as having succeeded.
>
> This is why in an in-person conversation with Wei yesterday I
> recommended that osstest should after each xtf test (i) wait for
> everything to be torn down and (ii) then check that the dom0 is still
> up. (And these two activities are regarded as part of the preceding
> test step.)
That is also my understanding of how the intended OSSTest integration is
going to work.
OSSTest asks `./xtf-runner --list` for the full set of tests, then
iterates over them, running one test at a time with suitable liveness
checks in between. This is not using xtf-runner's ability to run
multiple tests back to back.
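In other words, I'd expect the driving side to look roughly like this (a
rough Python sketch only, not what osstest actually does; host_is_alive()
is just a placeholder for a real liveness check):

#!/usr/bin/env python
# Rough sketch of the expected per-test driving loop.  Not osstest code;
# host_is_alive() is only a placeholder for a real liveness check.
import os
import subprocess
import sys

def host_is_alive():
    # Placeholder check: dom0 can still talk to the hypervisor.
    with open(os.devnull, "w") as null:
        return subprocess.call(["xl", "info"], stdout=null) == 0

tests = subprocess.check_output(["./xtf-runner", "--list"]).decode().split()
failures = []
for test in tests:
    rc = subprocess.call(["./xtf-runner", test])  # one test at a time
    # (i) waiting for teardown is assumed to happen inside the runner
    #     (the subject of this thread); (ii) the liveness check below
    #     counts as part of this test step, so a host crash gets
    #     attributed to the test which caused it.
    if rc != 0 or not host_is_alive():
        failures.append(test)
sys.exit(1 if failures else 0)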
The dev usecase, on the other hand, is something like the following,
e.g. for checking a test case refactoring or a new bit of functionality:
$ ./xtf-runner selftest
<snip>
Combined test results:
test-pv64-selftest SUCCESS
test-pv32pae-selftest SUCCESS
test-hvm64-selftest SUCCESS
test-hvm32pae-selftest SUCCESS
test-hvm32pse-selftest SUCCESS
test-hvm32-selftest SUCCESS
FWIW, I have just put a synchronous wait in to demonstrate.
Without wait:
$ time ./xtf-runner selftest
<snip>
real 0m0.571s
user 0m0.060s
sys 0m0.228s
With wait:
$ time ./xtf-runner selftest
<snip>
real 0m8.870s
user 0m0.048s
sys 0m0.280s
That is more than 8 seconds of wallclock time during which nothing useful
is happening from the point of view of a human using ./xtf-runner. All of
this time is spent between @releaseDomain and `xl create -F` finally
exiting.
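To be concrete about where that wait sits, the change is simply blocking
on the `xl create -F` child before declaring the test done (a simplified
sketch, not the actual runner code; gather_result_from_console() is just
a stand-in for the existing console parsing):

# Simplified sketch of where the wait goes; not the actual runner code.
import subprocess

def gather_result_from_console(cfg):
    # Placeholder for the existing logic: the result is parsed from the
    # guest's console output and is known as soon as @releaseDomain fires.
    raise NotImplementedError

def run_test(cfg, wait_for_teardown=False):
    # "xl create -F" stays in the foreground until the domain is gone.
    xl = subprocess.Popen(["xl", "create", "-F", cfg])

    result = gather_result_from_console(cfg)

    if wait_for_teardown:
        # Block until xl has finished tearing the guest down.  For HVM
        # guests this is where the extra ~8 wallclock seconds go.
        xl.wait()

    return result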
>
> If this leads to over-consumption of machine resources because this
> serialisation is too slow then the right approach would be explicit
> parallelisation in osstest. That would still mean that in the
> scenario above, T1 would be regarded as having failed, because T1
> wouldn't be regarded as having passed until osstest had seen that all
> of T1's cleanup had been done and the host was still up. (T2 would
> _also_ be regarded as failed, and that might look like a heisenbug,
> but that would be tolerable.)
OSSTest shouldn't run multiple tests at once, and I have taken exactly
the same decision for XenRT. Easy identification of what went bang is
the most important property in these cases.
We are going to have to get to a vast test library before the wallclock
time of the XTF tests approaches anything like that of installing a VM
from scratch. I am not worried at the moment.
>
> Wei: I need to check what happens with multiple failing test steps in
> the same job. Specifically, I need to check which one the bisector
> is likely to try to attack.
For individual XTF tests, it is entirely possible that every failure is
from a different change, so they should be treated individually.
Having said that, it is also quite likely that, given a lot of similar
microkernels, one hypervisor bug would take a large number out at once,
and we really don't want to bisect each individual XTF test.
~Andrew