From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Ian Jackson <ian.jackson@eu.citrix.com>, Wei Liu <wei.liu2@citrix.com>
Cc: Xen-devel <xen-devel@lists.xenproject.org>
Subject: Re: [XTF PATCH] xtf-runner: fix two synchronisation issues
Date: Fri, 29 Jul 2016 16:05:15 +0100
Message-ID: <4f21dee1-7a7a-cf7c-70a8-f6cff1a9d65c@citrix.com>
In-Reply-To: <22427.26818.981068.78463@mariner.uk.xensource.com>
On 29/07/16 15:31, Ian Jackson wrote:
> Wei Liu writes ("Re: [XTF PATCH] xtf-runner: fix two synchronisation issues"):
>> On Fri, Jul 29, 2016 at 01:43:42PM +0100, Andrew Cooper wrote:
>>> The runner exiting before xl has torn down the guest is very
>>> deliberate, because some part of hvm guests is terribly slow to tear
>>> down; waiting synchronously for teardown tripled the wallclock time to
>>> run a load of tests back-to-back.
>> Then you won't know if a guest is leaked or it is being slowly destroyed
>> when a dead guest shows up in the snapshot of 'xl list'.
>>
>> Also consider that this would make a back-to-back test fail when its
>> guest happens to have the same name as the one in the previous test.
>>
>> I don't think getting blocked for a few more seconds is a big issue.
>> It is important to eliminate such race conditions so that osstest can
>> work properly.
> IMO the biggest reason for waiting for teardown is that that will make
> it possible to accurately identify the xtf test which was responsible
> for the failure if a test reveals a bug which causes problems for the
> whole host.
That is perfectly reasonable.
>
> Suppose there is a test T1 which, in buggy hypervisors, creates an
> anomalous data structure, such that the hypervisor crashes when T1's
> guest is finally torn down.
>
> If we start to run the next test T2 immediately after we see success output
> from T1, we will observe the host crashing "due to T2", and T1 would
> be regarded as having succeeded.
>
> This is why in an in-person conversation with Wei yesterday I
> recommended that osstest should after each xtf test (i) wait for
> everything to be torn down and (ii) then check that the dom0 is still
> up. (And these two activities are regarded as part of the preceding
> test step.)
That is also my understanding of how the intended OSSTest integration is
going to work.
OSSTest asks `./xtf-runner --list` for the full set of tests, then
iterates over them, running one test at a time with suitable liveness
checks in between. This is not using xtf-runner's ability to run
multiple tests back to back.
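In other words, I'd expect the driving side to look roughly like this (a
rough Python sketch only, not what osstest actually does; host_is_alive()
is just a placeholder for a real liveness check):

#!/usr/bin/env python
# Rough sketch of the expected per-test driving loop.  Not osstest code;
# host_is_alive() is only a placeholder for a real liveness check.
import os
import subprocess
import sys

def host_is_alive():
    # Placeholder check: dom0 can still talk to the hypervisor.
    with open(os.devnull, "w") as null:
        return subprocess.call(["xl", "info"], stdout=null) == 0

tests = subprocess.check_output(["./xtf-runner", "--list"]).decode().split()
failures = []
for test in tests:
    rc = subprocess.call(["./xtf-runner", test])  # one test at a time
    # (i) waiting for teardown is assumed to happen inside the runner
    #     (the subject of this thread); (ii) the liveness check below
    #     counts as part of this test step, so a host crash gets
    #     attributed to the test which caused it.
    if rc != 0 or not host_is_alive():
        failures.append(test)
sys.exit(1 if failures else 0)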
The dev usecase, on the other hand, is something like the following,
e.g. for checking a test case refactoring or a new bit of functionality:
$ ./xtf-runner selftest
<snip>
Combined test results:
test-pv64-selftest SUCCESS
test-pv32pae-selftest SUCCESS
test-hvm64-selftest SUCCESS
test-hvm32pae-selftest SUCCESS
test-hvm32pse-selftest SUCCESS
test-hvm32-selftest SUCCESS
FWIW, I have just put a synchronous wait in to demonstrate.
Without wait:
$ time ./xtf-runner selftest
<snip>
real 0m0.571s
user 0m0.060s
sys 0m0.228s
With wait:
$ time ./xtf-runner selftest
<snip>
real 0m8.870s
user 0m0.048s
sys 0m0.280s
That is more than 8 seconds of wallclock time during which nothing useful
is happening from the point of view of a human using ./xtf-runner. All of
this time is spent between @releaseDomain and `xl create -F` finally
exiting.
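To be concrete about where that wait sits, the change is simply blocking
on the `xl create -F` child before declaring the test done (a simplified
sketch, not the actual runner code; gather_result_from_console() is just
a stand-in for the existing console parsing):

# Simplified sketch of where the wait goes; not the actual runner code.
import subprocess

def gather_result_from_console(cfg):
    # Placeholder for the existing logic: the result is parsed from the
    # guest's console output and is known as soon as @releaseDomain fires.
    raise NotImplementedError

def run_test(cfg, wait_for_teardown=False):
    # "xl create -F" stays in the foreground until the domain is gone.
    xl = subprocess.Popen(["xl", "create", "-F", cfg])

    result = gather_result_from_console(cfg)

    if wait_for_teardown:
        # Block until xl has finished tearing the guest down.  For HVM
        # guests this is where the extra ~8 wallclock seconds go.
        xl.wait()

    return result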
>
> If this leads to over-consumption of machine resources because this
> serialisation is too slow then the right approach would be explicit
> parallelisation in osstest. That would still mean that in the
> scenario above, T1 would be regarded as having failed, because T1
> wouldn't be regarded as having passed until osstest had seen that all
> of T1's cleanup had been done and the host was still up. (T2 would
> _also_ be regarded as failed, and that might look like a heisenbug,
> but that would be tolerable.)
OSSTest shouldn't run multiple tests at once, and I have taken exactly
the same decision for XenRT. Easy identification of what went bang is
the most important property in these cases.
We are going to have to get to a vast test library before the wallclock
time of the XTF tests approaches anything like that of installing a VM
from scratch. I am not worried at the moment.
>
> Wei: I need to check what happens with multiple failing test steps in
> the same job. Specifically, I need to check which one the bisector
> is likely to try to attack.
For individual XTF tests, it is entirely possible that every failure is
from a different change, so they should be treated individually.
Having said that, it is also quite likely that, given a lot of similar
microkernels, one hypervisor bug would take a large number out at once,
and we really don't want to bisect each individual XTF test.
~Andrew