From: Dario Faggioli <dario.faggioli@citrix.com>
To: xen-devel@lists.xensource.com
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>,
Ian Jackson <Ian.Jackson@eu.citrix.com>,
Wei Liu <wei.liu@citrix.com>, Jan Beulich <JBeulich@suse.com>
Subject: some thoughts about merlot{0|1} issues [was: Re: [xen-unstable test] 102522: tolerable FAIL - PUSHED]
Date: Thu, 24 Nov 2016 16:14:23 +0100 [thread overview]
Message-ID: <1480000463.2712.150.camel@citrix.com> (raw)
In-Reply-To: <osstest-102522-mainreport@xen.org>
On Wed, 2016-11-23 at 15:54 +0000, osstest service owner wrote:
> flight 102522 xen-unstable real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/102522/
>
> Regressions which are regarded as allowable (not blocking):
> test-amd64-amd64-xl-rtds 9 debian-
> install fail like 102465
>
This is on merlot1, and as far as I can tell, this test has been failing
there for quite a while (is that the correct interpretation of this
table?):
http://logs.test-lab.xenproject.org/osstest/results/history/test-amd64-amd64-xl-rtds/xen-unstable
This is using RTDS as the scheduler, but that should not be the problem:
what's actually failing is xen-create-image, which times out.
Basically, it starts creating the VM filesystem via debootstrap, but
does not manage to finish within the 2530-second timeout:
http://logs.test-lab.xenproject.org/osstest/logs/102522/test-amd64-amd64-xl-rtds/9.ts-debian-install.log
2016-11-23 13:12:35 Z command timed out [2500]: timeout 2530 ssh -o StrictHostKeyChecking=no -o BatchMode=yes -o ConnectTimeout=100 -o ServerAliveInterval=100 -o PasswordAuthentication=no -o ChallengeResponseAuthentication=no -o UserKnownHostsFile=tmp/t.known_hosts_102522.test-amd64-amd64-xl-rtds root@172.16.144.21 http_proxy=http://cache:3143/ \
xen-create-image \
In other runs on different hosts, still under RTDS, that takes about
650 seconds:
http://logs.test-lab.xenproject.org/osstest/logs/102532/test-amd64-amd64-xl-rtds/9.ts-debian-install.log
I've also tried it myself on my test box, and it took 10m20s.
We know from here that, this time, it got stuck rather early:
http://logs.test-lab.xenproject.org/osstest/logs/102522/test-amd64-amd64-xl-rtds/merlot1---var-log-xen-tools-debian.guest.osstest.log
I've looked at a handful of other instances, and it seems to be
_always_ like that.
The system appears to be alive, though; at least, right after the timeout,
the ts-log-capture phase --which includes issuing commands on the host
and copying files from there-- succeeds.
Also, I'm not sure it means much, but xen-create-image starts at
12:30:55 and times out at 13:12:35. Looking in:
http://logs.test-lab.xenproject.org/osstest/logs/102522/test-amd64-amd64-xl-rtds/merlot1---var-log-daemon.log
we see:
Nov 23 12:04:55 merlot1 ntpd[3003]: Listening on routing socket on fd #22 for interface updates
[..]
Nov 23 12:27:22 merlot1 init: Id "T0" respawning too fast: disabled for 5 minutes
Nov 23 12:34:04 merlot1 init: Id "T0" respawning too fast: disabled for 5 minutes
Nov 23 12:40:47 merlot1 init: Id "T0" respawning too fast: disabled for 5 minutes
Nov 23 12:47:31 merlot1 init: Id "T0" respawning too fast: disabled for 5 minutes
Nov 23 12:54:14 merlot1 init: Id "T0" respawning too fast: disabled for 5 minutes
Nov 23 13:00:57 merlot1 init: Id "T0" respawning too fast: disabled for 5 minutes
Nov 23 13:07:40 merlot1 init: Id "T0" respawning too fast: disabled for 5 minutes
So, again, at least something is alive on the host, and writing to the
logs, even while debootstrap seems stuck.
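Just to double-check the arithmetic, both the overall run length and the
spacing of those init messages can be computed from the timestamps above
(values hard-coded from the logs quoted; this is just a sanity check, not
anything osstest itself runs):

```python
from datetime import datetime

def t(s):
    """Parse an HH:MM:SS syslog-style timestamp (all on the same day)."""
    return datetime.strptime(s, "%H:%M:%S")

# xen-create-image start and timeout, from the ts-debian-install log
elapsed = (t("13:12:35") - t("12:30:55")).total_seconds()
print(int(elapsed))  # 2500, matching "command timed out [2500]"

# the init "respawning too fast" messages from daemon.log
respawns = ["12:27:22", "12:34:04", "12:40:47", "12:47:31",
            "12:54:14", "13:00:57", "13:07:40"]
gaps = [int((t(b) - t(a)).total_seconds())
        for a, b in zip(respawns, respawns[1:])]
print(gaps)  # regular ~403s intervals: the 5 min disable window, plus the
             # time init takes to give up on "T0" again
```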
There are 2 running vcpus:
Name          ID  VCPU   CPU  State   Time(s)  Affinity (Hard / Soft)
Domain-0       0     1     2  r--        35.7  all / all
Domain-0       0    17     9  r--        38.3  all / all
vcpu 17 is running on CPU 9 which is on node 1, which has _0_ memory:
node:    memsize    memfree    distances
   0:       9216       7856    10,16,16,16
   1:          0          0    16,10,16,16
   2:       8175       7779    16,16,10,16
   3:          0          0    16,16,16,10
(see here: http://logs.test-lab.xenproject.org/osstest/logs/102522/test-amd64-amd64-xl-rtds/merlot1-output-xl_info_-n )
but that should not mean much: in other, equally failing, runs this
does not happen (also, this is just what was going on while we were
collecting the logs).
Now, about serial output:
http://logs.test-lab.xenproject.org/osstest/logs/102522/test-amd64-amd64-xl-rtds/serial-merlot1.log
When dumping the ACPI C-states, here's how things look for _all_ CPUs:
Nov 23 13:13:00.382134 (XEN) ==cpu3==
Nov 23 13:13:00.382157 (XEN) active state: C-1
Nov 23 13:13:00.390096 (XEN) max_cstate: C7
Nov 23 13:13:00.390125 (XEN) states:
Nov 23 13:13:00.390148 (XEN) C1: type[C1] latency[000] usage[00000000] method[ HALT] duration[0]
Nov 23 13:13:00.398055 (XEN) C0: usage[00000000] duration[4229118701384]
Nov 23 13:13:00.398090 (XEN) PC2[0] PC3[0] PC6[0] PC7[0]
Nov 23 13:13:00.406088 (XEN) CC3[0] CC6[0] CC7[0]
I've checked other runs too, and it's the same everywhere.
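For the record, pulling the usage/duration counters out of such a dump only
takes a small regex; the line format below is just what I see in the serial
log, so take it as an assumption:

```python
import re

# C-state lines for cpu3, copied from the serial log above
dump = """\
C1: type[C1] latency[000] usage[00000000] method[ HALT] duration[0]
C0: usage[00000000] duration[4229118701384]"""

pat = re.compile(r"^(C\d+):.*usage\[(\d+)\].*duration\[(\d+)\]")
stats = {}
for line in dump.splitlines():
    m = pat.match(line.strip())
    if m:
        stats[m.group(1)] = {"usage": int(m.group(2)),
                             "duration": int(m.group(3))}

print(stats["C1"]["usage"])     # 0: per the counters, C1 was never entered
print(stats["C0"]["duration"])  # 4229118701384
```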
I remember Jan suggested trying to pass max_cstate=1 to Xen at
boot. I was about to ask Ian to do that for this host, but it looks
like we're already using only C0 and C1 anyway.
Boot command line looks like this:
xen_commandline : placeholder conswitch=x watchdog com1=115200,8n1 console=com1,vga gdb=com1 dom0_mem=512M,max:512M ucode=scan sched=rtds
which makes the above look a bit weird to me... But I've played much
more with Intel boxes than with AMD ones, I admit.
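And indeed, splitting the command line confirms nothing there is limiting
C-states already; a trivial check:

```python
cmdline = ("placeholder conswitch=x watchdog com1=115200,8n1 console=com1,vga "
           "gdb=com1 dom0_mem=512M,max:512M ucode=scan sched=rtds")

# split key=value tokens; bare flags (e.g. "watchdog") get an empty value
opts = dict(tok.partition("=")[::2] for tok in cmdline.split())

print("max_cstate" in opts)  # False: no C-state cap on the command line
print(opts["sched"])         # rtds
```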
For now, I'm done. At some point, I'll recall either merlot0 or merlot1
out of OSSTest, take it back for myself, and try to investigate more.
If, in the meantime, any of this rings a bell for anyone, feel free
to speak up.
Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Thread overview: 12+ messages
2016-11-23 15:54 [xen-unstable test] 102522: tolerable FAIL - PUSHED osstest service owner
2016-11-24 15:14 ` Dario Faggioli [this message]
2016-11-24 15:31 ` some thoughts about merlot{0|1} issues [was: Re: [xen-unstable test] 102522: tolerable FAIL - PUSHED] Jan Beulich
2016-11-28 13:48 ` Boris Ostrovsky
2016-11-28 15:16 ` Konrad Rzeszutek Wilk
2016-11-28 16:08 ` Andrew Cooper
2016-11-28 16:20 ` Jan Beulich
2016-11-28 16:45 ` Dario Faggioli
2016-11-28 17:06 ` Dario Faggioli
2016-11-28 17:19 ` Dario Faggioli
2016-11-28 18:27 ` Boris Ostrovsky
2016-11-29 11:06 ` Dario Faggioli