xen-devel.lists.xenproject.org archive mirror
* OSSTEST: Re-blessing cubietruck-{picasso, gleizes, metzinger} for production use
@ 2016-01-20 10:53 Ian Campbell
  2016-01-20 11:52 ` Ian Jackson
  0 siblings, 1 reply; 7+ messages in thread
From: Ian Campbell @ 2016-01-20 10:53 UTC
  To: Ian Jackson; +Cc: xen-devel

I've been running a couple of ad hoc production tests per day on these since
before Xmas and they haven't lost sight of their disks again.

TL;DR: I think we should throw them back in the pool.

With the recent timeout fixes they are working as well as the production
cubietruck-braque.

There are two flakey tests, test-armhf-armhf-xl-rtds and
test-armhf-armhf-libvirt-raw, but those appear to be much better than before
the timeout changes and not specific to these three boards, since the fourth
one looks to behave much the same.

At first glance it looks like some later test steps might just need a bit
more time on CT too.

FWIW comparison of production vs commission runs can be seen at

http://osstest.test-lab.xenproject.org/~ianc/commission/history/test-armhf-armhf-xl-rtds/xen-unstable.html
vs
http://logs.test-lab.xenproject.org/osstest/results/history/test-armhf-armhf-xl-rtds/xen-unstable.html

and

http://osstest.test-lab.xenproject.org/~ianc/commission/history/test-armhf-armhf-libvirt-raw/xen-unstable.html
vs
http://logs.test-lab.xenproject.org/osstest/results/history/test-armhf-armhf-libvirt-raw/xen-unstable.html

(nb: osstest e23886aa8fe7 is when the timeout fixes went live)

Unfortunately the second one has always run on arndale-*, which doesn't
give any data on cubietruck.


Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: OSSTEST: Re-blessing cubietruck-{picasso, gleizes, metzinger} for production use
  2016-01-20 10:53 OSSTEST: Re-blessing cubietruck-{picasso, gleizes, metzinger} for production use Ian Campbell
@ 2016-01-20 11:52 ` Ian Jackson
  2016-01-20 11:58   ` Ian Campbell
  0 siblings, 1 reply; 7+ messages in thread
From: Ian Jackson @ 2016-01-20 11:52 UTC
  To: Ian Campbell; +Cc: xen-devel

Ian Campbell writes ("OSSTEST: Re-blessing cubietruck-{picasso,gleizes,metzinger} for production use"):
> I've been running a couple of adhoc production tests per day on these since
> before Xmas and they haven't lost sight of their disks again.
> 
> TLDR; I think we should throw them back in the pool.

Great.

> With the recent timeout fixes they are working as well as the production
> cubietruck-braque.
> 
> There are two flakey tests test-armhf-armhf-xl-rtds and test-armhf-armhf-
> libvirt-raw, but those appear to be much better than before the timeout
> changes and not specific to these three boards since the fourth one looks
> to behave much the same.
> 
> At first glance it looks like some later test steps might just need a bit
> more time on CT too.

Maybe we should have target_adjust_timeout honour a host property to
multiply timeouts by some factor.
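
A rough sketch of that idea, in Python purely for illustration (osstest
itself is Perl). The property name "TimeoutFactor" and the function
adjust_timeout are hypothetical; this does not show the real
target_adjust_timeout or its signature.

```python
import math

# Hypothetical property name; osstest's real host properties may differ.
TIMEOUT_FACTOR_PROP = "TimeoutFactor"


def adjust_timeout(base_seconds, host_props):
    """Scale a timeout by a per-host factor (default 1.0, i.e. unchanged)."""
    factor = float(host_props.get(TIMEOUT_FACTOR_PROP, 1.0))
    if factor <= 0:
        raise ValueError(f"{TIMEOUT_FACTOR_PROP} must be positive, got {factor}")
    # Round up so a scaled timeout is never shorter than intended.
    return math.ceil(base_seconds * factor)


# A slow board gets 1.5x the budget; a host without the property is untouched.
slow_host = {TIMEOUT_FACTOR_PROP: "1.5"}
fast_host = {}
print(adjust_timeout(40, slow_host), adjust_timeout(40, fast_host))
```

Defaulting the factor to 1.0 means only hosts that explicitly set the
property would see any change in behaviour.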

Ian.


* Re: OSSTEST: Re-blessing cubietruck-{picasso, gleizes, metzinger} for production use
  2016-01-20 11:52 ` Ian Jackson
@ 2016-01-20 11:58   ` Ian Campbell
  2016-01-20 12:28     ` Ian Campbell
                       ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Ian Campbell @ 2016-01-20 11:58 UTC
  To: Ian Jackson; +Cc: xen-devel

On Wed, 2016-01-20 at 11:52 +0000, Ian Jackson wrote:
> Ian Campbell writes ("OSSTEST: Re-blessing cubietruck-
> {picasso,gleizes,metzinger} for production use"):
> > I've been running a couple of adhoc production tests per day on these
> > since
> > before Xmas and they haven't lost sight of their disks again.
> > 
> > TLDR; I think we should throw them back in the pool.
> 
> Great.

I'll take this as a "yes, go ahead" ;-)

> > With the recent timeout fixes they are working as well as the
> > production
> > cubietruck-braque.
> > 
> > There are two flakey tests test-armhf-armhf-xl-rtds and test-armhf-
> > armhf-
> > libvirt-raw, but those appear to be much better than before the timeout
> > changes and not specific to these three boards since the fourth one
> > looks
> > to behave much the same.
> > 
> > At first glance it looks like some later test steps might just need a
> > bit
> > more time on CT too.
> 
> Maybe we should have target_adjust_timeout honour a host property to
> multiply timeouts by some factor.

That's not a bad idea, assuming the remaining issues really are timeouts of
this sort.

Ian.



* Re: OSSTEST: Re-blessing cubietruck-{picasso, gleizes, metzinger} for production use
  2016-01-20 11:58   ` Ian Campbell
@ 2016-01-20 12:28     ` Ian Campbell
  2016-01-20 16:16     ` Ian Campbell
  2016-01-20 16:56     ` Ian Campbell
  2 siblings, 0 replies; 7+ messages in thread
From: Ian Campbell @ 2016-01-20 12:28 UTC
  To: Ian Jackson; +Cc: xen-devel

On Wed, 2016-01-20 at 11:58 +0000, Ian Campbell wrote:
> On Wed, 2016-01-20 at 11:52 +0000, Ian Jackson wrote:
> > Ian Campbell writes ("OSSTEST: Re-blessing cubietruck-
> > {picasso,gleizes,metzinger} for production use"):
> > > I've been running a couple of adhoc production tests per day on these
> > > since
> > > before Xmas and they haven't lost sight of their disks again.
> > > 
> > > TLDR; I think we should throw them back in the pool.
> > 
> > Great.
> 
> I'll take this as a "yes, go ahead" ;-)

and now I have actually gone and done it...


* Re: OSSTEST: Re-blessing cubietruck-{picasso, gleizes, metzinger} for production use
  2016-01-20 11:58   ` Ian Campbell
  2016-01-20 12:28     ` Ian Campbell
@ 2016-01-20 16:16     ` Ian Campbell
  2016-01-20 16:56     ` Ian Campbell
  2 siblings, 0 replies; 7+ messages in thread
From: Ian Campbell @ 2016-01-20 16:16 UTC
  To: Ian Jackson; +Cc: xen-devel

On Wed, 2016-01-20 at 11:58 +0000, Ian Campbell wrote:
> 
> > > With the recent timeout fixes they are working as well as the
> > > production
> > > cubietruck-braque.
> > > 
> > > There are two flakey tests test-armhf-armhf-xl-rtds and test-armhf-
> > > armhf-libvirt-raw, but those appear to be much better than before the timeout
> > > changes and not specific to these three boards since the fourth one
> > > looks to behave much the same.
> > > 
> > > At first glance it looks like some later test steps might just need a
> > > bit more time on CT too.
> > 
> > Maybe we should have target_adjust_timeout honour a host property to
> > multiply timeouts by some factor.
> 
> That's not a bad idea, assuming the remaining issues really are timeouts
> of this sort.

The test-armhf-armhf-libvirt-raw failures in commission flights 78425 and
78506[0] are:

    !! ERROR: Installation step failed

    An installation step failed. You can try to run the failing item again from the
    menu, or skip it and choose something else. The failing step is: Select and 
    install software

With no way to know why or what happened :-(.

The failures from 78386 backwards in
 http://osstest.test-lab.xenproject.org/~ianc/commission/history/test-armhf-armhf-libvirt-raw/xen-unstable.html
are the dd timeout, which is fixed.

Suggest we just keep an eye on the failure rate of this one.

Ian.

[0] http://osstest.test-lab.xenproject.org/~osstest/pub/logs/78506/test-armhf-armhf-libvirt-raw/cubietruck-picasso---var-log-xen-console-guest-debian.jessie.guest.osstest.log



* Re: OSSTEST: Re-blessing cubietruck-{picasso, gleizes, metzinger} for production use
  2016-01-20 11:58   ` Ian Campbell
  2016-01-20 12:28     ` Ian Campbell
  2016-01-20 16:16     ` Ian Campbell
@ 2016-01-20 16:56     ` Ian Campbell
  2016-01-20 17:13       ` Ian Jackson
  2 siblings, 1 reply; 7+ messages in thread
From: Ian Campbell @ 2016-01-20 16:56 UTC
  To: Ian Jackson; +Cc: xen-devel

On Wed, 2016-01-20 at 11:58 +0000, Ian Campbell wrote:
> > > With the recent timeout fixes they are working as well as the
> > > production
> > > cubietruck-braque.
> > > 
> > > There are two flakey tests test-armhf-armhf-xl-rtds and test-armhf-
> > > armhf-
> > > libvirt-raw, but those appear to be much better than before the timeout
> > > changes and not specific to these three boards since the fourth one
> > > looks
> > > to behave much the same.
> > > 
> > > At first glance it looks like some later test steps might just need a
> > > bit
> > > more time on CT too.
> > 
> > Maybe we should have target_adjust_timeout honour a host property to
> > multiply timeouts by some factor.
> 
> That's not a bad idea, assuming the remaining issues really are timeouts of
> this sort.

The test-armhf-armhf-xl-rtds case is booting successfully, but is just a
little too slow to bring up networking. It's hitting:

    2016-01-18 19:40:10 Z executing ssh ...     root@172.16.144.49     xl list 
    2016-01-18 19:40:11 Z guest debian.guest.osstest state is r 
    2016-01-18 19:40:11 Z guest debian.guest.osstest 5a:36:0e:59:00:0b 22 link/ip/tcp: waiting 40s... 
    2016-01-18 19:40:11 Z guest debian.guest.osstest 5a:36:0e:59:00:0b 22 link/ip/tcp: no active lease (waiting) ... 
    ...
    2016-01-18 19:40:56 Z FAILURE: guest debian.guest.osstest 5a:36:0e:59:00:0b 22 link/ip/tcp: wait timed out: no active lease. 
    failure: guest debian.guest.osstest 5a:36:0e:59:00:0b 22 link/ip/tcp: wait timed out: no active lease.
    + rc=255

http://osstest.test-lab.xenproject.org/~osstest/pub/logs/78425/test-armhf-armhf-xl-rtds/11.ts-guest-start.log

78506 passed and has:

    2016-01-19 17:56:11 Z guest debian.guest.osstest state is r 
    2016-01-19 17:56:11 Z guest debian.guest.osstest 5a:36:0e:aa:00:0b 22 link/ip/tcp: waiting 40s... 
    2016-01-19 17:56:11 Z guest debian.guest.osstest 5a:36:0e:aa:00:0b 22 link/ip/tcp: no active lease (waiting) ... 
    2016-01-19 17:56:32 Z guest debian.guest.osstest 5a:36:0e:aa:00:0b 22 link/ip/tcp: nc: 256 (UNKNOWN) [172.16.145.103] 22 (ssh) : Connection refused |  (waiting) ... 
    2016-01-19 17:56:45 Z guest debian.guest.osstest 5a:36:0e:aa:00:0b 22 link/ip/tcp: ok. (34s) 

http://osstest.test-lab.xenproject.org/~osstest/pub/logs/78506/test-armhf-armhf-xl-rtds/11.ts-guest-start.log

i.e. it took 34s of the 40s allowed, so a bit borderline.
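
For anyone wanting to pull that margin out of a ts-guest-start log
mechanically, here is a small Python sketch. The regexes are written
against the two message shapes quoted above ("waiting 40s..." and
"ok. (34s)"); the function name wait_margin is made up for this example
and is not part of osstest.

```python
import re

# Patterns matched against the log excerpts quoted in this thread.
WAIT_RE = re.compile(r"link/ip/tcp: waiting (\d+)s\.\.\.")
OK_RE = re.compile(r"link/ip/tcp: ok\. \((\d+)s\)")


def wait_margin(log_lines):
    """Return (budget - time taken) in seconds, or None if either is missing."""
    budget = taken = None
    for line in log_lines:
        m = WAIT_RE.search(line)
        if m:
            budget = int(m.group(1))
        m = OK_RE.search(line)
        if m:
            taken = int(m.group(1))
    if budget is None or taken is None:
        return None  # e.g. the wait timed out, so there is no "ok." line
    return budget - taken


passing = [
    "2016-01-19 17:56:11 Z guest debian.guest.osstest 5a:36:0e:aa:00:0b 22 link/ip/tcp: waiting 40s... ",
    "2016-01-19 17:56:45 Z guest debian.guest.osstest 5a:36:0e:aa:00:0b 22 link/ip/tcp: ok. (34s) ",
]
print(wait_margin(passing))
```

A failed run such as 78425 has no "ok." line, so the sketch returns None
rather than guessing at how long the guest would have needed.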

In the production env
http://logs.test-lab.xenproject.org/osstest/logs/78525/test-armhf-armhf-xl-rtds/11.ts-guest-start.log
has it taking 27s, in 78443 it was 41s (flying close to the edge there!),
78395 has 45s (flipping the edge the bird as it disappears into the
distance ;-) )

The guest console log shows:

    A start job is running for LSB: Raise network interf...34s / no limit)

http://osstest.test-lab.xenproject.org/~osstest/pub/logs/78425/test-armhf-armhf-xl-rtds/cubietruck-picasso---var-log-xen-console-guest-debian.guest.osstest.log

(it's messy in there, I thought I'd arranged for sane logging in the guest, via
sysvinit and FANCYTTY=0, clearly not quite).

So bringing up the network does appear to be rather slow.

The host console has:

Jan 18 19:39:57.747858 [ 2354.661150] device vif1.0 entered promiscuous mode
Jan 18 19:40:10.795484 [ 2354.670132] IPv6: ADDRCONF(NETDEV_UP): vif1.0: link is not ready
Jan 18 19:40:10.805719 [ 2358.350734] xen-blkback:ring-ref 8, event-channel 3, protocol 1 (arm-abi) persistent grants
Jan 18 19:40:14.488585 [ 2358.522313] xen-blkback:ring-ref 9, event-channel 4, protocol 1 (arm-abi) persistent grants
Jan 18 19:40:14.660189 [ 2358.763589] IPv6: ADDRCONF(NETDEV_CHANGE): vif1.0: link becomes ready
Jan 18 19:40:14.899590 [ 2358.763859] xenbr0: port 2(vif1.0) entered forwarding state
Jan 18 19:40:14.905182 [ 2358.763933] xenbr0: port 2(vif1.0) entered forwarding state
Jan 18 19:40:14.910801 (XEN) mm.c:1259:d0v1 gnttab_mark_dirty not implemented yet

http://osstest.test-lab.xenproject.org/~osstest/pub/logs/78425/test-armhf-armhf-xl-rtds/serial-cubietruck-picasso.log

So it does appear to be taking nearly 20s to become forwarding.

In some other host logs I saw things like: 

Jan 18 11:07:43.692168 [ 2352.190458] device vif1.0 entered promiscuous mode
Jan 18 11:08:00.541965 [ 2352.200226] IPv6: ADDRCONF(NETDEV_UP): vif1.0: link is not ready
Jan 18 11:08:00.552908 [ 2355.872175] xen-blkback:ring-ref 8, event-channel 3, protocol 1 (arm-abi) persistent grants
Jan 18 11:08:04.227269 [ 2355.990407] xen-blkback:ring-ref 9, event-channel 4, protocol 1 (arm-abi) persistent grants
Jan 18 11:08:04.345476 [ 2356.173545] IPv6: ADDRCONF(NETDEV_CHANGE): vif1.0: link becomes ready
Jan 18 11:08:04.526627 [ 2356.173844] xenbr0: port 2(vif1.0) entered forwarding state
Jan 18 11:08:04.532224 [ 2356.173903] xenbr0: port 2(vif1.0) entered forwarding state
Jan 18 11:08:04.537973 (XEN) mm.c:1259:d0v0 gnttab_mark_dirty not implemented yet
Jan 18 11:08:04.548507 [ 2387.787532] vif vif-1-0 vif1.0: draining TX queue

http://logs.test-lab.xenproject.org/osstest/logs/78395/test-armhf-armhf-xl-rtds/serial-cubietruck-braque.log

Which was a similar delay, but with the extra "vif vif-1-0 vif1.0: draining
TX queue". I'm not sure but I think that might indicate a delay or a
recoverable issue passing traffic, which might be explainable by
"cubietrucks appear to be really slow in real life" or might equally well
be a real issue.

I'll add this to my list to investigate further, but I don't think we want
to tweak the t/o just yet.

Ian.


* Re: OSSTEST: Re-blessing cubietruck-{picasso, gleizes, metzinger} for production use
  2016-01-20 16:56     ` Ian Campbell
@ 2016-01-20 17:13       ` Ian Jackson
  0 siblings, 0 replies; 7+ messages in thread
From: Ian Jackson @ 2016-01-20 17:13 UTC
  To: Ian Campbell; +Cc: xen-devel

Ian Campbell writes ("Re: [Xen-devel] OSSTEST: Re-blessing cubietruck-{picasso, gleizes, metzinger} for production use"):
> The host console has:
> 
> Jan 18 19:39:57.747858 [ 2354.661150] device vif1.0 entered promiscuous mode
> Jan 18 19:40:10.795484 [ 2354.670132] IPv6: ADDRCONF(NETDEV_UP): vif1.0: link is not ready
> Jan 18 19:40:10.805719 [ 2358.350734] xen-blkback:ring-ref 8, event-channel 3, protocol 1 (arm-abi) persistent grants
> Jan 18 19:40:14.488585 [ 2358.522313] xen-blkback:ring-ref 9, event-channel 4, protocol 1 (arm-abi) persistent grants
> Jan 18 19:40:14.660189 [ 2358.763589] IPv6: ADDRCONF(NETDEV_CHANGE): vif1.0: link becomes ready
> Jan 18 19:40:14.899590 [ 2358.763859] xenbr0: port 2(vif1.0) entered forwarding state
> Jan 18 19:40:14.905182 [ 2358.763933] xenbr0: port 2(vif1.0) entered forwarding state
> Jan 18 19:40:14.910801 (XEN) mm.c:1259:d0v1 gnttab_mark_dirty not implemented yet
> 
> http://osstest.test-lab.xenproject.org/~osstest/pub/logs/78425/test-armhf-armhf-xl-rtds/serial-cubietruck-picasso.log
> 
> So it does appear to be taking nearly 20 to become forwarding.

I don't see where you get that 20s from.  2358-2354 = 4.  Oh wait,
you're looking at the serial log timestamp.  But they disagree with
the kernel's own timestamp.

To me that suggests that the kernel is funted somehow.  The messages
are being generated internally at one time but only actually sent over
the serial port much later.
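
To make the discrepancy concrete, a short Python sketch that reads both
timestamps off each line and compares the elapsed time by each clock. The
regex assumes the "Mon DD HH:MM:SS.usec [ kernel.usec] ..." layout seen in
the quoted log; the helper names are invented for this example, and it
ignores wrap-around at midnight.

```python
import re

# Serial-log wall-clock time, then the kernel's own [seconds.usec] stamp.
LINE_RE = re.compile(
    r"^\w{3} +\d+ (\d{2}):(\d{2}):(\d{2}\.\d+) \[\s*(\d+\.\d+)\]"
)


def stamps(line):
    """Return (serial_seconds_of_day, kernel_seconds), or None if no match."""
    m = LINE_RE.match(line)
    if m is None:
        return None  # e.g. "(XEN)" lines carry no kernel timestamp
    hh, mm, ss, kernel = m.groups()
    return int(hh) * 3600 + int(mm) * 60 + float(ss), float(kernel)


def elapsed(first_line, last_line):
    """Elapsed seconds between two lines as (by serial log, by kernel)."""
    s0, k0 = stamps(first_line)
    s1, k1 = stamps(last_line)
    return s1 - s0, k1 - k0


promisc = "Jan 18 19:39:57.747858 [ 2354.661150] device vif1.0 entered promiscuous mode"
forwarding = "Jan 18 19:40:14.899590 [ 2358.763859] xenbr0: port 2(vif1.0) entered forwarding state"
serial_dt, kernel_dt = elapsed(promisc, forwarding)
print(f"serial log: {serial_dt:.1f}s, kernel: {kernel_dt:.1f}s")
```

On the two lines above that comes out at roughly 17.2s by the serial log
against 4.1s by the kernel's clock.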

> I'll add this to my list to investigate further, but I don't think we want
> to tweak the t/o just yet.

Indeed.

Ian.
