* [xen-unstable test] 18851: regressions - FAIL
@ 2013-08-29 19:18 xen.org
2013-08-30 10:36 ` Jan Beulich
2013-09-02 15:10 ` Ian Jackson
0 siblings, 2 replies; 19+ messages in thread
From: xen.org @ 2013-08-29 19:18 UTC (permalink / raw)
To: xen-devel; +Cc: ian.jackson
flight 18851 xen-unstable real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/18851/
Regressions :-(
Tests which did not succeed and are blocking,
including tests which could not be run:
test-amd64-i386-rhel6hvm-amd 7 redhat-install fail REGR. vs. 18778
test-amd64-i386-pv 7 debian-install fail REGR. vs. 18778
test-amd64-i386-xl-multivcpu 7 debian-install fail REGR. vs. 18778
Tests which did not succeed, but are not blocking:
test-amd64-amd64-xl-pcipt-intel 9 guest-start fail never pass
test-amd64-amd64-xl-qemuu-win7-amd64 13 guest-stop fail never pass
test-amd64-i386-xl-winxpsp3-vcpus1 13 guest-stop fail never pass
test-amd64-i386-xend-winxpsp3 16 leak-check/check fail never pass
test-amd64-i386-xl-qemut-winxpsp3-vcpus1 13 guest-stop fail never pass
test-amd64-amd64-xl-qemuu-winxpsp3 13 guest-stop fail never pass
test-amd64-i386-xend-qemut-winxpsp3 16 leak-check/check fail never pass
test-amd64-amd64-xl-qemut-win7-amd64 13 guest-stop fail never pass
test-amd64-amd64-xl-qemut-winxpsp3 13 guest-stop fail never pass
test-amd64-amd64-xl-winxpsp3 13 guest-stop fail never pass
test-amd64-amd64-xl-win7-amd64 13 guest-stop fail never pass
test-amd64-i386-xl-qemut-win7-amd64 13 guest-stop fail never pass
test-amd64-i386-xl-win7-amd64 13 guest-stop fail never pass
version targeted for testing:
xen fb3f1c1855bd9aca625bc0d040be4cdcc216e958
baseline version:
xen 8a7769b4453168e23e8935a85e9a875ef5117253
------------------------------------------------------------
People who touched revisions under test:
Andrew Cooper <andrew.cooper3@citrix.com>
Ian Campbell <ian.campbell@citrix.com>
Ian Campbell <ijc@hellion.org.uk>
Ian Jackson <ian.jackson@eu.citrix.com>
Jaeyong Yoo <jaeyong.yoo@samsung.com>
Jan Beulich <jbeulich@suse.com>
Julien Grall <julien.grall@linaro.org>
Keir Fraser <keir@xen.org>
Matt Wilson <msw@amazon.com>
Sander Eikelenboom <linux@eikelenboom.it>
Suravee Suthikulpanit <suravee.suthikulapanit@amd.com>
Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tomasz Wroblewski <tomasz.wroblewski@citrix.com>
------------------------------------------------------------
jobs:
build-amd64 pass
build-armhf pass
build-i386 pass
build-amd64-oldkern pass
build-i386-oldkern pass
build-amd64-pvops pass
build-i386-pvops pass
test-amd64-amd64-xl pass
test-amd64-i386-xl pass
test-amd64-i386-rhel6hvm-amd fail
test-amd64-i386-qemut-rhel6hvm-amd pass
test-amd64-i386-qemuu-rhel6hvm-amd pass
test-amd64-amd64-xl-qemut-win7-amd64 fail
test-amd64-i386-xl-qemut-win7-amd64 fail
test-amd64-amd64-xl-qemuu-win7-amd64 fail
test-amd64-amd64-xl-win7-amd64 fail
test-amd64-i386-xl-win7-amd64 fail
test-amd64-i386-xl-credit2 pass
test-amd64-amd64-xl-pcipt-intel fail
test-amd64-i386-rhel6hvm-intel pass
test-amd64-i386-qemut-rhel6hvm-intel pass
test-amd64-i386-qemuu-rhel6hvm-intel pass
test-amd64-i386-xl-multivcpu fail
test-amd64-amd64-pair pass
test-amd64-i386-pair pass
test-amd64-amd64-xl-sedf-pin pass
test-amd64-amd64-pv pass
test-amd64-i386-pv fail
test-amd64-amd64-xl-sedf pass
test-amd64-i386-xl-qemut-winxpsp3-vcpus1 fail
test-amd64-i386-xl-winxpsp3-vcpus1 fail
test-amd64-i386-xend-qemut-winxpsp3 fail
test-amd64-amd64-xl-qemut-winxpsp3 fail
test-amd64-amd64-xl-qemuu-winxpsp3 fail
test-amd64-i386-xend-winxpsp3 fail
test-amd64-amd64-xl-winxpsp3 fail
------------------------------------------------------------
sg-report-flight on woking.cam.xci-test.com
logs: /home/xc_osstest/logs
images: /home/xc_osstest/images
Logs, config files, etc. are available at
http://www.chiark.greenend.org.uk/~xensrcts/logs
Test harness code can be found at
http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary
Not pushing.
(No revision log; it would be 333 lines long.)
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [xen-unstable test] 18851: regressions - FAIL
2013-08-29 19:18 [xen-unstable test] 18851: regressions - FAIL xen.org
@ 2013-08-30 10:36 ` Jan Beulich
2013-09-02 15:10 ` Ian Jackson
1 sibling, 0 replies; 19+ messages in thread
From: Jan Beulich @ 2013-08-30 10:36 UTC (permalink / raw)
To: ian.jackson, xen-devel
>>> On 29.08.13 at 21:18, xen.org <ian.jackson@eu.citrix.com> wrote:
> flight 18851 xen-unstable real [real]
> http://www.chiark.greenend.org.uk/~xensrcts/logs/18851/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
> test-amd64-i386-rhel6hvm-amd 7 redhat-install fail REGR. vs. 18778
> test-amd64-i386-pv 7 debian-install fail REGR. vs. 18778
> test-amd64-i386-xl-multivcpu 7 debian-install fail REGR. vs. 18778
So these all appear to be timeouts of infrastructure operations
that don't have an immediate explanation to me. The only odd
thing is
[ 12.719551] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 12.726458] IPv6: ADDRCONF(NETDEV_UP): xenbr0: link is not ready
in each of the respective woodlouse---var-log-dmesg files. Is
woodlouse suffering from a network connectivity issue, perhaps
as a result of the kernel update? In any event, throughout the
last runs it has - afaics - always been woodlouse that had failures
(and the stickiness of failed tests then likely prevents them from
ever getting a success elsewhere). So perhaps worth trying to take
woodlouse out of the pool temporarily?
Jan
* Re: [xen-unstable test] 18851: regressions - FAIL
2013-08-29 19:18 [xen-unstable test] 18851: regressions - FAIL xen.org
2013-08-30 10:36 ` Jan Beulich
@ 2013-09-02 15:10 ` Ian Jackson
2013-09-02 17:02 ` [xen-unstable test] 19006: regressions - trouble: broken/fail/pass xen.org
` (2 more replies)
1 sibling, 3 replies; 19+ messages in thread
From: Ian Jackson @ 2013-09-02 15:10 UTC (permalink / raw)
To: xen-devel; +Cc: Boris Ostrovsky, Keir Fraser, David Vrabel, Jan Beulich
xen.org writes ("[xen-unstable test] 18851: regressions - FAIL"):
> flight 18851 xen-unstable real [real]
> http://www.chiark.greenend.org.uk/~xensrcts/logs/18851/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
> test-amd64-i386-rhel6hvm-amd 7 redhat-install fail REGR. vs. 18778
I have had a bisection report about this:
From: "xen.org" <osstest@woking.cam.xci-test.com>
From: "xen.org" <ian.jackson@eu.citrix.com>
X-rewrote-sender: osstest@woking.cam.xci-test.com
Date: Mon, 02 Sep 2013 14:33:30 +0100
branch xen-unstable
xen branch xen-unstable
job test-amd64-i386-qemut-rhel6hvm-amd
test redhat-install
Tree: linux git://xenbits.xen.org/linux-pvops.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemu git://xenbits.xen.org/staging/qemu-xen-unstable.git
Tree: qemuu git://xenbits.xen.org/staging/qemu-upstream-unstable.git
Tree: xen git://xenbits.xen.org/xen.git
*** Found and reproduced problem changeset ***
Bug is in tree: linux git://xenbits.xen.org/linux-pvops.git
Bug introduced: 8bf3379a74bc9132751bfa685bad2da318fd59d7
Bug not present: a938a246d34912423c560f475ccf1ce0c71d9d00
commit 8bf3379a74bc9132751bfa685bad2da318fd59d7
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Thu Aug 29 09:47:51 2013 -0700
Linux 3.10.10
[etc.]
The head commit there is a merge. The email contained all the log
messages in between those two, so bounced. (The bisector didn't
examine the other parent of the merge, I think because it wasn't an
ancestor of the baseline "good" revision.)
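For readers unfamiliar with how such a report is produced, the following is a minimal sketch of a bisection over a linear history. It is not the osstest bisector: the function name `first_bad`, the revision labels and the `is_bad` predicate are hypothetical, and a real history with merges (as here) needs more care than a plain list.

```python
def first_bad(revisions, is_bad):
    """Return the first bad revision in a list ordered oldest to newest.

    Assumes revisions[0] is known good and revisions[-1] is known bad,
    and that every revision after the first bad one is also bad.
    """
    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid  # the culprit is at mid or earlier
        else:
            lo = mid  # the culprit is after mid
    return revisions[hi]


# Hypothetical history: r0..r7, with the regression introduced in r5.
history = [f"r{i}" for i in range(8)]
culprit = first_bad(history, lambda rev: int(rev[1:]) >= 5)
print(culprit)
```

Each step halves the candidate range, so the number of test runs grows with the logarithm of the number of revisions between the good and bad endpoints.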
I'm not sure why my osstest push gate didn't catch this, but the
regression is indeed caused by the change from Jeremy's old tree to
Linux 3.10.y.
Ian.
* [xen-unstable test] 19006: regressions - trouble: broken/fail/pass
@ 2013-09-02 17:02 ` xen.org
0 siblings, 0 replies; 19+ messages in thread
From: xen.org @ 2013-09-02 17:02 UTC (permalink / raw)
To: xen-devel; +Cc: ian.jackson
flight 19006 xen-unstable real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/19006/
Regressions :-(
Tests which did not succeed and are blocking,
including tests which could not be run:
test-amd64-i386-qemut-rhel6hvm-amd 7 redhat-install fail REGR. vs. 18778
test-amd64-i386-pv 7 debian-install fail REGR. vs. 18778
test-amd64-i386-xl-multivcpu 7 debian-install fail REGR. vs. 18778
test-amd64-i386-pair 17 guest-migrate/src_host/dst_host fail REGR. vs. 18778
test-amd64-i386-qemuu-rhel6hvm-amd 7 redhat-install fail in 18998 REGR. vs. 18778
Tests which are failing intermittently (not blocking):
test-amd64-i386-qemuu-rhel6hvm-amd 6 leak-check/basis(6) fail pass in 18998
test-amd64-i386-rhel6hvm-amd 6 leak-check/basis(6) fail pass in 18998
test-amd64-i386-pv 6 leak-check/basis(6) fail in 18998 pass in 19006
Regressions which are regarded as allowable (not blocking):
test-amd64-i386-rhel6hvm-amd 7 redhat-install fail in 18998 like 18920-bisect
Tests which did not succeed, but are not blocking:
test-amd64-amd64-xl-pcipt-intel 9 guest-start fail never pass
test-amd64-amd64-xl-qemuu-win7-amd64 13 guest-stop fail never pass
test-amd64-i386-xl-winxpsp3-vcpus1 13 guest-stop fail never pass
test-amd64-i386-xend-winxpsp3 16 leak-check/check fail never pass
test-amd64-i386-xl-qemut-winxpsp3-vcpus1 13 guest-stop fail never pass
test-amd64-amd64-xl-qemuu-winxpsp3 13 guest-stop fail never pass
test-amd64-i386-xend-qemut-winxpsp3 16 leak-check/check fail never pass
test-amd64-amd64-xl-qemut-win7-amd64 13 guest-stop fail never pass
test-amd64-amd64-xl-qemut-winxpsp3 13 guest-stop fail never pass
test-amd64-amd64-xl-winxpsp3 13 guest-stop fail never pass
test-amd64-amd64-xl-win7-amd64 13 guest-stop fail never pass
test-amd64-i386-xl-qemut-win7-amd64 13 guest-stop fail never pass
test-amd64-i386-xl-win7-amd64 13 guest-stop fail never pass
version targeted for testing:
xen ec3f60c9d609703cce2fca30edbc6e72cd18e492
baseline version:
xen 8a7769b4453168e23e8935a85e9a875ef5117253
------------------------------------------------------------
People who touched revisions under test:
Andrew Cooper <andrew.cooper3@citrix.com>
Ian Campbell <ian.campbell@citrix.com>
Ian Campbell <ijc@hellion.org.uk>
Ian Jackson <ian.jackson@eu.citrix.com>
Jaeyong Yoo <jaeyong.yoo@samsung.com>
Jan Beulich <jbeulich@suse.com>
Julien Grall <julien.grall@linaro.org>
Keir Fraser <keir@xen.org>
Len Brown <len.brown@intel.com>
Matt Wilson <msw@amazon.com>
Sander Eikelenboom <linux@eikelenboom.it>
Suravee Suthikulpanit <suravee.suthikulapanit@amd.com>
Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tim Deegan <tim@xen.org>
Tomasz Wroblewski <tomasz.wroblewski@citrix.com>
------------------------------------------------------------
jobs:
build-amd64 pass
build-armhf pass
build-i386 pass
build-amd64-oldkern pass
build-i386-oldkern pass
build-amd64-pvops pass
build-i386-pvops pass
test-amd64-amd64-xl pass
test-amd64-i386-xl pass
test-amd64-i386-rhel6hvm-amd broken
test-amd64-i386-qemut-rhel6hvm-amd fail
test-amd64-i386-qemuu-rhel6hvm-amd broken
test-amd64-amd64-xl-qemut-win7-amd64 fail
test-amd64-i386-xl-qemut-win7-amd64 fail
test-amd64-amd64-xl-qemuu-win7-amd64 fail
test-amd64-amd64-xl-win7-amd64 fail
test-amd64-i386-xl-win7-amd64 fail
test-amd64-i386-xl-credit2 pass
test-amd64-amd64-xl-pcipt-intel fail
test-amd64-i386-rhel6hvm-intel pass
test-amd64-i386-qemut-rhel6hvm-intel pass
test-amd64-i386-qemuu-rhel6hvm-intel pass
test-amd64-i386-xl-multivcpu fail
test-amd64-amd64-pair pass
test-amd64-i386-pair fail
test-amd64-amd64-xl-sedf-pin pass
test-amd64-amd64-pv pass
test-amd64-i386-pv fail
test-amd64-amd64-xl-sedf pass
test-amd64-i386-xl-qemut-winxpsp3-vcpus1 fail
test-amd64-i386-xl-winxpsp3-vcpus1 fail
test-amd64-i386-xend-qemut-winxpsp3 fail
test-amd64-amd64-xl-qemut-winxpsp3 fail
test-amd64-amd64-xl-qemuu-winxpsp3 fail
test-amd64-i386-xend-winxpsp3 fail
test-amd64-amd64-xl-winxpsp3 fail
------------------------------------------------------------
sg-report-flight on woking.cam.xci-test.com
logs: /home/xc_osstest/logs
images: /home/xc_osstest/images
Logs, config files, etc. are available at
http://www.chiark.greenend.org.uk/~xensrcts/logs
Test harness code can be found at
http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary
Not pushing.
(No revision log; it would be 522 lines long.)
* Re: [xen-unstable test] 18851: regressions - FAIL
2013-09-02 15:10 ` Ian Jackson
2013-09-02 17:02 ` [xen-unstable test] 19006: regressions - trouble: broken/fail/pass xen.org
@ 2013-09-02 17:09 ` Ian Jackson
2013-09-02 17:15 ` [xen-unstable test] 19006: regressions - trouble: broken/fail/pass [and 1 more messages] Ian Jackson
2013-09-04 9:04 ` [xen-unstable test] 18851: regressions - FAIL Jan Beulich
2 siblings, 1 reply; 19+ messages in thread
From: Ian Jackson @ 2013-09-02 17:09 UTC (permalink / raw)
To: xen-devel, David Vrabel, Jan Beulich, Boris Ostrovsky,
Konrad Rzeszutek Wilk, Keir Fraser
Ian Jackson writes ("Re: [xen-unstable test] 18851: regressions - FAIL"):
> xen.org writes ("[xen-unstable test] 18851: regressions - FAIL"):
> > flight 18851 xen-unstable real [real]
> > http://www.chiark.greenend.org.uk/~xensrcts/logs/18851/
> >
> > Regressions :-(
> >
> > Tests which did not succeed and are blocking,
> > including tests which could not be run:
> > test-amd64-i386-rhel6hvm-amd 7 redhat-install fail REGR. vs. 18778
xen.org writes ("[xen-unstable test] 19006: regressions - trouble: broken/fail/pass"):
> Tests which did not succeed and are blocking,
> including tests which could not be run:
> test-amd64-i386-xl-multivcpu 7 debian-install fail REGR. vs. 18778
I looked at this one from 19006. The system is running under Xen but
has no guests. It shows a wget process running. I can't easily tell
whether it has hung, but there are no other signs of trouble in the
logs.
The tester was able to ssh in and get process listings and so forth, so
it must be that either (a) just the debootstrap stuff has hung, (b) trying
to ssh in to collect logs unwedged it, or (c) the problem is actually poor
performance, not a hang.
The system allows 2ks (2000 seconds) for a debootstrap, which should be
ample (given that there's a local mirror).
So I think it's probably a performance regression. I will try to
repro this tomorrow.
Ian.
* Re: [xen-unstable test] 19006: regressions - trouble: broken/fail/pass [and 1 more messages]
2013-09-02 17:09 ` [xen-unstable test] 18851: regressions - FAIL Ian Jackson
@ 2013-09-02 17:15 ` Ian Jackson
0 siblings, 0 replies; 19+ messages in thread
From: Ian Jackson @ 2013-09-02 17:15 UTC (permalink / raw)
To: xen.org; +Cc: xen-devel, Keir Fraser, David Vrabel, Jan Beulich,
Boris Ostrovsky
xen.org writes ("[xen-unstable test] 19006: regressions - trouble: broken/fail/pass"):
> Tests which did not succeed and are blocking,
> including tests which could not be run:
> test-amd64-i386-qemut-rhel6hvm-amd 7 redhat-install fail REGR. vs. 18778
> test-amd64-i386-pv 7 debian-install fail REGR. vs. 18778
> test-amd64-i386-xl-multivcpu 7 debian-install fail REGR. vs. 18778
All these three were on woodlouse.
> test-amd64-i386-pair 17 guest-migrate/src_host/dst_host fail REGR. vs. 18778
This was on gall-mite and itch-mite.
> test-amd64-i386-qemuu-rhel6hvm-amd 7 redhat-install fail in 18998 REGR. vs. 18778
This was on woodlouse too (but the logs have been expired).
Ian.
* Re: [xen-unstable test] 18851: regressions - FAIL
2013-09-02 15:10 ` Ian Jackson
2013-09-02 17:02 ` [xen-unstable test] 19006: regressions - trouble: broken/fail/pass xen.org
2013-09-02 17:09 ` [xen-unstable test] 18851: regressions - FAIL Ian Jackson
@ 2013-09-04 9:04 ` Jan Beulich
2013-09-04 10:41 ` Ian Jackson
2 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2013-09-04 9:04 UTC (permalink / raw)
To: David Vrabel, Ian Jackson, Boris Ostrovsky, Konrad Rzeszutek Wilk
Cc: xen-devel, Keir Fraser
>>> On 02.09.13 at 17:10, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> *** Found and reproduced problem changeset ***
>
> Bug is in tree: linux git://xenbits.xen.org/linux-pvops.git
> Bug introduced: 8bf3379a74bc9132751bfa685bad2da318fd59d7
> Bug not present: a938a246d34912423c560f475ccf1ce0c71d9d00
>
>
> commit 8bf3379a74bc9132751bfa685bad2da318fd59d7
> Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Date: Thu Aug 29 09:47:51 2013 -0700
>
> Linux 3.10.10
>
> [etc.]
>
> The head commit there is a merge. The email contained all the log
> messages in between those two, so bounced. (The bisector didn't
> examine the other parent of the merge, I think because it wasn't an
> ancestor of the baseline "good" revision.)
>
> I'm not sure why my osstest push gate didn't catch this, but the
> regression is indeed caused by the change from Jeremy's old tree to
> Linux 3.10.y.
So how do we want to deal with that? Linux maintainers - any
chance you could help out? The staging tree having been stuck
for over a week is certainly less than ideal...
Jan
* Re: [xen-unstable test] 18851: regressions - FAIL
2013-09-04 9:04 ` [xen-unstable test] 18851: regressions - FAIL Jan Beulich
@ 2013-09-04 10:41 ` Ian Jackson
2013-09-05 11:24 ` David Vrabel
0 siblings, 1 reply; 19+ messages in thread
From: Ian Jackson @ 2013-09-04 10:41 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel, Boris Ostrovsky, Keir Fraser, David Vrabel
Jan Beulich writes ("Re: [xen-unstable test] 18851: regressions - FAIL"):
> On 02.09.13 at 17:10, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
...
> > I'm not sure why my osstest push gate didn't catch this, but the
> > regression is indeed caused by the change from Jeremy's old tree to
> > Linux 3.10.y.
It appears that the push gate didn't catch it because it's host
specific, and it got lucky and didn't run a test on that host.
> So how do we want to deal with that? Linux maintainers - any
> chance you could help out? The staging tree having been stuck
> for over a week is certainly less than ideal...
David Vrabel pointed out that more modern kernels have a different
interpretation of things like "dom0_mem=256M", and can waste lots and
lots of actual memory on pointless bookkeeping for future expansion
(which the kernel envisages but we do not).
I have changed it to "dom0_mem=256M,max:256M". I got a push of this
change at "Wed, 4 Sep 2013 03:50:14 +0100". I don't think any of the
test runs yet reported have used this change.
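To make the difference between the two settings concrete, here is a small sketch that splits a dom0_mem value into its parts. It assumes the value is a comma-separated list of `<size>`, `min:<size>` and `max:<size>` items, and that sizes carry a K, M or G suffix with binary multipliers; the function names are hypothetical and this is not Xen's own parser.

```python
import re

# Assumption: binary multipliers and a mandatory K/M/G suffix.
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def _to_bytes(text):
    """Convert a size such as '256M' to bytes."""
    m = re.fullmatch(r"(\d+)([KMG])", text.strip().upper())
    if m is None:
        raise ValueError(f"unsupported size: {text!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]


def parse_dom0_mem(value):
    """Split a dom0_mem value into initial size, minimum and maximum."""
    limits = {"size": None, "min": None, "max": None}
    for item in value.split(","):
        key, sep, amount = item.partition(":")
        if not sep:  # a bare size is the initial allocation
            key, amount = "size", item
        if key not in limits:
            raise ValueError(f"unknown dom0_mem item: {item!r}")
        limits[key] = _to_bytes(amount)
    return limits


old = parse_dom0_mem("256M")
new = parse_dom0_mem("256M,max:256M")
print("old:", old)
print("new:", new)
```

With the old setting no maximum is stated at all, which is what leaves the kernel free to plan for later growth; with the new one the maximum equals the initial size.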
...
I have just checked the database and flights 19046 onwards are using
this new command-line option. None of them have reported yet. In
fact due to the backlog the system is rather clogged with runs using
the old osstest. I'm going to manually kill those.
Ian.
* Re: [xen-unstable test] 18851: regressions - FAIL
2013-09-04 10:41 ` Ian Jackson
@ 2013-09-05 11:24 ` David Vrabel
2013-09-05 12:20 ` Jan Beulich
0 siblings, 1 reply; 19+ messages in thread
From: David Vrabel @ 2013-09-05 11:24 UTC (permalink / raw)
To: Ian Jackson; +Cc: xen-devel, Boris Ostrovsky, Keir Fraser, Jan Beulich
On 04/09/13 11:41, Ian Jackson wrote:
> Jan Beulich writes ("Re: [xen-unstable test] 18851: regressions - FAIL"):
>> On 02.09.13 at 17:10, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> ...
>>> I'm not sure why my osstest push gate didn't catch this, but the
>>> regression is indeed caused by the change from Jeremy's old tree to
>>> Linux 3.10.y.
>
> It appears that the push gate didn't catch it because it's host
> specific, and it got lucky and didn't run a test on that host.
>
>> So how do we want to deal with that? Linux maintainers - any
>> chance you could help out? The staging tree having been stuck
>> for over a week is certainly less than ideal...
>
> David Vrabel pointed out that more modern kernels have a different
> interpretation of things like "dom0_mem=256M", and can waste lots and
> lots of actual memory on pointless bookkeeping for future expansion
> (which the kernel envisages but we do not).
>
> I have changed it to "dom0_mem=256M,max:256M". I got a push of this
> change at "Wed, 4 Sep 2013 03:50:14 +0100". I don't think any of the
> test runs yet reported have used this change.
Woodlouse's e820 as seen by the kernel looks like:
[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] Xen: [mem 0x0000000000000000-0x0000000000099fff] usable
[ 0.000000] Xen: [mem 0x000000000009a800-0x00000000000fffff] reserved
[ 0.000000] Xen: [mem 0x0000000000100000-0x00000000d7f8ffff] usable
[ 0.000000] Xen: [mem 0x00000000d7f9e000-0x00000000d7f9ffff] type 9
[ 0.000000] Xen: [mem 0x00000000d7fa0000-0x00000000d7fadfff] ACPI data
[ 0.000000] Xen: [mem 0x00000000d7fae000-0x00000000d7fdffff] ACPI NVS
[ 0.000000] Xen: [mem 0x00000000d7fe0000-0x00000000d7fedfff] reserved
[ 0.000000] Xen: [mem 0x00000000d7ff0000-0x00000000d7ffffff] reserved
[ 0.000000] Xen: [mem 0x00000000e0000000-0x00000000efffffff] reserved
[ 0.000000] Xen: [mem 0x00000000fec00000-0x00000000fec02fff] reserved
[ 0.000000] Xen: [mem 0x00000000fee00000-0x00000000feefffff] reserved
[ 0.000000] Xen: [mem 0x00000000ff700000-0x00000000ffffffff] reserved
[ 0.000000] Xen: [mem 0x0000000100000000-0x00000001884d1fff] usable
[ 0.000000] Xen: [mem 0x00000001884d2000-0x0000000227ffffff] unusable
[ 0.000000] Xen: [mem 0x000000fd00000000-0x000000ffffffffff] reserved
That last reserved entry I think confuses the early setup and it does
odd things like:
[ 0.000000] Set 266338518 page(s) to 1-1 mapping
Possibly relevant kernel thread here:
http://lkml.indiana.edu/hypermail/linux/kernel/1110.1/01213.html
I note that the e820 as seen by Xen does not have this reserved region
(XEN) Xen-e820 RAM map:
(XEN) 0000000000000000 - 000000000009a800 (usable)
(XEN) 000000000009a800 - 00000000000a0000 (reserved)
(XEN) 00000000000e6000 - 0000000000100000 (reserved)
(XEN) 0000000000100000 - 00000000d7f90000 (usable)
(XEN) 00000000d7f9e000 - 00000000d7fa0000 type 9
(XEN) 00000000d7fa0000 - 00000000d7fae000 (ACPI data)
(XEN) 00000000d7fae000 - 00000000d7fe0000 (ACPI NVS)
(XEN) 00000000d7fe0000 - 00000000d7fee000 (reserved)
(XEN) 00000000d7ff0000 - 00000000d8000000 (reserved)
(XEN) 00000000e0000000 - 00000000f0000000 (reserved)
(XEN) 00000000fec00000 - 00000000fec03000 (reserved)
(XEN) 00000000fee00000 - 00000000fee01000 (reserved)
(XEN) 00000000ff700000 - 0000000100000000 (reserved)
(XEN) 0000000100000000 - 0000000228000000 (usable)
So it must be being added by Xen?
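The comparison of the two maps can be sketched as a short script. It uses a few lines copied from the listings above, assumes 4 KiB pages, and relies on the kernel printing inclusive end addresses while Xen prints exclusive ones; the helper names are hypothetical.

```python
import re

KERNEL_E820 = """\
Xen: [mem 0x0000000100000000-0x00000001884d1fff] usable
Xen: [mem 0x00000001884d2000-0x0000000227ffffff] unusable
Xen: [mem 0x000000fd00000000-0x000000ffffffffff] reserved
"""

XEN_E820 = """\
(XEN) 00000000ff700000 - 0000000100000000 (reserved)
(XEN) 0000000100000000 - 0000000228000000 (usable)
"""

PAGE_SIZE = 4096  # assumption: 4 KiB x86 pages


def parse_kernel(text):
    """Kernel lines have inclusive ends; return (start, end_exclusive, type)."""
    pat = re.compile(r"\[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)\] (.+)")
    return [(int(m[1], 16), int(m[2], 16) + 1, m[3].strip())
            for m in pat.finditer(text)]


def parse_xen(text):
    """Xen lines already have exclusive ends; return (start, end, type)."""
    pat = re.compile(r"([0-9a-f]{16}) - ([0-9a-f]{16}) \(?([^)\n]+)\)?")
    return [(int(m[1], 16), int(m[2], 16), m[3].strip())
            for m in pat.finditer(text)]


def entries_above(kernel, xen):
    """Kernel entries that start at or beyond the top of Xen's map."""
    top = max(end for _, end, _ in xen)
    return [entry for entry in kernel if entry[0] >= top]


extra = entries_above(parse_kernel(KERNEL_E820), parse_xen(XEN_E820))
for start, end, kind in extra:
    print(f"{start:#x}-{end - 1:#x} {kind}: {(end - start) // 2**30} GiB")

# Size of the range reported in the "1-1 mapping" message, in whole GiB.
identity_gib = 266338518 * PAGE_SIZE // 2**30
print(identity_gib, "GiB set to 1-1 mapping")
```

On this data the only entry beyond the top of Xen's map is the 12 GiB reserved block ending at the 1 TiB boundary, and the 1-1 mapped page count works out to just over 1016 GiB.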
David
* Re: [xen-unstable test] 18851: regressions - FAIL
2013-09-05 11:24 ` David Vrabel
@ 2013-09-05 12:20 ` Jan Beulich
2013-09-05 14:09 ` David Vrabel
0 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2013-09-05 12:20 UTC (permalink / raw)
To: David Vrabel, Ian Jackson; +Cc: xen-devel, Boris Ostrovsky, Keir Fraser
>>> On 05.09.13 at 13:24, David Vrabel <david.vrabel@citrix.com> wrote:
> On 04/09/13 11:41, Ian Jackson wrote:
>> Jan Beulich writes ("Re: [xen-unstable test] 18851: regressions - FAIL"):
>>> On 02.09.13 at 17:10, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
>> ...
>>>> I'm not sure why my osstest push gate didn't catch this, but the
>>>> regression is indeed caused by the change from Jeremy's old tree to
>>>> Linux 3.10.y.
>>
>> It appears that the push gate didn't catch it because it's host
>> specific, and it got lucky and didn't run a test on that host.
>>
>>> So how do we want to deal with that? Linux maintainers - any
>>> chance you could help out? The staging tree having been stuck
>>> for over a week is certainly less than ideal...
>>
>> David Vrabel pointed out that more modern kernels have a different
>> interpretation of things like "dom0_mem=256M", and can waste lots and
>> lots of actual memory on pointless bookkeeping for future expansion
>> (which the kernel envisages but we do not).
>>
>> I have changed it to "dom0_mem=256M,max:256M". I got a push of this
>> change at "Wed, 4 Sep 2013 03:50:14 +0100". I don't think any of the
>> test runs yet reported have used this change.
>
> Woodlouse's e820 as seen by the kernel looks like:
>
> [ 0.000000] e820: BIOS-provided physical RAM map:
> [ 0.000000] Xen: [mem 0x0000000000000000-0x0000000000099fff] usable
> [ 0.000000] Xen: [mem 0x000000000009a800-0x00000000000fffff] reserved
> [ 0.000000] Xen: [mem 0x0000000000100000-0x00000000d7f8ffff] usable
> [ 0.000000] Xen: [mem 0x00000000d7f9e000-0x00000000d7f9ffff] type 9
> [ 0.000000] Xen: [mem 0x00000000d7fa0000-0x00000000d7fadfff] ACPI data
> [ 0.000000] Xen: [mem 0x00000000d7fae000-0x00000000d7fdffff] ACPI NVS
> [ 0.000000] Xen: [mem 0x00000000d7fe0000-0x00000000d7fedfff] reserved
> [ 0.000000] Xen: [mem 0x00000000d7ff0000-0x00000000d7ffffff] reserved
> [ 0.000000] Xen: [mem 0x00000000e0000000-0x00000000efffffff] reserved
> [ 0.000000] Xen: [mem 0x00000000fec00000-0x00000000fec02fff] reserved
> [ 0.000000] Xen: [mem 0x00000000fee00000-0x00000000feefffff] reserved
> [ 0.000000] Xen: [mem 0x00000000ff700000-0x00000000ffffffff] reserved
> [ 0.000000] Xen: [mem 0x0000000100000000-0x00000001884d1fff] usable
> [ 0.000000] Xen: [mem 0x00000001884d2000-0x0000000227ffffff] unusable
> [ 0.000000] Xen: [mem 0x000000fd00000000-0x000000ffffffffff] reserved
>
> That last reserved entry I think confuses the early setup and it does
> odd things like:
>
> [ 0.000000] Set 266338518 page(s) to 1-1 mapping
>
> Possibly relevant kernel thread here:
>
> http://lkml.indiana.edu/hypermail/linux/kernel/1110.1/01213.html
>
> I note that the e820 as seen by Xen does not have this reserved region
>
> (XEN) Xen-e820 RAM map:
> (XEN) 0000000000000000 - 000000000009a800 (usable)
> (XEN) 000000000009a800 - 00000000000a0000 (reserved)
> (XEN) 00000000000e6000 - 0000000000100000 (reserved)
> (XEN) 0000000000100000 - 00000000d7f90000 (usable)
> (XEN) 00000000d7f9e000 - 00000000d7fa0000 type 9
> (XEN) 00000000d7fa0000 - 00000000d7fae000 (ACPI data)
> (XEN) 00000000d7fae000 - 00000000d7fe0000 (ACPI NVS)
> (XEN) 00000000d7fe0000 - 00000000d7fee000 (reserved)
> (XEN) 00000000d7ff0000 - 00000000d8000000 (reserved)
> (XEN) 00000000e0000000 - 00000000f0000000 (reserved)
> (XEN) 00000000fec00000 - 00000000fec03000 (reserved)
> (XEN) 00000000fee00000 - 00000000fee01000 (reserved)
> (XEN) 00000000ff700000 - 0000000100000000 (reserved)
> (XEN) 0000000100000000 - 0000000228000000 (usable)
>
> So it must be being added by Xen?
Yes - see d838ac25 ("x86: don't allow Dom0 access to the HT
address range"). But that's the case on all AMD systems, and
I thought it wasn't just woodlouse that's an AMD one - Ian?
In any event - how can the kernel side code make _any_
assumptions on what is or is not in the E820 table? I've
recently seen logs from a system where reserved (MMIO)
blocks appear right below the 1Tb (or maybe it was even 16Tb)
boundary, without Xen inserting them.
I would certainly be willing to revert that patch for the time
being if we have reasons to believe this helps, but only as long
as it is clear that the kernel needs fixing, and that I'll want this
back before 4.4 goes out. Do we have baseline (8a7769b4)
test results including the new kernel, with part of it run on
woodlouse?
Jan
* Re: [xen-unstable test] 18851: regressions - FAIL
2013-09-05 12:20 ` Jan Beulich
@ 2013-09-05 14:09 ` David Vrabel
2013-09-06 10:38 ` [xen-unstable test] 18851: regressions - FAIL [and 1 more messages] Ian Jackson
0 siblings, 1 reply; 19+ messages in thread
From: David Vrabel @ 2013-09-05 14:09 UTC (permalink / raw)
To: Jan Beulich
Cc: Keir Fraser, Ian Jackson, xen-devel, Boris Ostrovsky,
Malcolm Crossley
On 05/09/13 13:20, Jan Beulich wrote:
>>>> On 05.09.13 at 13:24, David Vrabel <david.vrabel@citrix.com> wrote:
>> On 04/09/13 11:41, Ian Jackson wrote:
>>> Jan Beulich writes ("Re: [xen-unstable test] 18851: regressions - FAIL"):
>>>> On 02.09.13 at 17:10, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
>>> ...
>>>>> I'm not sure why my osstest push gate didn't catch this, but the
>>>>> regression is indeed caused by the change from Jeremy's old tree to
>>>>> Linux 3.10.y.
>>>
>>> It appears that the push gate didn't catch it because it's host
>>> specific, and it got lucky and didn't run a test on that host.
>>>
>>>> So how do we want to deal with that? Linux maintainers - any
>>>> chance you could help out? The staging tree having been stuck
>>>> for over a week is certainly less than ideal...
>>>
>>> David Vrabel pointed out that more modern kernels have a different
>>> interpretation of things like "dom0_mem=256M", and can waste lots and
>>> lots of actual memory on pointless bookkeeping for future expansion
>>> (which the kernel envisages but we do not).
>>>
>>> I have changed it to "dom0_mem=256M,max:256M". I got a push of this
>>> change at "Wed, 4 Sep 2013 03:50:14 +0100". I don't think any of the
>>> test runs yet reported have used this change.
>>
>> Woodlouse's e820 as seen by the kernel looks like:
>>
>> [ 0.000000] e820: BIOS-provided physical RAM map:
>> [ 0.000000] Xen: [mem 0x0000000000000000-0x0000000000099fff] usable
>> [ 0.000000] Xen: [mem 0x000000000009a800-0x00000000000fffff] reserved
>> [ 0.000000] Xen: [mem 0x0000000000100000-0x00000000d7f8ffff] usable
>> [ 0.000000] Xen: [mem 0x00000000d7f9e000-0x00000000d7f9ffff] type 9
>> [ 0.000000] Xen: [mem 0x00000000d7fa0000-0x00000000d7fadfff] ACPI data
>> [ 0.000000] Xen: [mem 0x00000000d7fae000-0x00000000d7fdffff] ACPI NVS
>> [ 0.000000] Xen: [mem 0x00000000d7fe0000-0x00000000d7fedfff] reserved
>> [ 0.000000] Xen: [mem 0x00000000d7ff0000-0x00000000d7ffffff] reserved
>> [ 0.000000] Xen: [mem 0x00000000e0000000-0x00000000efffffff] reserved
>> [ 0.000000] Xen: [mem 0x00000000fec00000-0x00000000fec02fff] reserved
>> [ 0.000000] Xen: [mem 0x00000000fee00000-0x00000000feefffff] reserved
>> [ 0.000000] Xen: [mem 0x00000000ff700000-0x00000000ffffffff] reserved
>> [ 0.000000] Xen: [mem 0x0000000100000000-0x00000001884d1fff] usable
>> [ 0.000000] Xen: [mem 0x00000001884d2000-0x0000000227ffffff] unusable
>> [ 0.000000] Xen: [mem 0x000000fd00000000-0x000000ffffffffff] reserved
>>
>> That last reserved entry I think confuses the early setup and it does
>> odd things like:
>>
>> [ 0.000000] Set 266338518 page(s) to 1-1 mapping
>>
>> Possibly relevant kernel thread here:
>>
>> http://lkml.indiana.edu/hypermail/linux/kernel/1110.1/01213.html
>>
>> I note that the e820 as seen by Xen does not have this reserved region
>>
>> (XEN) Xen-e820 RAM map:
>> (XEN) 0000000000000000 - 000000000009a800 (usable)
>> (XEN) 000000000009a800 - 00000000000a0000 (reserved)
>> (XEN) 00000000000e6000 - 0000000000100000 (reserved)
>> (XEN) 0000000000100000 - 00000000d7f90000 (usable)
>> (XEN) 00000000d7f9e000 - 00000000d7fa0000 type 9
>> (XEN) 00000000d7fa0000 - 00000000d7fae000 (ACPI data)
>> (XEN) 00000000d7fae000 - 00000000d7fe0000 (ACPI NVS)
>> (XEN) 00000000d7fe0000 - 00000000d7fee000 (reserved)
>> (XEN) 00000000d7ff0000 - 00000000d8000000 (reserved)
>> (XEN) 00000000e0000000 - 00000000f0000000 (reserved)
>> (XEN) 00000000fec00000 - 00000000fec03000 (reserved)
>> (XEN) 00000000fee00000 - 00000000fee01000 (reserved)
>> (XEN) 00000000ff700000 - 0000000100000000 (reserved)
>> (XEN) 0000000100000000 - 0000000228000000 (usable)
>>
>> So it must be being added by Xen?
>
> Yes - see d838ac25 ("x86: don't allow Dom0 access to the HT
> address range"). But that's the case on all AMD systems, and
> I thought it wasn't just woodlouse that's an AMD one - Ian?
>
> In any event - how can the kernel side code make _any_
> assumptions on what is or is not in the E820 table? I've
> recently seen logs from a system where reserved (MMIO)
> blocks appear right below the 1Tb (or maybe it was even 16Tb)
> boundary, without Xen inserting them.
>
> I would certainly be willing to revert that patch for the time
> being if we have reasons to believe this helps, but only as long
> as it is clear that the kernel needs fixing, and that I'll want this
> back before 4.4 goes out. Do we have baseline (8a7769b4)
> test results including the new kernel, with part of it run on
> woodlouse?
This looks like a red herring. Having poked about in woodlouse it looks
like something is screwy with interrupts. The tg3 cards aren't using
MSI and the USB controller is using edge not level handlers. Another
machine with the same chipset is happily using MSIs.
Malcolm (Cc) has some suggestions for things to try.
David
* Re: [xen-unstable test] 18851: regressions - FAIL [and 1 more messages]
2013-09-05 14:09 ` David Vrabel
@ 2013-09-06 10:38 ` Ian Jackson
2013-09-06 10:49 ` Jan Beulich
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Ian Jackson @ 2013-09-06 10:38 UTC (permalink / raw)
To: David Vrabel, Jan Beulich
Cc: xen-devel, Boris Ostrovsky, Keir Fraser, Malcolm Crossley
Jan Beulich writes ("Re: [xen-unstable test] 18851: regressions - FAIL"):
> This looks like a red herring. Having poked about in woodlouse it looks
> like something is screwy with interrupts. The tg3 cards aren't using
> MSI and the USB controller is using edge not level handlers. Another
> machine with the same chipset is happily using MSIs.
I did the following tests overnight:
* 3.4.60 kernel:
Pass! [adhoc flight 19081]
* 3.10.10 + patch from Zoltan Kiss to limit SKB_FRAG_PAGE_ORDER
Subject: net/core: Order-3 frag allocator causes SWIOTLB bouncing under Xen
Date: Wed Sep 04 21:54:01 BST 2013
Message-ID: <1378327638-23956-1-git-send-email-zoltan.kiss@citrix.com>
Fail as before (in this case, timeout in debootstrap trying to
install a guest). [adhoc flight 19082]
* 3.10.10, kernel command line "pci=noacpi and pci=nocrs"
Total boot failure. SATA controller complaining bitterly about
lost interrupts. [adhoc flight 19085]
I also took woodlouse out of the main test pool, which is how we got a
push of 4.2. I'm going to put it back now, and make a change to
switch to Linux 3.4.y for general tests.
I think this gets the 3.10.y problem off the critical path for
everything else but of course we should still fix it. I will leave
the 3.10.y push gate in place.
Ian.
* Re: [xen-unstable test] 18851: regressions - FAIL [and 1 more messages]
2013-09-06 10:38 ` [xen-unstable test] 18851: regressions - FAIL [and 1 more messages] Ian Jackson
@ 2013-09-06 10:49 ` Jan Beulich
2013-09-06 11:06 ` Ian Jackson
2013-09-06 10:58 ` David Vrabel
2013-09-06 12:57 ` Konrad Rzeszutek Wilk
2 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2013-09-06 10:49 UTC (permalink / raw)
To: Ian Jackson
Cc: Keir Fraser, David Vrabel, xen-devel, Boris Ostrovsky,
Malcolm Crossley
>>> On 06.09.13 at 12:38, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> Jan Beulich writes ("Re: [xen-unstable test] 18851: regressions - FAIL"):
>> This looks like a red herring. Having poked about in woodlouse it looks
>> like something is screwy with interrupts. The tg3 cards aren't using
>> MSI and the USB controller is using edge not level handlers. Another
>> machine with the same chipset is happily using MSIs.
>
> I did the following tests overnight:
>
> * 3.4.60 kernel:
>
> Pass! [adhoc flight 19081]
>
> * 3.10.10 + patch from Zoltan Kiss to limit SKB_FRAG_PAGE_ORDER
> Subject: net/core: Order-3 frag allocator causes SWIOTLB bouncing under
> Xen
> Date: Wed Sep 04 21:54:01 BST 2013
> Message-ID: <1378327638-23956-1-git-send-email-zoltan.kiss@citrix.com>
>
> Fail as before (in this case, timeout in debootstrap trying to
> install a guest). [adhoc flight 19082]
>
> * 3.10.10, kernel command line "pci=noacpi and pci=nocrs"
>
> Total boot failure. SATA controller complaining bitterly about
> lost interrupts. [adhoc flight 19085]
>
> I also took woodlouse out of the main test pool, which is how we got a
> push of 4.2. I'm going to put it back now, and make a change to
> switch to Linux 3.4.y for general tests.
For -unstable this also resulted in just a single remaining test failure
(test-amd64-i386-pair 17 guest-migrate/src_host/dst_host),
which appears to be the result of the migration, after the
first few thousand pages, seeing a rapid decrease of speed
(which then likely causes that timeout). I couldn't spot anything
in the logs that would explain this though. But I did notice that
in two of the three runs there was no xend.log captured on
the source host in the first place - is there an explanation for
this?
In any event I'm going to take these almost-pushes as a
"good enough" sign to pull over the two or three commits into
the stable branches, in the expectation that we should be
able to get a push there over the weekend, and then release
early next week.
Looking through the logs of *-mite it also seems like you gave
3.11 a try, hitting a BUG() in balloon.c.
Jan
* Re: [xen-unstable test] 18851: regressions - FAIL [and 1 more messages]
2013-09-06 10:38 ` [xen-unstable test] 18851: regressions - FAIL [and 1 more messages] Ian Jackson
2013-09-06 10:49 ` Jan Beulich
@ 2013-09-06 10:58 ` David Vrabel
2013-09-06 11:50 ` Ian Jackson
2013-09-06 12:57 ` Konrad Rzeszutek Wilk
2 siblings, 1 reply; 19+ messages in thread
From: David Vrabel @ 2013-09-06 10:58 UTC (permalink / raw)
To: Ian Jackson
Cc: Keir Fraser, Jan Beulich, xen-devel, Boris Ostrovsky,
Malcolm Crossley
On 06/09/13 11:38, Ian Jackson wrote:
> Jan Beulich writes ("Re: [xen-unstable test] 18851: regressions - FAIL"):
>> This looks like a red herring. Having poked about in woodlouse it looks
>> like something is screwy with interrupts. The tg3 cards aren't using
>> MSI and the USB controller is using edge not level handlers. Another
>> machine with the same chipset is happily using MSIs.
>
> I did the following tests overnight:
>
> * 3.4.60 kernel:
>
> Pass! [adhoc flight 19081]
Where are the logs for this run?
I tried:
http://www.chiark.greenend.org.uk/~xensrcts/logs/19081/
David
* Re: [xen-unstable test] 18851: regressions - FAIL [and 1 more messages]
2013-09-06 10:49 ` Jan Beulich
@ 2013-09-06 11:06 ` Ian Jackson
2013-09-06 12:49 ` Konrad Rzeszutek Wilk
0 siblings, 1 reply; 19+ messages in thread
From: Ian Jackson @ 2013-09-06 11:06 UTC (permalink / raw)
To: Jan Beulich
Cc: Keir Fraser, David Vrabel, xen-devel, Boris Ostrovsky,
Malcolm Crossley
Jan Beulich writes ("Re: [xen-unstable test] 18851: regressions - FAIL [and 1 more messages]"):
> On 06.09.13 at 12:38, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> For -unstable this also resulted in just a single remaining test failure
> (test-amd64-i386-pair 17 guest-migrate/src_host/dst_host),
> which appears to be the result of the migration, after the
> first few thousand pages, seeing a rapid decrease of speed
> (which then likely causes that timeout). I couldn't spot anything
> in the logs that would explain this though. But I did notice that
> in two of the three runs there was no xend.log captured on
> the source host in the first place - is there an explanation for
> this?
Looking at the logs-capture log, it appears that itch-mite was totally
unresponsive by then. The log capture script decided to power cycle
it. After having done that, xend wasn't running. Due to a bug in the
script it didn't retry the log capture.
> In any event I'm going to take these almost-pushes as a
> "good enough" sign to pull over the two or three commits into
> the stable branches, in the expectation that we should be
> able to get a push there over the weekend, and then release
> early next week.
OK.
> Looking through the logs of *-mite it also seems like you gave
> 3.11 a try, hitting a BUG() in balloon.c.
That'll be the "linux-linus" test, which isn't doing very well.
Ian.
* Re: [xen-unstable test] 18851: regressions - FAIL [and 1 more messages]
2013-09-06 10:58 ` David Vrabel
@ 2013-09-06 11:50 ` Ian Jackson
0 siblings, 0 replies; 19+ messages in thread
From: Ian Jackson @ 2013-09-06 11:50 UTC (permalink / raw)
To: David Vrabel
Cc: Keir Fraser, Jan Beulich, xen-devel, Boris Ostrovsky,
Malcolm Crossley
David Vrabel writes ("Re: [xen-unstable test] 18851: regressions - FAIL [and 1 more messages]"):
> On 06/09/13 11:38, Ian Jackson wrote:
> > I did the following tests overnight:
> >
> > * 3.4.60 kernel:
> >
> > Pass! [adhoc flight 19081]
>
> Where are the logs for this run?
>
> I tried:
> http://www.chiark.greenend.org.uk/~xensrcts/logs/19081/
It doesn't automatically publish the logs of adhoc flights. I have
just done this now (for all three I mentioned).
Ian.
* Re: [xen-unstable test] 18851: regressions - FAIL [and 1 more messages]
2013-09-06 11:06 ` Ian Jackson
@ 2013-09-06 12:49 ` Konrad Rzeszutek Wilk
0 siblings, 0 replies; 19+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-09-06 12:49 UTC (permalink / raw)
To: Ian Jackson, Stefano Stabellini
Cc: Keir Fraser, David Vrabel, Jan Beulich, xen-devel,
Boris Ostrovsky, Malcolm Crossley
On Fri, Sep 06, 2013 at 12:06:38PM +0100, Ian Jackson wrote:
> Jan Beulich writes ("Re: [xen-unstable test] 18851: regressions - FAIL [and 1 more messages]"):
> > On 06.09.13 at 12:38, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> > For -unstable this also resulted in just a single remaining test failure
> > (test-amd64-i386-pair 17 guest-migrate/src_host/dst_host),
> > which appears to be the result of the migration, after the
> > first few thousand pages, seeing a rapid decrease of speed
> > (which then likely causes that timeout). I couldn't spot anything
> > in the logs that would explain this though. But I did notice that
> > in two of the three runs there was no xend.log captured on
> > the source host in the first place - is there an explanation for
> > this?
>
> Looking at the logs-capture log, it appears that itch-mite was totally
> unresponsive by then. The log capture script decided to power cycle
> it. After having done that, xend wasn't running. Due to a bug in the
> script it didn't retry the log capture.
>
> > In any event I'm going to take these almost-pushes as a
> > "good enough" sign to pull over the two or three commits into
> > the stable branches, in the expectation that we should be
> > able to get a push there over the weekend, and then release
> > early next week.
>
> OK.
>
> > Looking through the logs of *-mite it also seems like you gave
> > 3.11 a try, hitting a BUG() in balloon.c.
>
> That'll be the "linux-linus" test, which isn't doing very well.
I think Boris has a patch that fixes the regression.
>
> Ian.
* Re: [xen-unstable test] 18851: regressions - FAIL [and 1 more messages]
2013-09-06 10:38 ` [xen-unstable test] 18851: regressions - FAIL [and 1 more messages] Ian Jackson
2013-09-06 10:49 ` Jan Beulich
2013-09-06 10:58 ` David Vrabel
@ 2013-09-06 12:57 ` Konrad Rzeszutek Wilk
2013-09-06 13:34 ` Ian Jackson
2 siblings, 1 reply; 19+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-09-06 12:57 UTC (permalink / raw)
To: Ian Jackson
Cc: Keir Fraser, David Vrabel, Jan Beulich, xen-devel,
Boris Ostrovsky, Malcolm Crossley
On Fri, Sep 06, 2013 at 11:38:42AM +0100, Ian Jackson wrote:
> Jan Beulich writes ("Re: [xen-unstable test] 18851: regressions - FAIL"):
> > This looks like a red herring. Having poked about in woodlouse it looks
> > like something is screwy with interrupts. The tg3 cards aren't using
> > MSI and the USB controller is using edge not level handlers. Another
> > machine with the same chipset is happily using MSIs.
>
> I did the following tests overnight:
>
> * 3.4.60 kernel:
>
> Pass! [adhoc flight 19081]
>
> * 3.10.10 + patch from Zoltan Kiss to limit SKB_FRAG_PAGE_ORDER
> Subject: net/core: Order-3 frag allocator causes SWIOTLB bouncing under Xen
> Date: Wed Sep 04 21:54:01 BST 2013
> Message-ID: <1378327638-23956-1-git-send-email-zoltan.kiss@citrix.com>
>
> Fail as before (in this case, timeout in debootstrap trying to
> install a guest). [adhoc flight 19082]
>
> * 3.10.10, kernel command line "pci=noacpi and pci=nocrs"
>
> Total boot failure. SATA controller complaining bitterly about
> lost interrupts. [adhoc flight 19085]
Somebody (Andrew? David?) took a look at the box and found that the MSIs
were all out of whack. I guess with the 'noacpi' parameter the thinking is
that the ACPI _PRT are out of whack with the more modern kernels?
I am not that familiar with oss-test - but is each of the set of boxes
running a different version of the hypervisor? Meaning you don't
randomly install from scratch a new version of a hypervisor on different
boxes?
Thanks!
>
> I also took woodlouse out of the main test pool, which is how we got a
> push of 4.2. I'm going to put it back now, and make a change to
> switch to Linux 3.4.y for general tests.
>
> I think this gets the 3.10.y problem off the critical path for
> everything else but of course we should still fix it. I will leave
> the 3.10.y push gate in place.
Aye. Is this issue (network incredibly slow) only surfacing on this box?
No - I thought I saw the issue on gall and lice with the upstream Linux?
Are those two machines the same as woodlouse?
>
> Ian.
* Re: [xen-unstable test] 18851: regressions - FAIL [and 1 more messages]
2013-09-06 12:57 ` Konrad Rzeszutek Wilk
@ 2013-09-06 13:34 ` Ian Jackson
0 siblings, 0 replies; 19+ messages in thread
From: Ian Jackson @ 2013-09-06 13:34 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: Keir Fraser, David Vrabel, Jan Beulich, xen-devel,
Boris Ostrovsky, Malcolm Crossley
Konrad Rzeszutek Wilk writes ("Re: [xen-unstable test] 18851: regressions - FAIL [and 1 more messages]"):
> I am not that familiar with oss-test - but is each of the set of boxes
> running a different version of the hypervisor? Meaning you don't
> randomly install from scratch a new version of a hypervisor on different
> boxes?
No, each test is of a specific version of the hypervisor, a specific
version of the kernel, etc.
For each test the tester will pick a machine from the test pool. The
scheduling algorithm tries to pick a machine which has not recently
run this test, unless the test failed most recently, in which case it
tries to pick (the) one it failed on.
Each test job involves a complete wipe of the system, and then
installing a dom0 OS with the selected hypervisor and kernel.
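As a rough illustration of the host-selection rule described above (this
is not the actual osstest code; the History class, the pick_host function
and the "gall-mite" host name are hypothetical and exist only for this
sketch), the policy could look something like:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class History:
    """Hypothetical record of past runs; not the real osstest data model."""
    # (test, host) -> sequence number of the most recent run on that host
    last_run: Dict[Tuple[str, str], int] = field(default_factory=dict)
    # test -> (host, passed) for the most recent run of that test anywhere
    latest: Dict[str, Tuple[str, bool]] = field(default_factory=dict)
    clock: int = 0

    def record(self, test: str, host: str, passed: bool) -> None:
        self.clock += 1
        self.last_run[(test, host)] = self.clock
        self.latest[test] = (host, passed)


def pick_host(test: str, pool: List[str], history: History) -> Optional[str]:
    """Pick a host for `test` from the currently available `pool`."""
    if not pool:
        return None
    latest = history.latest.get(test)
    if latest is not None:
        host, passed = latest
        # If the most recent run failed, try to reuse the host it failed on.
        if not passed and host in pool:
            return host
    # Otherwise prefer the host that ran this test least recently;
    # hosts that never ran it sort first (sequence number 0).
    return min(pool, key=lambda h: history.last_run.get((test, h), 0))


history = History()
pool = ["woodlouse", "gall-mite", "itch-mite"]
history.record("test-amd64-i386-pv", "woodlouse", passed=True)
first = pick_host("test-amd64-i386-pv", pool, history)
history.record("test-amd64-i386-pv", first, passed=False)
second = pick_host("test-amd64-i386-pv", pool, history)
```

In this sketch a failure makes the next run "sticky" to the failing host,
which is how a problem confined to one machine keeps showing up on that
machine. The wipe and reinstall of the host would happen after a host is
chosen and is not modelled here.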
> > I think this gets the 3.10.y problem off the critical path for
> > everything else but of course we should still fix it. I will leave
> > the 3.10.y push gate in place.
>
> Aye. Is this issue (network incredibly slow) only surfacing on this box?
> No - I thought I saw the issue on gall and lice with the upstream Linux?
> Are those two machines the same as woodlouse?
No, they are entirely different. This incredibly slow network issue
has only been seen on woodlouse. Most of the machines are in
identical pairs, but not woodlouse, sadly.
Ian.
Thread overview: 19+ messages
2013-08-29 19:18 [xen-unstable test] 18851: regressions - FAIL xen.org
2013-08-30 10:36 ` Jan Beulich
2013-09-02 15:10 ` Ian Jackson
2013-09-02 17:02 ` [xen-unstable test] 19006: regressions - trouble: broken/fail/pass xen.org
2013-09-02 17:09 ` [xen-unstable test] 18851: regressions - FAIL Ian Jackson
2013-09-02 17:15 ` [xen-unstable test] 19006: regressions - trouble: broken/fail/pass [and 1 more messages] Ian Jackson
2013-09-04 9:04 ` [xen-unstable test] 18851: regressions - FAIL Jan Beulich
2013-09-04 10:41 ` Ian Jackson
2013-09-05 11:24 ` David Vrabel
2013-09-05 12:20 ` Jan Beulich
2013-09-05 14:09 ` David Vrabel
2013-09-06 10:38 ` [xen-unstable test] 18851: regressions - FAIL [and 1 more messages] Ian Jackson
2013-09-06 10:49 ` Jan Beulich
2013-09-06 11:06 ` Ian Jackson
2013-09-06 12:49 ` Konrad Rzeszutek Wilk
2013-09-06 10:58 ` David Vrabel
2013-09-06 11:50 ` Ian Jackson
2013-09-06 12:57 ` Konrad Rzeszutek Wilk
2013-09-06 13:34 ` Ian Jackson