* [xen-unstable test] 18851: regressions - FAIL
@ 2013-08-29 19:18 xen.org
2013-08-30 10:36 ` Jan Beulich
2013-09-02 15:10 ` Ian Jackson
0 siblings, 2 replies; 19+ messages in thread
From: xen.org @ 2013-08-29 19:18 UTC (permalink / raw)
To: xen-devel; +Cc: ian.jackson
flight 18851 xen-unstable real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/18851/
Regressions :-(
Tests which did not succeed and are blocking,
including tests which could not be run:
test-amd64-i386-rhel6hvm-amd 7 redhat-install fail REGR. vs. 18778
test-amd64-i386-pv 7 debian-install fail REGR. vs. 18778
test-amd64-i386-xl-multivcpu 7 debian-install fail REGR. vs. 18778
Tests which did not succeed, but are not blocking:
test-amd64-amd64-xl-pcipt-intel 9 guest-start fail never pass
test-amd64-amd64-xl-qemuu-win7-amd64 13 guest-stop fail never pass
test-amd64-i386-xl-winxpsp3-vcpus1 13 guest-stop fail never pass
test-amd64-i386-xend-winxpsp3 16 leak-check/check fail never pass
test-amd64-i386-xl-qemut-winxpsp3-vcpus1 13 guest-stop fail never pass
test-amd64-amd64-xl-qemuu-winxpsp3 13 guest-stop fail never pass
test-amd64-i386-xend-qemut-winxpsp3 16 leak-check/check fail never pass
test-amd64-amd64-xl-qemut-win7-amd64 13 guest-stop fail never pass
test-amd64-amd64-xl-qemut-winxpsp3 13 guest-stop fail never pass
test-amd64-amd64-xl-winxpsp3 13 guest-stop fail never pass
test-amd64-amd64-xl-win7-amd64 13 guest-stop fail never pass
test-amd64-i386-xl-qemut-win7-amd64 13 guest-stop fail never pass
test-amd64-i386-xl-win7-amd64 13 guest-stop fail never pass
version targeted for testing:
xen fb3f1c1855bd9aca625bc0d040be4cdcc216e958
baseline version:
xen 8a7769b4453168e23e8935a85e9a875ef5117253
------------------------------------------------------------
People who touched revisions under test:
Andrew Cooper <andrew.cooper3@citrix.com>
Ian Campbell <ian.campbell@citrix.com>
Ian Campbell <ijc@hellion.org.uk>
Ian Jackson <ian.jackson@eu.citrix.com>
Jaeyong Yoo <jaeyong.yoo@samsung.com>
Jan Beulich <jbeulich@suse.com>
Julien Grall <julien.grall@linaro.org>
Keir Fraser <keir@xen.org>
Matt Wilson <msw@amazon.com>
Sander Eikelenboom <linux@eikelenboom.it>
Suravee Suthikulpanit <suravee.suthikulapanit@amd.com>
Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tomasz Wroblewski <tomasz.wroblewski@citrix.com>
------------------------------------------------------------
jobs:
build-amd64 pass
build-armhf pass
build-i386 pass
build-amd64-oldkern pass
build-i386-oldkern pass
build-amd64-pvops pass
build-i386-pvops pass
test-amd64-amd64-xl pass
test-amd64-i386-xl pass
test-amd64-i386-rhel6hvm-amd fail
test-amd64-i386-qemut-rhel6hvm-amd pass
test-amd64-i386-qemuu-rhel6hvm-amd pass
test-amd64-amd64-xl-qemut-win7-amd64 fail
test-amd64-i386-xl-qemut-win7-amd64 fail
test-amd64-amd64-xl-qemuu-win7-amd64 fail
test-amd64-amd64-xl-win7-amd64 fail
test-amd64-i386-xl-win7-amd64 fail
test-amd64-i386-xl-credit2 pass
test-amd64-amd64-xl-pcipt-intel fail
test-amd64-i386-rhel6hvm-intel pass
test-amd64-i386-qemut-rhel6hvm-intel pass
test-amd64-i386-qemuu-rhel6hvm-intel pass
test-amd64-i386-xl-multivcpu fail
test-amd64-amd64-pair pass
test-amd64-i386-pair pass
test-amd64-amd64-xl-sedf-pin pass
test-amd64-amd64-pv pass
test-amd64-i386-pv fail
test-amd64-amd64-xl-sedf pass
test-amd64-i386-xl-qemut-winxpsp3-vcpus1 fail
test-amd64-i386-xl-winxpsp3-vcpus1 fail
test-amd64-i386-xend-qemut-winxpsp3 fail
test-amd64-amd64-xl-qemut-winxpsp3 fail
test-amd64-amd64-xl-qemuu-winxpsp3 fail
test-amd64-i386-xend-winxpsp3 fail
test-amd64-amd64-xl-winxpsp3 fail
------------------------------------------------------------
sg-report-flight on woking.cam.xci-test.com
logs: /home/xc_osstest/logs
images: /home/xc_osstest/images
Logs, config files, etc. are available at
http://www.chiark.greenend.org.uk/~xensrcts/logs
Test harness code can be found at
http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary
Not pushing.
(No revision log; it would be 333 lines long.)
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [xen-unstable test] 18851: regressions - FAIL
2013-08-29 19:18 [xen-unstable test] 18851: regressions - FAIL xen.org
@ 2013-08-30 10:36 ` Jan Beulich
2013-09-02 15:10 ` Ian Jackson
1 sibling, 0 replies; 19+ messages in thread
From: Jan Beulich @ 2013-08-30 10:36 UTC (permalink / raw)
To: ian.jackson, xen-devel

>>> On 29.08.13 at 21:18, xen.org <ian.jackson@eu.citrix.com> wrote:
> flight 18851 xen-unstable real [real]
> http://www.chiark.greenend.org.uk/~xensrcts/logs/18851/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
> test-amd64-i386-rhel6hvm-amd 7 redhat-install fail REGR. vs. 18778
> test-amd64-i386-pv 7 debian-install fail REGR. vs. 18778
> test-amd64-i386-xl-multivcpu 7 debian-install fail REGR. vs. 18778

So these all appear to be timeouts of infrastructure operations that don't have an immediate explanation to me. The only odd thing is

[ 12.719551] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 12.726458] IPv6: ADDRCONF(NETDEV_UP): xenbr0: link is not ready

in each of the respective woodlouse---var-log-dmesg files. Is woodlouse suffering from a network connectivity issue, perhaps as a result of the kernel update?

In any event, throughout the last runs it has - afaics - always been woodlouse that had failures (and the stickiness of failed tests then likely prevents them to ever get a success elsewhere). So perhaps worth trying to take woodlouse out of the pool temporarily?

Jan

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [xen-unstable test] 18851: regressions - FAIL
2013-08-29 19:18 [xen-unstable test] 18851: regressions - FAIL xen.org
2013-08-30 10:36 ` Jan Beulich
@ 2013-09-02 15:10 ` Ian Jackson
2013-09-02 17:02 ` [xen-unstable test] 19006: regressions - trouble: broken/fail/pass xen.org
` (2 more replies)
1 sibling, 3 replies; 19+ messages in thread
From: Ian Jackson @ 2013-09-02 15:10 UTC (permalink / raw)
To: xen-devel; +Cc: Boris Ostrovsky, Keir Fraser, David Vrabel, Jan Beulich

xen.org writes ("[xen-unstable test] 18851: regressions - FAIL"):
> flight 18851 xen-unstable real [real]
> http://www.chiark.greenend.org.uk/~xensrcts/logs/18851/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
> test-amd64-i386-rhel6hvm-amd 7 redhat-install fail REGR. vs. 18778

I have had a bisection report about this:

From: "xen.org" <osstest@woking.cam.xci-test.com>
From: "xen.org" <ian.jackson@eu.citrix.com>
X-rewrote-sender: osstest@woking.cam.xci-test.com
Date: Mon, 02 Sep 2013 14:33:30 +0100

branch xen-unstable
xen branch xen-unstable
job test-amd64-i386-qemut-rhel6hvm-amd
test redhat-install

Tree: linux git://xenbits.xen.org/linux-pvops.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemu git://xenbits.xen.org/staging/qemu-xen-unstable.git
Tree: qemuu git://xenbits.xen.org/staging/qemu-upstream-unstable.git
Tree: xen git://xenbits.xen.org/xen.git

*** Found and reproduced problem changeset ***

Bug is in tree: linux git://xenbits.xen.org/linux-pvops.git
Bug introduced: 8bf3379a74bc9132751bfa685bad2da318fd59d7
Bug not present: a938a246d34912423c560f475ccf1ce0c71d9d00

commit 8bf3379a74bc9132751bfa685bad2da318fd59d7
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Thu Aug 29 09:47:51 2013 -0700

Linux 3.10.10

[etc.]

The head commit there is a merge. The email contained all the log messages in between those two, so bounced. (The bisector didn't examine the other parent of the merge, I think because it wasn't an ancestor of the baseline "good" revision.)

I'm not sure why my osstest push gate didn't catch this, but the regression is indeed caused by the change from Jeremy's old tree to Linux 3.10.y.

Ian.

^ permalink raw reply [flat|nested] 19+ messages in thread
* [xen-unstable test] 19006: regressions - trouble: broken/fail/pass
@ 2013-09-02 17:02 ` xen.org
0 siblings, 0 replies; 19+ messages in thread
From: xen.org @ 2013-09-02 17:02 UTC (permalink / raw)
To: xen-devel; +Cc: ian.jackson
flight 19006 xen-unstable real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/19006/
Regressions :-(
Tests which did not succeed and are blocking,
including tests which could not be run:
test-amd64-i386-qemut-rhel6hvm-amd 7 redhat-install fail REGR. vs. 18778
test-amd64-i386-pv 7 debian-install fail REGR. vs. 18778
test-amd64-i386-xl-multivcpu 7 debian-install fail REGR. vs. 18778
test-amd64-i386-pair 17 guest-migrate/src_host/dst_host fail REGR. vs. 18778
test-amd64-i386-qemuu-rhel6hvm-amd 7 redhat-install fail in 18998 REGR. vs. 18778
Tests which are failing intermittently (not blocking):
test-amd64-i386-qemuu-rhel6hvm-amd 6 leak-check/basis(6) fail pass in 18998
test-amd64-i386-rhel6hvm-amd 6 leak-check/basis(6) fail pass in 18998
test-amd64-i386-pv 6 leak-check/basis(6) fail in 18998 pass in 19006
Regressions which are regarded as allowable (not blocking):
test-amd64-i386-rhel6hvm-amd 7 redhat-install fail in 18998 like 18920-bisect
Tests which did not succeed, but are not blocking:
test-amd64-amd64-xl-pcipt-intel 9 guest-start fail never pass
test-amd64-amd64-xl-qemuu-win7-amd64 13 guest-stop fail never pass
test-amd64-i386-xl-winxpsp3-vcpus1 13 guest-stop fail never pass
test-amd64-i386-xend-winxpsp3 16 leak-check/check fail never pass
test-amd64-i386-xl-qemut-winxpsp3-vcpus1 13 guest-stop fail never pass
test-amd64-amd64-xl-qemuu-winxpsp3 13 guest-stop fail never pass
test-amd64-i386-xend-qemut-winxpsp3 16 leak-check/check fail never pass
test-amd64-amd64-xl-qemut-win7-amd64 13 guest-stop fail never pass
test-amd64-amd64-xl-qemut-winxpsp3 13 guest-stop fail never pass
test-amd64-amd64-xl-winxpsp3 13 guest-stop fail never pass
test-amd64-amd64-xl-win7-amd64 13 guest-stop fail never pass
test-amd64-i386-xl-qemut-win7-amd64 13 guest-stop fail never pass
test-amd64-i386-xl-win7-amd64 13 guest-stop fail never pass
version targeted for testing:
xen ec3f60c9d609703cce2fca30edbc6e72cd18e492
baseline version:
xen 8a7769b4453168e23e8935a85e9a875ef5117253
------------------------------------------------------------
People who touched revisions under test:
Andrew Cooper <andrew.cooper3@citrix.com>
Ian Campbell <ian.campbell@citrix.com>
Ian Campbell <ijc@hellion.org.uk>
Ian Jackson <ian.jackson@eu.citrix.com>
Jaeyong Yoo <jaeyong.yoo@samsung.com>
Jan Beulich <jbeulich@suse.com>
Julien Grall <julien.grall@linaro.org>
Keir Fraser <keir@xen.org>
Len Brown <len.brown@intel.com>
Matt Wilson <msw@amazon.com>
Sander Eikelenboom <linux@eikelenboom.it>
Suravee Suthikulpanit <suravee.suthikulapanit@amd.com>
Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tim Deegan <tim@xen.org>
Tomasz Wroblewski <tomasz.wroblewski@citrix.com>
------------------------------------------------------------
jobs:
build-amd64 pass
build-armhf pass
build-i386 pass
build-amd64-oldkern pass
build-i386-oldkern pass
build-amd64-pvops pass
build-i386-pvops pass
test-amd64-amd64-xl pass
test-amd64-i386-xl pass
test-amd64-i386-rhel6hvm-amd broken
test-amd64-i386-qemut-rhel6hvm-amd fail
test-amd64-i386-qemuu-rhel6hvm-amd broken
test-amd64-amd64-xl-qemut-win7-amd64 fail
test-amd64-i386-xl-qemut-win7-amd64 fail
test-amd64-amd64-xl-qemuu-win7-amd64 fail
test-amd64-amd64-xl-win7-amd64 fail
test-amd64-i386-xl-win7-amd64 fail
test-amd64-i386-xl-credit2 pass
test-amd64-amd64-xl-pcipt-intel fail
test-amd64-i386-rhel6hvm-intel pass
test-amd64-i386-qemut-rhel6hvm-intel pass
test-amd64-i386-qemuu-rhel6hvm-intel pass
test-amd64-i386-xl-multivcpu fail
test-amd64-amd64-pair pass
test-amd64-i386-pair fail
test-amd64-amd64-xl-sedf-pin pass
test-amd64-amd64-pv pass
test-amd64-i386-pv fail
test-amd64-amd64-xl-sedf pass
test-amd64-i386-xl-qemut-winxpsp3-vcpus1 fail
test-amd64-i386-xl-winxpsp3-vcpus1 fail
test-amd64-i386-xend-qemut-winxpsp3 fail
test-amd64-amd64-xl-qemut-winxpsp3 fail
test-amd64-amd64-xl-qemuu-winxpsp3 fail
test-amd64-i386-xend-winxpsp3 fail
test-amd64-amd64-xl-winxpsp3 fail
------------------------------------------------------------
sg-report-flight on woking.cam.xci-test.com
logs: /home/xc_osstest/logs
images: /home/xc_osstest/images
Logs, config files, etc. are available at
http://www.chiark.greenend.org.uk/~xensrcts/logs
Test harness code can be found at
http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary
Not pushing.
(No revision log; it would be 522 lines long.)
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [xen-unstable test] 18851: regressions - FAIL
2013-09-02 15:10 ` Ian Jackson
2013-09-02 17:02 ` [xen-unstable test] 19006: regressions - trouble: broken/fail/pass xen.org
@ 2013-09-02 17:09 ` Ian Jackson
2013-09-02 17:15 ` [xen-unstable test] 19006: regressions - trouble: broken/fail/pass [and 1 more messages] Ian Jackson
2013-09-04 9:04 ` [xen-unstable test] 18851: regressions - FAIL Jan Beulich
2 siblings, 1 reply; 19+ messages in thread
From: Ian Jackson @ 2013-09-02 17:09 UTC (permalink / raw)
To: xen-devel, David Vrabel, Jan Beulich, Boris Ostrovsky, Konrad Rzeszutek Wilk, Keir Fraser

Ian Jackson writes ("Re: [xen-unstable test] 18851: regressions - FAIL"):
> xen.org writes ("[xen-unstable test] 18851: regressions - FAIL"):
> > flight 18851 xen-unstable real [real]
> > http://www.chiark.greenend.org.uk/~xensrcts/logs/18851/
> >
> > Regressions :-(
> >
> > Tests which did not succeed and are blocking,
> > including tests which could not be run:
> > test-amd64-i386-rhel6hvm-amd 7 redhat-install fail REGR. vs. 18778

xen.org writes ("[xen-unstable test] 19006: regressions - trouble: broken/fail/pass"):
> Tests which did not succeed and are blocking,
> including tests which could not be run:
> test-amd64-i386-xl-multivcpu 7 debian-install fail REGR. vs. 18778

I looked at this one from 19006. The system is running under Xen but has no guests. It shows a wget process running. I can't easily tell whether it has hung, but there are no other signs of trouble in the logs.

The tester was able to ssh in and get process listings and so forth so it must be that
(a) just the debootstrap stuff has hung
(b) trying to ssh in to collect logs unwedged it
(c) the problem is actually poor performance, not a hang.

The system allows 2ks for a debootstrap, which should be ample (given that there's a local mirror). So I think it's probably a performance regression.

I will try to repro this tomorrow.

Ian.

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [xen-unstable test] 19006: regressions - trouble: broken/fail/pass [and 1 more messages]
2013-09-02 17:09 ` [xen-unstable test] 18851: regressions - FAIL Ian Jackson
@ 2013-09-02 17:15 ` Ian Jackson
0 siblings, 0 replies; 19+ messages in thread
From: Ian Jackson @ 2013-09-02 17:15 UTC (permalink / raw)
To: xen.org; +Cc: xen-devel, Keir Fraser, David Vrabel, Jan Beulich, Boris Ostrovsky

xen.org writes ("[xen-unstable test] 19006: regressions - trouble: broken/fail/pass"):
> Tests which did not succeed and are blocking,
> including tests which could not be run:
> test-amd64-i386-qemut-rhel6hvm-amd 7 redhat-install fail REGR. vs. 18778
> test-amd64-i386-pv 7 debian-install fail REGR. vs. 18778
> test-amd64-i386-xl-multivcpu 7 debian-install fail REGR. vs. 18778

All these three were on woodlouse.

> test-amd64-i386-pair 17 guest-migrate/src_host/dst_host fail REGR. vs. 18778

This was on gall-mite and itch-mite.

> test-amd64-i386-qemuu-rhel6hvm-amd 7 redhat-install fail in 18998 REGR. vs. 18778

This was on woodlouse too (but the logs have been expired).

Ian.

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [xen-unstable test] 18851: regressions - FAIL
2013-09-02 15:10 ` Ian Jackson
2013-09-02 17:02 ` [xen-unstable test] 19006: regressions - trouble: broken/fail/pass xen.org
2013-09-02 17:09 ` [xen-unstable test] 18851: regressions - FAIL Ian Jackson
@ 2013-09-04 9:04 ` Jan Beulich
2013-09-04 10:41 ` Ian Jackson
2 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2013-09-04 9:04 UTC (permalink / raw)
To: David Vrabel, Ian Jackson, Boris Ostrovsky, Konrad Rzeszutek Wilk
Cc: xen-devel, Keir Fraser

>>> On 02.09.13 at 17:10, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> *** Found and reproduced problem changeset ***
>
> Bug is in tree: linux git://xenbits.xen.org/linux-pvops.git
> Bug introduced: 8bf3379a74bc9132751bfa685bad2da318fd59d7
> Bug not present: a938a246d34912423c560f475ccf1ce0c71d9d00
>
>
> commit 8bf3379a74bc9132751bfa685bad2da318fd59d7
> Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Date: Thu Aug 29 09:47:51 2013 -0700
>
> Linux 3.10.10
>
> [etc.]
>
> The head commit there is a merge. The email contained all the log
> messages in between those two, so bounced. (The bisector didn't
> examine the other parent of the merge, I think because it wasn't an
> ancestor of the baseline "good" revision.)
>
> I'm not sure why my osstest push gate didn't catch this, but the
> regression is indeed caused by the change from Jeremy's old tree to
> Linux 3.10.y.

So how do we want to deal with that? Linux maintainers - any chance you could help out? The staging tree having been stuck for over a week is certainly less than ideal...

Jan

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [xen-unstable test] 18851: regressions - FAIL
2013-09-04 9:04 ` [xen-unstable test] 18851: regressions - FAIL Jan Beulich
@ 2013-09-04 10:41 ` Ian Jackson
2013-09-05 11:24 ` David Vrabel
0 siblings, 1 reply; 19+ messages in thread
From: Ian Jackson @ 2013-09-04 10:41 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel, Boris Ostrovsky, Keir Fraser, David Vrabel

Jan Beulich writes ("Re: [xen-unstable test] 18851: regressions - FAIL"):
> On 02.09.13 at 17:10, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
...
> > I'm not sure why my osstest push gate didn't catch this, but the
> > regression is indeed caused by the change from Jeremy's old tree to
> > Linux 3.10.y.

It appears that the push gate didn't catch it because it's host specific, and it got lucky and didn't run a test on that host.

> So how do we want to deal with that? Linux maintainers - any
> chance you could help out? The staging tree having been stuck
> for over a week is certainly less than ideal...

David Vrabel pointed out that more modern kernels have a different interpretation of things like "dom0_mem=256M", and can waste lots and lots of actual memory on pointless bookkeeping for future expansion (which the kernel envisages but we do not).

I have changed it to "dom0_mem=256M,max:256M". I got a push of this change at "Wed, 4 Sep 2013 03:50:14 +0100". I don't think any of the test runs yet reported have used this change.

...

I have just checked the database and flights 19046 onwards are using this new command-line option. None of them have reported yet.

In fact due to the backlog the system is rather clogged with runs using the old osstest. I'm going to manually kill those.

Ian.

^ permalink raw reply [flat|nested] 19+ messages in thread
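
[Editorial sketch, not part of the thread.] The difference between the two settings Ian quotes, "dom0_mem=256M" and "dom0_mem=256M,max:256M", can be illustrated with a small Python parser. This is not Xen's own option parser: the function names are hypothetical, only the `<size>`, `min:<size>` and `max:<size>` forms are handled, and a K/M/G suffix is required as a simplifying assumption.

```python
import re

# Byte multipliers for the size suffixes this sketch accepts (assumption:
# a suffix is always present, as in the values quoted in the thread).
_UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Convert a size such as '256M' into a byte count."""
    match = re.fullmatch(r"(\d+)([KMG])", text)
    if match is None:
        raise ValueError(f"unsupported size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def parse_dom0_mem(option):
    """Split a 'dom0_mem=...' option into initial/min/max byte counts.

    Fields that are not given are reported as None.
    """
    key, _, value = option.partition("=")
    if key != "dom0_mem":
        raise ValueError("not a dom0_mem option")
    result = {"initial": None, "min": None, "max": None}
    for field in value.split(","):
        name, sep, size = field.partition(":")
        if sep:
            if name not in ("min", "max"):
                raise ValueError(f"unknown field: {name!r}")
            result[name] = parse_size(size)
        else:
            result["initial"] = parse_size(field)
    return result


def is_capped(option):
    """True when an explicit max equals the initial allocation."""
    parsed = parse_dom0_mem(option)
    return parsed["max"] is not None and parsed["max"] == parsed["initial"]
```

With this sketch, `is_capped("dom0_mem=256M")` is false because no maximum is stated, which is the case where the kernel may reserve bookkeeping for later growth; the changed setting states the cap explicitly.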
* Re: [xen-unstable test] 18851: regressions - FAIL
2013-09-04 10:41 ` Ian Jackson
@ 2013-09-05 11:24 ` David Vrabel
2013-09-05 12:20 ` Jan Beulich
0 siblings, 1 reply; 19+ messages in thread
From: David Vrabel @ 2013-09-05 11:24 UTC (permalink / raw)
To: Ian Jackson; +Cc: xen-devel, Boris Ostrovsky, Keir Fraser, Jan Beulich

On 04/09/13 11:41, Ian Jackson wrote:
> Jan Beulich writes ("Re: [xen-unstable test] 18851: regressions - FAIL"):
>> On 02.09.13 at 17:10, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> ...
>>> I'm not sure why my osstest push gate didn't catch this, but the
>>> regression is indeed caused by the change from Jeremy's old tree to
>>> Linux 3.10.y.
>
> It appears that the push gate didn't catch it because it's host
> specific, and it got lucky and didn't run a test on that host.
>
>> So how do we want to deal with that? Linux maintainers - any
>> chance you could help out? The staging tree having been stuck
>> for over a week is certainly less than ideal...
>
> David Vrabel pointed out that more modern kernels have a different
> interpretation of things like "dom0_mem=256M", and can waste lots and
> lots of actual memory on pointless bookkeeping for future expansion
> (which the kernel envisages but we do not).
>
> I have changed it to "dom0_mem=256M,max:256M". I got a push of this
> change at "Wed, 4 Sep 2013 03:50:14 +0100". I don't think any of the
> test runs yet reported have used this change.

Woodlouse's e820 as seen by the kernel looks like:

[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] Xen: [mem 0x0000000000000000-0x0000000000099fff] usable
[ 0.000000] Xen: [mem 0x000000000009a800-0x00000000000fffff] reserved
[ 0.000000] Xen: [mem 0x0000000000100000-0x00000000d7f8ffff] usable
[ 0.000000] Xen: [mem 0x00000000d7f9e000-0x00000000d7f9ffff] type 9
[ 0.000000] Xen: [mem 0x00000000d7fa0000-0x00000000d7fadfff] ACPI data
[ 0.000000] Xen: [mem 0x00000000d7fae000-0x00000000d7fdffff] ACPI NVS
[ 0.000000] Xen: [mem 0x00000000d7fe0000-0x00000000d7fedfff] reserved
[ 0.000000] Xen: [mem 0x00000000d7ff0000-0x00000000d7ffffff] reserved
[ 0.000000] Xen: [mem 0x00000000e0000000-0x00000000efffffff] reserved
[ 0.000000] Xen: [mem 0x00000000fec00000-0x00000000fec02fff] reserved
[ 0.000000] Xen: [mem 0x00000000fee00000-0x00000000feefffff] reserved
[ 0.000000] Xen: [mem 0x00000000ff700000-0x00000000ffffffff] reserved
[ 0.000000] Xen: [mem 0x0000000100000000-0x00000001884d1fff] usable
[ 0.000000] Xen: [mem 0x00000001884d2000-0x0000000227ffffff] unusable
[ 0.000000] Xen: [mem 0x000000fd00000000-0x000000ffffffffff] reserved

That last reserved entry I think confuses the early setup and it does odd things like:

[ 0.000000] Set 266338518 page(s) to 1-1 mapping

Possibly relevant kernel thread here:

http://lkml.indiana.edu/hypermail/linux/kernel/1110.1/01213.html

I note that the e820 as seen by Xen does not have this reserved region

(XEN) Xen-e820 RAM map:
(XEN) 0000000000000000 - 000000000009a800 (usable)
(XEN) 000000000009a800 - 00000000000a0000 (reserved)
(XEN) 00000000000e6000 - 0000000000100000 (reserved)
(XEN) 0000000000100000 - 00000000d7f90000 (usable)
(XEN) 00000000d7f9e000 - 00000000d7fa0000 type 9
(XEN) 00000000d7fa0000 - 00000000d7fae000 (ACPI data)
(XEN) 00000000d7fae000 - 00000000d7fe0000 (ACPI NVS)
(XEN) 00000000d7fe0000 - 00000000d7fee000 (reserved)
(XEN) 00000000d7ff0000 - 00000000d8000000 (reserved)
(XEN) 00000000e0000000 - 00000000f0000000 (reserved)
(XEN) 00000000fec00000 - 00000000fec03000 (reserved)
(XEN) 00000000fee00000 - 00000000fee01000 (reserved)
(XEN) 00000000ff700000 - 0000000100000000 (reserved)
(XEN) 0000000100000000 - 0000000228000000 (usable)

So it must be being added by Xen?

David

^ permalink raw reply [flat|nested] 19+ messages in thread
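
[Editorial sketch, not part of the thread.] As a rough cross-check of the page count David quotes, the kernel-side e820 map can be parsed and tallied in Python. The accounting rule is an assumption made for illustration and is not taken from the kernel source: every 4 KiB page frame below the end of the highest entry is counted unless it lies fully inside a "usable" or "unusable" range. All names are hypothetical.

```python
import re

PAGE_SHIFT = 12  # 4 KiB pages

# Kernel-side map as quoted in David Vrabel's message.
KERNEL_E820 = """\
[ 0.000000] Xen: [mem 0x0000000000000000-0x0000000000099fff] usable
[ 0.000000] Xen: [mem 0x000000000009a800-0x00000000000fffff] reserved
[ 0.000000] Xen: [mem 0x0000000000100000-0x00000000d7f8ffff] usable
[ 0.000000] Xen: [mem 0x00000000d7f9e000-0x00000000d7f9ffff] type 9
[ 0.000000] Xen: [mem 0x00000000d7fa0000-0x00000000d7fadfff] ACPI data
[ 0.000000] Xen: [mem 0x00000000d7fae000-0x00000000d7fdffff] ACPI NVS
[ 0.000000] Xen: [mem 0x00000000d7fe0000-0x00000000d7fedfff] reserved
[ 0.000000] Xen: [mem 0x00000000d7ff0000-0x00000000d7ffffff] reserved
[ 0.000000] Xen: [mem 0x00000000e0000000-0x00000000efffffff] reserved
[ 0.000000] Xen: [mem 0x00000000fec00000-0x00000000fec02fff] reserved
[ 0.000000] Xen: [mem 0x00000000fee00000-0x00000000feefffff] reserved
[ 0.000000] Xen: [mem 0x00000000ff700000-0x00000000ffffffff] reserved
[ 0.000000] Xen: [mem 0x0000000100000000-0x00000001884d1fff] usable
[ 0.000000] Xen: [mem 0x00000001884d2000-0x0000000227ffffff] unusable
[ 0.000000] Xen: [mem 0x000000fd00000000-0x000000ffffffffff] reserved
"""

ENTRY_RE = re.compile(r"\[mem 0x([0-9a-f]+)-0x([0-9a-f]+)\]\s+(.*\S)")


def parse_e820(text):
    """Return (start, inclusive_end, type) tuples for each map line."""
    entries = []
    for line in text.splitlines():
        match = ENTRY_RE.search(line)
        if match:
            start = int(match.group(1), 16)
            end = int(match.group(2), 16)
            entries.append((start, end, match.group(3)))
    return entries


def whole_pages(start, end):
    """Count page frames lying entirely inside [start, end]."""
    first = (start + (1 << PAGE_SHIFT) - 1) >> PAGE_SHIFT  # round start up
    last = (end + 1) >> PAGE_SHIFT                         # round end down
    return max(0, last - first)


def pages_outside_ram(entries):
    """Pages below the top of the map that are not usable/unusable RAM.

    Assumes the entries do not overlap, as in the map above.
    """
    top = max(end for _, end, _ in entries) + 1
    ram = sum(whole_pages(start, end)
              for start, end, kind in entries
              if kind in ("usable", "unusable"))
    return (top >> PAGE_SHIFT) - ram
```

Under that assumed rule, `pages_outside_ram(parse_e820(KERNEL_E820))` comes to 266338518, the same figure as in the "Set ... page(s) to 1-1 mapping" line. That is consistent with the count extending all the way up to the end of the last reserved entry, though it does not by itself show that anything is wrong.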
* Re: [xen-unstable test] 18851: regressions - FAIL
2013-09-05 11:24 ` David Vrabel
@ 2013-09-05 12:20 ` Jan Beulich
2013-09-05 14:09 ` David Vrabel
0 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2013-09-05 12:20 UTC (permalink / raw)
To: David Vrabel, Ian Jackson; +Cc: xen-devel, Boris Ostrovsky, Keir Fraser

>>> On 05.09.13 at 13:24, David Vrabel <david.vrabel@citrix.com> wrote:
> On 04/09/13 11:41, Ian Jackson wrote:
>> Jan Beulich writes ("Re: [xen-unstable test] 18851: regressions - FAIL"):
>>> On 02.09.13 at 17:10, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
>> ...
>>>> I'm not sure why my osstest push gate didn't catch this, but the
>>>> regression is indeed caused by the change from Jeremy's old tree to
>>>> Linux 3.10.y.
>>
>> It appears that the push gate didn't catch it because it's host
>> specific, and it got lucky and didn't run a test on that host.
>>
>>> So how do we want to deal with that? Linux maintainers - any
>>> chance you could help out? The staging tree having been stuck
>>> for over a week is certainly less than ideal...
>>
>> David Vrabel pointed out that more modern kernels have a different
>> interpretation of things like "dom0_mem=256M", and can waste lots and
>> lots of actual memory on pointless bookkeeping for future expansion
>> (which the kernel envisages but we do not).
>>
>> I have changed it to "dom0_mem=256M,max:256M". I got a push of this
>> change at "Wed, 4 Sep 2013 03:50:14 +0100". I don't think any of the
>> test runs yet reported have used this change.
>
> Woodlouse's e820 as seen by the kernel looks like:
>
> [ 0.000000] e820: BIOS-provided physical RAM map:
> [ 0.000000] Xen: [mem 0x0000000000000000-0x0000000000099fff] usable
> [ 0.000000] Xen: [mem 0x000000000009a800-0x00000000000fffff] reserved
> [ 0.000000] Xen: [mem 0x0000000000100000-0x00000000d7f8ffff] usable
> [ 0.000000] Xen: [mem 0x00000000d7f9e000-0x00000000d7f9ffff] type 9
> [ 0.000000] Xen: [mem 0x00000000d7fa0000-0x00000000d7fadfff] ACPI data
> [ 0.000000] Xen: [mem 0x00000000d7fae000-0x00000000d7fdffff] ACPI NVS
> [ 0.000000] Xen: [mem 0x00000000d7fe0000-0x00000000d7fedfff] reserved
> [ 0.000000] Xen: [mem 0x00000000d7ff0000-0x00000000d7ffffff] reserved
> [ 0.000000] Xen: [mem 0x00000000e0000000-0x00000000efffffff] reserved
> [ 0.000000] Xen: [mem 0x00000000fec00000-0x00000000fec02fff] reserved
> [ 0.000000] Xen: [mem 0x00000000fee00000-0x00000000feefffff] reserved
> [ 0.000000] Xen: [mem 0x00000000ff700000-0x00000000ffffffff] reserved
> [ 0.000000] Xen: [mem 0x0000000100000000-0x00000001884d1fff] usable
> [ 0.000000] Xen: [mem 0x00000001884d2000-0x0000000227ffffff] unusable
> [ 0.000000] Xen: [mem 0x000000fd00000000-0x000000ffffffffff] reserved
>
> That last reserved entry I think confuses the early setup and it does
> odd things like:
>
> [ 0.000000] Set 266338518 page(s) to 1-1 mapping
>
> Possibly relevant kernel thread here:
>
> http://lkml.indiana.edu/hypermail/linux/kernel/1110.1/01213.html
>
> I note that the e820 as seen by Xen does not have this reserved region
>
> (XEN) Xen-e820 RAM map:
> (XEN) 0000000000000000 - 000000000009a800 (usable)
> (XEN) 000000000009a800 - 00000000000a0000 (reserved)
> (XEN) 00000000000e6000 - 0000000000100000 (reserved)
> (XEN) 0000000000100000 - 00000000d7f90000 (usable)
> (XEN) 00000000d7f9e000 - 00000000d7fa0000 type 9
> (XEN) 00000000d7fa0000 - 00000000d7fae000 (ACPI data)
> (XEN) 00000000d7fae000 - 00000000d7fe0000 (ACPI NVS)
> (XEN) 00000000d7fe0000 - 00000000d7fee000 (reserved)
> (XEN) 00000000d7ff0000 - 00000000d8000000 (reserved)
> (XEN) 00000000e0000000 - 00000000f0000000 (reserved)
> (XEN) 00000000fec00000 - 00000000fec03000 (reserved)
> (XEN) 00000000fee00000 - 00000000fee01000 (reserved)
> (XEN) 00000000ff700000 - 0000000100000000 (reserved)
> (XEN) 0000000100000000 - 0000000228000000 (usable)
>
> So it must be being added by Xen?

Yes - see d838ac25 ("x86: don't allow Dom0 access to the HT address range"). But that's the case on all AMD systems, and I thought it wasn't just woodlouse that's an AMD one - Ian?

In any event - how can the kernel side code make _any_ assumptions on what is or is not in the E820 table? I've recently seen logs from a system where reserved (MMIO) blocks appear right below the 1Tb (or maybe it was even 16Tb) boundary, without Xen inserting them.

I would certainly be willing to revert that patch for the time being if we have reasons to believe this helps, but only as long as it is clear that the kernel needs fixing, and that I'll want this back before 4.4 goes out. Do we have baseline (8a7769b4) test results including the new kernel, with part of it run on woodlouse?

Jan

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [xen-unstable test] 18851: regressions - FAIL
2013-09-05 12:20 ` Jan Beulich
@ 2013-09-05 14:09 ` David Vrabel
2013-09-06 10:38 ` [xen-unstable test] 18851: regressions - FAIL [and 1 more messages] Ian Jackson
0 siblings, 1 reply; 19+ messages in thread
From: David Vrabel @ 2013-09-05 14:09 UTC (permalink / raw)
To: Jan Beulich
Cc: Keir Fraser, Ian Jackson, xen-devel, Boris Ostrovsky, Malcolm Crossley

On 05/09/13 13:20, Jan Beulich wrote:
>>>> On 05.09.13 at 13:24, David Vrabel <david.vrabel@citrix.com> wrote:
>> On 04/09/13 11:41, Ian Jackson wrote:
>>> Jan Beulich writes ("Re: [xen-unstable test] 18851: regressions - FAIL"):
>>>> On 02.09.13 at 17:10, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
>>> ...
>>>>> I'm not sure why my osstest push gate didn't catch this, but the
>>>>> regression is indeed caused by the change from Jeremy's old tree to
>>>>> Linux 3.10.y.
>>>
>>> It appears that the push gate didn't catch it because it's host
>>> specific, and it got lucky and didn't run a test on that host.
>>>
>>>> So how do we want to deal with that? Linux maintainers - any
>>>> chance you could help out? The staging tree having been stuck
>>>> for over a week is certainly less than ideal...
>>>
>>> David Vrabel pointed out that more modern kernels have a different
>>> interpretation of things like "dom0_mem=256M", and can waste lots and
>>> lots of actual memory on pointless bookkeeping for future expansion
>>> (which the kernel envisages but we do not).
>>>
>>> I have changed it to "dom0_mem=256M,max:256M". I got a push of this
>>> change at "Wed, 4 Sep 2013 03:50:14 +0100". I don't think any of the
>>> test runs yet reported have used this change.
>>
>> Woodlouse's e820 as seen by the kernel looks like:
>>
>> [ 0.000000] e820: BIOS-provided physical RAM map:
>> [ 0.000000] Xen: [mem 0x0000000000000000-0x0000000000099fff] usable
>> [ 0.000000] Xen: [mem 0x000000000009a800-0x00000000000fffff] reserved
>> [ 0.000000] Xen: [mem 0x0000000000100000-0x00000000d7f8ffff] usable
>> [ 0.000000] Xen: [mem 0x00000000d7f9e000-0x00000000d7f9ffff] type 9
>> [ 0.000000] Xen: [mem 0x00000000d7fa0000-0x00000000d7fadfff] ACPI data
>> [ 0.000000] Xen: [mem 0x00000000d7fae000-0x00000000d7fdffff] ACPI NVS
>> [ 0.000000] Xen: [mem 0x00000000d7fe0000-0x00000000d7fedfff] reserved
>> [ 0.000000] Xen: [mem 0x00000000d7ff0000-0x00000000d7ffffff] reserved
>> [ 0.000000] Xen: [mem 0x00000000e0000000-0x00000000efffffff] reserved
>> [ 0.000000] Xen: [mem 0x00000000fec00000-0x00000000fec02fff] reserved
>> [ 0.000000] Xen: [mem 0x00000000fee00000-0x00000000feefffff] reserved
>> [ 0.000000] Xen: [mem 0x00000000ff700000-0x00000000ffffffff] reserved
>> [ 0.000000] Xen: [mem 0x0000000100000000-0x00000001884d1fff] usable
>> [ 0.000000] Xen: [mem 0x00000001884d2000-0x0000000227ffffff] unusable
>> [ 0.000000] Xen: [mem 0x000000fd00000000-0x000000ffffffffff] reserved
>>
>> That last reserved entry I think confuses the early setup and it does
>> odd things like:
>>
>> [ 0.000000] Set 266338518 page(s) to 1-1 mapping
>>
>> Possibly relevant kernel thread here:
>>
>> http://lkml.indiana.edu/hypermail/linux/kernel/1110.1/01213.html
>>
>> I note that the e820 as seen by Xen does not have this reserved region
>>
>> (XEN) Xen-e820 RAM map:
>> (XEN) 0000000000000000 - 000000000009a800 (usable)
>> (XEN) 000000000009a800 - 00000000000a0000 (reserved)
>> (XEN) 00000000000e6000 - 0000000000100000 (reserved)
>> (XEN) 0000000000100000 - 00000000d7f90000 (usable)
>> (XEN) 00000000d7f9e000 - 00000000d7fa0000 type 9
>> (XEN) 00000000d7fa0000 - 00000000d7fae000 (ACPI data)
>> (XEN) 00000000d7fae000 - 00000000d7fe0000 (ACPI NVS)
>> (XEN) 00000000d7fe0000 - 00000000d7fee000 (reserved)
>> (XEN) 00000000d7ff0000 - 00000000d8000000 (reserved)
>> (XEN) 00000000e0000000 - 00000000f0000000 (reserved)
>> (XEN) 00000000fec00000 - 00000000fec03000 (reserved)
>> (XEN) 00000000fee00000 - 00000000fee01000 (reserved)
>> (XEN) 00000000ff700000 - 0000000100000000 (reserved)
>> (XEN) 0000000100000000 - 0000000228000000 (usable)
>>
>> So it must be being added by Xen?
>
> Yes - see d838ac25 ("x86: don't allow Dom0 access to the HT
> address range"). But that's the case on all AMD systems, and
> I thought it wasn't just woodlouse that's an AMD one - Ian?
>
> In any event - how can the kernel side code make _any_
> assumptions on what is or is not in the E820 table? I've
> recently seen logs from a system where reserved (MMIO)
> blocks appear right below the 1Tb (or maybe it was even 16Tb)
> boundary, without Xen inserting them.
>
> I would certainly be willing to revert that patch for the time
> being if we have reasons to believe this helps, but only as long
> as it is clear that the kernel needs fixing, and that I'll want this
> back before 4.4 goes out. Do we have baseline (8a7769b4)
> test results including the new kernel, with part of it run on
> woodlouse?

This looks like a red herring. Having poked about in woodlouse it looks like something is screwy with interrupts. The tg3 cards aren't using MSI and the USB controller is using edge not level handlers. Another machine with the same chipset is happily using MSIs.

Malcolm (Cc) has some suggestions for things to try.

David

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [xen-unstable test] 18851: regressions - FAIL [and 1 more messages]
2013-09-05 14:09 ` David Vrabel
@ 2013-09-06 10:38 ` Ian Jackson
2013-09-06 10:49 ` Jan Beulich
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Ian Jackson @ 2013-09-06 10:38 UTC (permalink / raw)
To: David Vrabel, Jan Beulich
Cc: xen-devel, Boris Ostrovsky, Keir Fraser, Malcolm Crossley

Jan Beulich writes ("Re: [xen-unstable test] 18851: regressions - FAIL"):
> This looks like a red herring. Having poked about in woodlouse it looks
> like something is screwy with interrupts. The tg3 cards aren't using
> MSI and the USB controller is using edge not level handlers. Another
> machine with the same chipset is happily using MSIs.

I did the following tests overnight:

 * 3.4.60 kernel:

   Pass! [adhoc flight 19081]

 * 3.10.10 + patch from Zoltan Kiss to limit SKB_FRAG_PAGE_ORDER
     Subject: net/core: Order-3 frag allocator causes SWIOTLB bouncing under Xen
     Date: Wed Sep 04 21:54:01 BST 2013
     Message-ID: <1378327638-23956-1-git-send-email-zoltan.kiss@citrix.com>

   Fail as before (in this case, timeout in debootstrap trying to
   install a guest). [adhoc flight 19082]

 * 3.10.10, kernel command line "pci=noacpi and pci=nocrs"

   Total boot failure. SATA controller complaining bitterly about
   lost interrupts. [adhoc flight 19085]

I also took woodlouse out of the main test pool, which is how we got a
push of 4.2. I'm going to put it back now, and make a change to
switch to Linux 3.4.y for general tests.

I think this gets the 3.10.y problem off the critical path for
everything else but of course we should still fix it. I will leave
the 3.10.y push gate in place.

Ian.

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [xen-unstable test] 18851: regressions - FAIL [and 1 more messages]
2013-09-06 10:38 ` [xen-unstable test] 18851: regressions - FAIL [and 1 more messages] Ian Jackson
@ 2013-09-06 10:49 ` Jan Beulich
2013-09-06 11:06 ` Ian Jackson
2013-09-06 10:58 ` David Vrabel
2013-09-06 12:57 ` Konrad Rzeszutek Wilk
2 siblings, 1 reply; 19+ messages in thread
From: Jan Beulich @ 2013-09-06 10:49 UTC (permalink / raw)
To: Ian Jackson
Cc: Keir Fraser, David Vrabel, xen-devel, Boris Ostrovsky, Malcolm Crossley

>>> On 06.09.13 at 12:38, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> Jan Beulich writes ("Re: [xen-unstable test] 18851: regressions - FAIL"):
>> This looks like a red herring. Having poked about in woodlouse it looks
>> like something is screwy with interrupts. The tg3 cards aren't using
>> MSI and the USB controller is using edge not level handlers. Another
>> machine with the same chipset is happily using MSIs.
>
> I did the following tests overnight:
>
>  * 3.4.60 kernel:
>
>    Pass! [adhoc flight 19081]
>
>  * 3.10.10 + patch from Zoltan Kiss to limit SKB_FRAG_PAGE_ORDER
>      Subject: net/core: Order-3 frag allocator causes SWIOTLB bouncing under
> Xen
>      Date: Wed Sep 04 21:54:01 BST 2013
>      Message-ID: <1378327638-23956-1-git-send-email-zoltan.kiss@citrix.com>
>
>    Fail as before (in this case, timeout in debootstrap trying to
>    install a guest). [adhoc flight 19082]
>
>  * 3.10.10, kernel command line "pci=noacpi and pci=nocrs"
>
>    Total boot failure. SATA controller complaining bitterly about
>    lost interrupts. [adhoc flight 19085]
>
> I also took woodlouse out of the main test pool, which is how we got a
> push of 4.2. I'm going to put it back now, and make a change to
> switch to Linux 3.4.y for general tests.

For -unstable this also resulted in just a single left test failure
(test-amd64-i386-pair 17 guest-migrate/src_host/dst_host),
which appears to be the result of the migration, after the
first few thousand pages, seeing a rapid decrease of speed
(which then likely causes that timeout). I couldn't spot anything
in the logs that would explain this though. But I did notice that
in two of the three runs there was not xend.log captured on
the source host in the first place - is there an explanation for
this?

In any event I'm going to take these almost-pushes as a
"good enough" sign to pull over the two or three commits into
the stable branches, in the expectation that we should be
able to get a push there over the weekend, and then release
early next week.

Looking through the logs of *-mite it also seems like you gave
3.11 a try, hitting a BUG() in balloon.c.

Jan

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [xen-unstable test] 18851: regressions - FAIL [and 1 more messages]
2013-09-06 10:49 ` Jan Beulich
@ 2013-09-06 11:06 ` Ian Jackson
2013-09-06 12:49 ` Konrad Rzeszutek Wilk
0 siblings, 1 reply; 19+ messages in thread
From: Ian Jackson @ 2013-09-06 11:06 UTC (permalink / raw)
To: Jan Beulich
Cc: Keir Fraser, David Vrabel, xen-devel, Boris Ostrovsky, Malcolm Crossley

Jan Beulich writes ("Re: [xen-unstable test] 18851: regressions - FAIL [and 1 more messages]"):
> On 06.09.13 at 12:38, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> For -unstable this also resulted in just a single left test failure
> (test-amd64-i386-pair 17 guest-migrate/src_host/dst_host),
> which appears to be the result of the migration, after the
> first few thousand pages, seeing a rapid decrease of speed
> (which then likely causes that timeout). I couldn't spot anything
> in the logs that would explain this though. But I did notice that
> in two of the three runs there was not xend.log captured on
> the source host in the first place - is there an explanation for
> this?

Looking at the logs-capture log, it appears that itch-mite was totally
unresponsive by then. The log capture script decided to power cycle
it. After having done that, xend wasn't running. Due to a bug in the
script it didn't retry the log capture.

> In any event I'm going to take these almost-pushes as a
> "good enough" sign to pull over the two or three commits into
> the stable branches, in the expectation that we should be
> able to get a push there over the weekend, and then release
> early next week.

OK.

> Looking through the logs of *-mite it also seems like you gave
> 3.11 a try, hitting a BUG() in balloon.c.

That'll be the "linux-linus" test, which isn't doing very well.

Ian.

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [xen-unstable test] 18851: regressions - FAIL [and 1 more messages]
2013-09-06 11:06 ` Ian Jackson
@ 2013-09-06 12:49 ` Konrad Rzeszutek Wilk
0 siblings, 0 replies; 19+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-09-06 12:49 UTC (permalink / raw)
To: Ian Jackson, Stefano Stabellini
Cc: Keir Fraser, David Vrabel, Jan Beulich, xen-devel, Boris Ostrovsky, Malcolm Crossley

On Fri, Sep 06, 2013 at 12:06:38PM +0100, Ian Jackson wrote:
> Jan Beulich writes ("Re: [xen-unstable test] 18851: regressions - FAIL [and 1 more messages]"):
> > On 06.09.13 at 12:38, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> > For -unstable this also resulted in just a single left test failure
> > (test-amd64-i386-pair 17 guest-migrate/src_host/dst_host),
> > which appears to be the result of the migration, after the
> > first few thousand pages, seeing a rapid decrease of speed
> > (which then likely causes that timeout). I couldn't spot anything
> > in the logs that would explain this though. But I did notice that
> > in two of the three runs there was not xend.log captured on
> > the source host in the first place - is there an explanation for
> > this?
>
> Looking at the logs-capture log, it appears that itch-mite was totally
> unresponsive by then. The log capture script decided to power cycle
> it. After having done that, xend wasn't running. Due to a bug in the
> script it didn't retry the log capture.
>
> > In any event I'm going to take these almost-pushes as a
> > "good enough" sign to pull over the two or three commits into
> > the stable branches, in the expectation that we should be
> > able to get a push there over the weekend, and then release
> > early next week.
>
> OK.
>
> > Looking through the logs of *-mite it also seems like you gave
> > 3.11 a try, hitting a BUG() in balloon.c.
>
> That'll be the "linux-linus" test, which isn't doing very well.

I think Boris has a patch that fixes the regression.

>
> Ian.

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [xen-unstable test] 18851: regressions - FAIL [and 1 more messages]
2013-09-06 10:38 ` [xen-unstable test] 18851: regressions - FAIL [and 1 more messages] Ian Jackson
2013-09-06 10:49 ` Jan Beulich
@ 2013-09-06 10:58 ` David Vrabel
2013-09-06 11:50 ` Ian Jackson
2013-09-06 12:57 ` Konrad Rzeszutek Wilk
2 siblings, 1 reply; 19+ messages in thread
From: David Vrabel @ 2013-09-06 10:58 UTC (permalink / raw)
To: Ian Jackson
Cc: Keir Fraser, Jan Beulich, xen-devel, Boris Ostrovsky, Malcolm Crossley

On 06/09/13 11:38, Ian Jackson wrote:
> Jan Beulich writes ("Re: [xen-unstable test] 18851: regressions - FAIL"):
>> This looks like a red herring. Having poked about in woodlouse it looks
>> like something is screwy with interrupts. The tg3 cards aren't using
>> MSI and the USB controller is using edge not level handlers. Another
>> machine with the same chipset is happily using MSIs.
>
> I did the following tests overnight:
>
>  * 3.4.60 kernel:
>
>    Pass! [adhoc flight 19081]

Where are the logs for this run?

I tried:
http://www.chiark.greenend.org.uk/~xensrcts/logs/19081/

David

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [xen-unstable test] 18851: regressions - FAIL [and 1 more messages]
2013-09-06 10:58 ` David Vrabel
@ 2013-09-06 11:50 ` Ian Jackson
0 siblings, 0 replies; 19+ messages in thread
From: Ian Jackson @ 2013-09-06 11:50 UTC (permalink / raw)
To: David Vrabel
Cc: Keir Fraser, Jan Beulich, xen-devel, Boris Ostrovsky, Malcolm Crossley

David Vrabel writes ("Re: [xen-unstable test] 18851: regressions - FAIL [and 1 more messages]"):
> On 06/09/13 11:38, Ian Jackson wrote:
> > I did the following tests overnight:
> >
> >  * 3.4.60 kernel:
> >
> >    Pass! [adhoc flight 19081]
>
> Where are the logs for this run?
>
> I tried:
> http://www.chiark.greenend.org.uk/~xensrcts/logs/19081/

It doesn't automatically publish the logs of adhoc flights. I have
just done this now (for all three I mentioned).

Ian.

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [xen-unstable test] 18851: regressions - FAIL [and 1 more messages]
2013-09-06 10:38 ` [xen-unstable test] 18851: regressions - FAIL [and 1 more messages] Ian Jackson
2013-09-06 10:49 ` Jan Beulich
2013-09-06 10:58 ` David Vrabel
@ 2013-09-06 12:57 ` Konrad Rzeszutek Wilk
2013-09-06 13:34 ` Ian Jackson
2 siblings, 1 reply; 19+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-09-06 12:57 UTC (permalink / raw)
To: Ian Jackson
Cc: Keir Fraser, David Vrabel, Jan Beulich, xen-devel, Boris Ostrovsky, Malcolm Crossley

On Fri, Sep 06, 2013 at 11:38:42AM +0100, Ian Jackson wrote:
> Jan Beulich writes ("Re: [xen-unstable test] 18851: regressions - FAIL"):
> > This looks like a red herring. Having poked about in woodlouse it looks
> > like something is screwy with interrupts. The tg3 cards aren't using
> > MSI and the USB controller is using edge not level handlers. Another
> > machine with the same chipset is happily using MSIs.
>
> I did the following tests overnight:
>
>  * 3.4.60 kernel:
>
>    Pass! [adhoc flight 19081]
>
>  * 3.10.10 + patch from Zoltan Kiss to limit SKB_FRAG_PAGE_ORDER
>      Subject: net/core: Order-3 frag allocator causes SWIOTLB bouncing under Xen
>      Date: Wed Sep 04 21:54:01 BST 2013
>      Message-ID: <1378327638-23956-1-git-send-email-zoltan.kiss@citrix.com>
>
>    Fail as before (in this case, timeout in debootstrap trying to
>    install a guest). [adhoc flight 19082]
>
>  * 3.10.10, kernel command line "pci=noacpi and pci=nocrs"
>
>    Total boot failure. SATA controller complaining bitterly about
>    lost interrupts. [adhoc flight 19085]

Somebody (Andrew? David?) took a look at the box and found that the
MSIs were all out of whack. I guess with the 'noacpi' parameter the
thinking is that the ACPI _PRT are out of whack with the more modern
kernels?

I am not that familiar with oss-test - but is each of the set of boxes
running a different version of the hypervisor? Meaning you don't
randomly install from scratch a new version of a hypervisor on different
boxes?

Thanks!
>
> I also took woodlouse out of the main test pool, which is how we got a
> push of 4.2. I'm going to put it back now, and make a change to
> switch to Linux 3.4.y for general tests.
>
> I think this gets the 3.10.y problem off the critical path for
> everything else but of course we should still fix it. I will leave
> the 3.10.y push gate in place.

Aye. Is this issue (network incredibly slow) only surfacing on this box?
No - I thought I saw the issue on gall and lice with the upstream Linux?
Are those two machines the same as woodlouse?
>
> Ian.

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [xen-unstable test] 18851: regressions - FAIL [and 1 more messages]
2013-09-06 12:57 ` Konrad Rzeszutek Wilk
@ 2013-09-06 13:34 ` Ian Jackson
0 siblings, 0 replies; 19+ messages in thread
From: Ian Jackson @ 2013-09-06 13:34 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: Keir Fraser, David Vrabel, Jan Beulich, xen-devel, Boris Ostrovsky, Malcolm Crossley

Konrad Rzeszutek Wilk writes ("Re: [xen-unstable test] 18851: regressions - FAIL [and 1 more messages]"):
> I am not that familiar with oss-test - but is each of the set of boxes
> running a different version of the hypervisor? Meaning you don't
> randomly install from scratch a new version of a hypervisor on different
> boxes?

No, each test is of a specific version of the hypervisor, a specific
version of the kernel, etc. For each test the tester will pick a
machine from the test pool. The scheduling algorithm tries to pick a
machine which has not recently run this test, unless the test failed
most recently, in which case it tries to pick (the) one it failed on.

Each test job involves a complete wipe of the system, and then
installing a dom0 OS with the selected hypervisor and kernel.

> > I think this gets the 3.10.y problem off the critical path for
> > everything else but of course we should still fix it. I will leave
> > the 3.10.y push gate in place.
>
> Aye. Is this issue (network incredibly slow) only surfacing on this box?
> No - I thought I saw the issue on gall and lice with the upstream Linux?
> Are those two machines the same as woodlouse?

No, they are entirely different. This incredibly slow network issue
has only been seen on woodlouse. Most of the machines are in
identical pairs, but not woodlouse, sadly.

ian.

^ permalink raw reply [flat|nested] 19+ messages in thread
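The host-selection rule Ian describes in the message above can be illustrated with a short sketch. This is an assumption-laden toy model, not the tester's actual code: the function name `pick_host`, the history format and the host name `host-c` are hypothetical, and the real scheduler presumably weighs other factors (host availability, resource sharing) that the thread does not describe.

```python
def pick_host(hosts, history, test):
    """Pick a host for `test` following the rule described in the thread.

    hosts:   list of candidate host names in the pool
    history: list of (test, host, passed) tuples, oldest first (hypothetical format)
    """
    # Past runs of this particular test on hosts still in the pool.
    runs = [(host, passed) for (t, host, passed) in history
            if t == test and host in hosts]

    # If the most recent run failed, try to reproduce on the same host.
    if runs and not runs[-1][1]:
        return runs[-1][0]

    # Otherwise prefer the host that ran this test least recently;
    # hosts that never ran it sort first (index -1).
    last_seen = {host: -1 for host in hosts}
    for index, (host, _passed) in enumerate(runs):
        last_seen[host] = index
    return min(hosts, key=lambda host: last_seen[host])


pool = ["woodlouse", "itch-mite", "host-c"]  # "host-c" is a made-up name

after_failure = pick_host(
    pool,
    [("debian-install", "itch-mite", True),
     ("debian-install", "woodlouse", False)],
    "debian-install",
)

after_pass = pick_host(
    pool,
    [("debian-install", "itch-mite", True),
     ("debian-install", "woodlouse", True)],
    "debian-install",
)

print(after_failure, after_pass)
```

Under this model a failure on woodlouse keeps the test pinned to woodlouse until it passes there, which is consistent with Ian taking the machine out of the pool in order to get a push.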