* [libvirt test] 55257: regressions - FAIL
@ 2015-05-11 12:46 osstest service user
2015-05-11 13:22 ` Ian Campbell
0 siblings, 1 reply; 15+ messages in thread
From: osstest service user @ 2015-05-11 12:46 UTC (permalink / raw)
To: xen-devel; +Cc: ian.jackson
flight 55257 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/55257/
Regressions :-(
Tests which did not succeed and are blocking,
including tests which could not be run:
test-amd64-amd64-libvirt 11 guest-start fail REGR. vs. 53854
Tests which did not succeed, but are not blocking:
test-amd64-i386-libvirt-xsm 11 guest-start fail never pass
test-amd64-amd64-libvirt-xsm 11 guest-start fail never pass
test-armhf-armhf-libvirt-xsm 6 xen-boot fail never pass
test-amd64-i386-libvirt 12 migrate-support-check fail never pass
test-armhf-armhf-libvirt 12 migrate-support-check fail never pass
version targeted for testing:
libvirt 8910e063dbafc09695b2100c80213be569abb7ef
baseline version:
libvirt fd74e231751334b64af0934b680c5cc62f652453
------------------------------------------------------------
People who touched revisions under test:
Cole Robinson <crobinso@redhat.com>
------------------------------------------------------------
jobs:
build-amd64-xsm pass
build-armhf-xsm pass
build-i386-xsm pass
build-amd64 pass
build-armhf pass
build-i386 pass
build-amd64-libvirt pass
build-armhf-libvirt pass
build-i386-libvirt pass
build-amd64-pvops pass
build-armhf-pvops pass
build-i386-pvops pass
test-amd64-amd64-libvirt-xsm fail
test-armhf-armhf-libvirt-xsm fail
test-amd64-i386-libvirt-xsm fail
test-amd64-amd64-libvirt fail
test-armhf-armhf-libvirt pass
test-amd64-i386-libvirt pass
------------------------------------------------------------
sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/osstest/pub/logs
images: /home/osstest/pub/images
Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs
Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary
Not pushing.
------------------------------------------------------------
commit 8910e063dbafc09695b2100c80213be569abb7ef
Author: Cole Robinson <crobinso@redhat.com>
Date: Wed May 6 18:32:05 2015 -0400
caps: Fix regression defaulting to host arch
My commit 747761a79 (v1.2.15 only) dropped this bit of logic when filling
in a default arch in the XML:
-    /* First try to find one matching host arch */
-    for (i = 0; i < caps->nguests; i++) {
-        if (caps->guests[i]->ostype == ostype) {
-            for (j = 0; j < caps->guests[i]->arch.ndomains; j++) {
-                if (caps->guests[i]->arch.domains[j]->type == domain &&
-                    caps->guests[i]->arch.id == caps->host.arch)
-                    return caps->guests[i]->arch.id;
-            }
-        }
-    }
That attempt to match host.arch is important, otherwise we end up
defaulting to i686 on x86_64 host for KVM, which is not intended.
Duplicate it in the centralized CapsLookup function.
Additionally add some testcases that would have caught this.
https://bugzilla.redhat.com/show_bug.cgi?id=1219191
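The lookup order described in the commit message above can be sketched in C as follows. This is not libvirt's code: struct caps, struct guest_caps, enum arch and default_arch() are hypothetical simplifications invented for illustration, and falling back to the first matching guest entry when nothing matches the host is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for libvirt's capability structures. */
enum arch { ARCH_NONE = 0, ARCH_I686, ARCH_X86_64 };

struct guest_caps {
    int ostype;              /* opaque OS type id (e.g. hvm vs. pv) */
    enum arch arch;          /* architecture this guest entry describes */
    const int *domain_types; /* virtualization types offered by this entry */
    size_t ndomains;
};

struct caps {
    enum arch host_arch;
    const struct guest_caps *guests;
    size_t nguests;
};

/* Return 1 if the guest entry supports the requested ostype and domain type. */
static int guest_matches(const struct guest_caps *g, int ostype, int domain)
{
    if (g->ostype != ostype)
        return 0;
    for (size_t j = 0; j < g->ndomains; j++)
        if (g->domain_types[j] == domain)
            return 1;
    return 0;
}

/*
 * Pick a default arch: prefer an entry matching the host arch, otherwise
 * fall back to the first matching entry (assumed fallback behaviour).
 */
enum arch default_arch(const struct caps *caps, int ostype, int domain)
{
    enum arch fallback = ARCH_NONE;

    assert(caps != NULL);
    for (size_t i = 0; i < caps->nguests; i++) {
        const struct guest_caps *g = &caps->guests[i];

        if (!guest_matches(g, ostype, domain))
            continue;
        if (g->arch == caps->host_arch)
            return g->arch;          /* host arch wins immediately */
        if (fallback == ARCH_NONE)
            fallback = g->arch;      /* remember the first other match */
    }
    return fallback;
}
```

With an i686 entry listed before an x86_64 entry and an x86_64 host, this sketch returns x86_64, which is the behaviour the commit message says was lost.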
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [libvirt test] 55257: regressions - FAIL
2015-05-11 12:46 [libvirt test] 55257: regressions - FAIL osstest service user
@ 2015-05-11 13:22 ` Ian Campbell
2015-05-11 16:36 ` Jim Fehlig
0 siblings, 1 reply; 15+ messages in thread
From: Ian Campbell @ 2015-05-11 13:22 UTC (permalink / raw)
To: xen-devel, ian.jackson, Jim Fehlig

On Mon, 2015-05-11 at 12:46 +0000, osstest service user wrote:
> flight 55257 libvirt real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/55257/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
> test-amd64-amd64-libvirt 11 guest-start fail REGR. vs. 53854

I fear this may be a new heisenbug.

I fear a heisenbug because flight 53854 passed and there is only one
more, completely unrelated change here.

I saw something similar in
http://logs.test-lab.xenproject.org/osstest/logs/53721/ which was an
osstest flight against itself (so not posted to the list). That one had:

> test-amd64-i386-libvirt 11 guest-start fail REGR. vs. 53073
> test-amd64-amd64-libvirt 11 guest-start fail REGR. vs. 53073

In that case the range of libvirt was more useful than the one commit
here. It was 225aa80246d5..63a368012df, FWIW. Being a heisenbug I'm not
sure if 225aa80246d5 was OK or not

http://logs.test-lab.xenproject.org/osstest/logs/55257/test-amd64-amd64-libvirt/merlot1---var-log-libvirt-libxl-libxl-driver.log ends with:

libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk vdev=xvda2 spec.backend=qdisk
libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979cd10750: deregister unregistered
libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk vdev=xvda1 spec.backend=qdisk
libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979ccdd370: deregister unregistered
libxl: debug: libxl_dm.c:1487:libxl__spawn_local_dm: Spawning device-model /usr/local/lib/xen/bin/qemu-system-i386 with arguments:
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: /usr/local/lib/xen/bin/qemu-system-i386
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -xen-domid
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: 1
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -chardev
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: socket,id=libxl-cmd,path=/var/run/xen/qmp-libxl-1,server,nowait
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -no-shutdown
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -mon
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: chardev=libxl-cmd,mode=control
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -chardev
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: socket,id=libxenstat-cmd,path=/var/run/xen/qmp-libxenstat-1,server,nowait
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -mon
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: chardev=libxenstat-cmd,mode=control
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -nodefaults
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -xen-attach
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -name
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: debian.guest.osstest
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -vnc
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: none
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -display
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: none
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -nographic
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -machine
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: xenpv
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -m
libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: 512
libxl: debug: libxl_event.c:577:libxl__ev_xswatch_register: watch w=0x7f979cd0f430 wpath=/local/domain/0/device-model/1/state token=3/0: register slotnum=3
libxl: debug: libxl_create.c:1549:do_domain_create: ao 0x7f979cce1b10: inprogress: poller=0x7f979ccdd260, flags=i
libxl: debug: libxl_event.c:514:watchfd_callback: watch w=0x7f979cd0f430 wpath=/local/domain/0/device-model/1/state token=3/0: event epath=/local/domain/0/device-model/1/state
libxl: debug: libxl_aoutils.c:87:xswait_timeout_callback: domain 1 device model startup: xswait timeout (path=/local/domain/0/device-model/1/state)
libxl: debug: libxl_event.c:615:libxl__ev_xswatch_deregister: watch w=0x7f979cd0f430 wpath=/local/domain/0/device-model/1/state token=3/0: deregister slotnum=3
libxl: error: libxl_exec.c:393:spawn_watch_event: domain 1 device model: startup timed out
libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979cd0f430: deregister unregistered
libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979cd0f430: deregister unregistered
libxl: error: libxl_dm.c:1558:device_model_spawn_outcome: domain 1 device model: spawn failed (rc=-3)
libxl: error: libxl_create.c:1351:domcreate_devmodel_started: device model did not start: -3
libxl: debug: libxl_dm.c:1671:kill_device_model: Device Model signaled
libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979cd12ce0: deregister unregistered
libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979cd13010: deregister unregistered
libxl: info: libxl.c:1701:devices_destroy_cb: forked pid 18657 for destroy of domain 1
libxl: debug: libxl_event.c:1766:libxl__ao_complete: ao 0x7f979cce1b10: complete, rc=-3

The qemu log is sadly empty so I've no clue why this timed out.

Perhaps there is something in
http://logs.test-lab.xenproject.org/osstest/logs/55257/test-amd64-amd64-libvirt/merlot1---var-log-libvirt-libvirtd.log.gz
I can't make heads nor tail though.

> Tests which did not succeed, but are not blocking:
> test-amd64-i386-libvirt-xsm 11 guest-start fail never pass
> test-amd64-amd64-libvirt-xsm 11 guest-start fail never pass
> test-armhf-armhf-libvirt-xsm 6 xen-boot fail never pass
> test-amd64-i386-libvirt 12 migrate-support-check fail never pass
> test-armhf-armhf-libvirt 12 migrate-support-check fail never pass
>
> version targeted for testing:
> libvirt 8910e063dbafc09695b2100c80213be569abb7ef
> baseline version:
> libvirt fd74e231751334b64af0934b680c5cc62f652453
>

^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [libvirt test] 55257: regressions - FAIL
2015-05-11 13:22 ` Ian Campbell
@ 2015-05-11 16:36 ` Jim Fehlig
2015-05-11 17:02 ` Ian Campbell
2015-05-13 8:46 ` Ian Campbell
0 siblings, 2 replies; 15+ messages in thread
From: Jim Fehlig @ 2015-05-11 16:36 UTC (permalink / raw)
To: Ian Campbell; +Cc: xen-devel, ian.jackson

Ian Campbell wrote:
> On Mon, 2015-05-11 at 12:46 +0000, osstest service user wrote:
>
>> flight 55257 libvirt real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/55257/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>> test-amd64-amd64-libvirt 11 guest-start fail REGR. vs. 53854
>>
>
> I fear this may be a new heisenbug.
>
> I fear a heisenbug because flight 53854 passed and there is only one
> more, completely unrelated change here.
>
> I saw something similar in
> http://logs.test-lab.xenproject.org/osstest/logs/53721/ which was an
> osstest flight against itself (so not posted to the list). That one had:
>
>> test-amd64-i386-libvirt 11 guest-start fail REGR. vs. 53073
>> test-amd64-amd64-libvirt 11 guest-start fail REGR. vs. 53073
>>
>
> In that case the range of libvirt was more useful than the one commit
> here. It was 225aa80246d5..63a368012df, FWIW. Being a heisenbug I'm not
> sure if 225aa80246d5 was OK or not
>

225aa80246d5 only touches the qemu driver and should not affect Xen.

> http://logs.test-lab.xenproject.org/osstest/logs/55257/test-amd64-amd64-libvirt/merlot1---var-log-libvirt-libxl-libxl-driver.log ends with:
>
> libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk vdev=xvda2 spec.backend=qdisk
> libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979cd10750: deregister unregistered
> libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk vdev=xvda1 spec.backend=qdisk
> libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979ccdd370: deregister unregistered
> libxl: debug: libxl_dm.c:1487:libxl__spawn_local_dm: Spawning device-model /usr/local/lib/xen/bin/qemu-system-i386 with arguments:
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: /usr/local/lib/xen/bin/qemu-system-i386
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -xen-domid
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: 1
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -chardev
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: socket,id=libxl-cmd,path=/var/run/xen/qmp-libxl-1,server,nowait
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -no-shutdown
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -mon
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: chardev=libxl-cmd,mode=control
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -chardev
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: socket,id=libxenstat-cmd,path=/var/run/xen/qmp-libxenstat-1,server,nowait
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -mon
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: chardev=libxenstat-cmd,mode=control
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -nodefaults
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -xen-attach
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -name
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: debian.guest.osstest
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -vnc
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: none
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -display
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: none
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -nographic
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -machine
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: xenpv
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: -m
> libxl: debug: libxl_dm.c:1489:libxl__spawn_local_dm: 512
> libxl: debug: libxl_event.c:577:libxl__ev_xswatch_register: watch w=0x7f979cd0f430 wpath=/local/domain/0/device-model/1/state token=3/0: register slotnum=3
> libxl: debug: libxl_create.c:1549:do_domain_create: ao 0x7f979cce1b10: inprogress: poller=0x7f979ccdd260, flags=i
> libxl: debug: libxl_event.c:514:watchfd_callback: watch w=0x7f979cd0f430 wpath=/local/domain/0/device-model/1/state token=3/0: event epath=/local/domain/0/device-model/1/state
> libxl: debug: libxl_aoutils.c:87:xswait_timeout_callback: domain 1 device model startup: xswait timeout (path=/local/domain/0/device-model/1/state)
> libxl: debug: libxl_event.c:615:libxl__ev_xswatch_deregister: watch w=0x7f979cd0f430 wpath=/local/domain/0/device-model/1/state token=3/0: deregister slotnum=3
> libxl: error: libxl_exec.c:393:spawn_watch_event: domain 1 device model: startup timed out
> libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979cd0f430: deregister unregistered
> libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979cd0f430: deregister unregistered
> libxl: error: libxl_dm.c:1558:device_model_spawn_outcome: domain 1 device model: spawn failed (rc=-3)
> libxl: error: libxl_create.c:1351:domcreate_devmodel_started: device model did not start: -3
> libxl: debug: libxl_dm.c:1671:kill_device_model: Device Model signaled
> libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979cd12ce0: deregister unregistered
> libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x7f979cd13010: deregister unregistered
> libxl: info: libxl.c:1701:devices_destroy_cb: forked pid 18657 for destroy of domain 1
> libxl: debug: libxl_event.c:1766:libxl__ao_complete: ao 0x7f979cce1b10: complete, rc=-3
>
> The qemu log is sadly empty so I've no clue why this timed out.
>

I guess qemu didn't run at all...

> Perhaps there is something in
> http://logs.test-lab.xenproject.org/osstest/logs/55257/test-amd64-amd64-libvirt/merlot1---var-log-libvirt-libvirtd.log.gz
> I can't make heads nor tail though.
>

Nothing interesting. Only the unhelpful

2015-05-11 12:42:17.451+0000: 4280: error : libxlDomainStart:1032 :
internal error: libxenlight failed to create new domain
'debian.guest.osstest'

Off topic, but I'd really like to improve reporting of libxl errors in
libvirt. Currently, when calls to libxl_foo fail, libvirt simply
reports something like "libxenlight failed foo". Users must resort to
/var/log/libvirt/libxl/libxl-driver.log and
/var/log/xen/qemu-dm-<domname>.log for details. Perhaps a topic for the
dev summit.

Regards,
Jim

^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [libvirt test] 55257: regressions - FAIL
2015-05-11 16:36 ` Jim Fehlig
@ 2015-05-11 17:02 ` Ian Campbell
2015-05-13 8:46 ` Ian Campbell
1 sibling, 0 replies; 15+ messages in thread
From: Ian Campbell @ 2015-05-11 17:02 UTC (permalink / raw)
To: Jim Fehlig; +Cc: xen-devel, ian.jackson

On Mon, 2015-05-11 at 10:36 -0600, Jim Fehlig wrote:
> Ian Campbell wrote:
> > On Mon, 2015-05-11 at 12:46 +0000, osstest service user wrote:
> >
> >> flight 55257 libvirt real [real]
> >> http://logs.test-lab.xenproject.org/osstest/logs/55257/
> >>
> >> Regressions :-(
> >>
> >> Tests which did not succeed and are blocking,
> >> including tests which could not be run:
> >> test-amd64-amd64-libvirt 11 guest-start fail REGR. vs. 53854
> >>
> >
> > I fear this may be a new heisenbug.
> >
> > I fear a heisenbug because flight 53854 passed and there is only one
> > more, completely unrelated change here.
> >
> > I saw something similar in
> > http://logs.test-lab.xenproject.org/osstest/logs/53721/ which was an
> > osstest flight against itself (so not posted to the list). That one had:
> >
> >> test-amd64-i386-libvirt 11 guest-start fail REGR. vs. 53073
> >> test-amd64-amd64-libvirt 11 guest-start fail REGR. vs. 53073
> >>
> >
> > In that case the range of libvirt was more useful than the one commit
> > here. It was 225aa80246d5..63a368012df, FWIW. Being a heisenbug I'm not
> > sure if 225aa80246d5 was OK or not
> >
>
> 225aa80246d5 only touches the qemu driver and should not affect Xen.

I meant the state of the tree at that point rather than that commit
itself.

Ian.

^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [libvirt test] 55257: regressions - FAIL
2015-05-11 16:36 ` Jim Fehlig
2015-05-11 17:02 ` Ian Campbell
@ 2015-05-13 8:46 ` Ian Campbell
2015-05-13 17:46 ` Anthony PERARD
1 sibling, 1 reply; 15+ messages in thread
From: Ian Campbell @ 2015-05-13 8:46 UTC (permalink / raw)
To: Jim Fehlig; +Cc: xen-devel, ian.jackson

On Mon, 2015-05-11 at 10:36 -0600, Jim Fehlig wrote:
[...]
> > The qemu log is sadly empty so I've no clue why this timed out.
> >
>
> I guess qemu didn't run at all...
>
> > Perhaps there is something in
> > http://logs.test-lab.xenproject.org/osstest/logs/55257/test-amd64-amd64-libvirt/merlot1---var-log-libvirt-libvirtd.log.gz
> > I can't make heads nor tail though.
> >
>
> Nothing interesting. Only the unhelpful
>
> 2015-05-11 12:42:17.451+0000: 4280: error : libxlDomainStart:1032 :
> internal error: libxenlight failed to create new domain
> 'debian.guest.osstest'

This happened again in
http://logs.test-lab.xenproject.org/osstest/logs/55349/test-amd64-amd64-libvirt/info.html

Is there anything we could tweak in osstest to produce more helpful
logging?

> Off topic, but I'd really like to improve reporting of libxl errors in
> libvirt. Currently, when calls to libxl_foo fail, libvirt simply
> reports something like "libxenlight failed foo". Users must resort to
> /var/log/libvirt/libxl/libxl-driver.log and
> /var/log/xen/qemu-dm-<domname>.log for details. Perhaps a topic for the
> dev summit.

Indeed.

One thing we would like to do is to have more specific error codes so
that ERROR_FAIL is not returned everywhere. The xapi guys would like
this too. In general we are happy to have error codes which are used for
exactly one specific type of failure and to take patches to switch
things from ERROR_FAIL to use something more meaningful.

Other ideas:

A logger which, as well as logging, would cache the last N messages of a
certain priority or higher, in such a way that the caller could query
them and output them. If the priority was >= ERROR I think that would on
most failures get you the most relevant things.

I wonder if it would even be possible to buffer up all of the calls to a
given libxl_* entry point, in such a way that the messages associated
with exactly that call could be retrieved. If we could find a way to
integrate that with, say, the GC_INIT infrastructure then we would get
it for free almost everywhere (not sure how recursive calls to libxl_*
rather than libxl__* would be handled there).

Ian.

^ permalink raw reply [flat|nested] 15+ messages in thread
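The "cache the last N messages of a certain priority or higher" idea from the message above could look roughly like the following C sketch. It does not use libxl's or xentoollog's logger interface: struct recent_log, the lvl_* levels, the fixed sizes and all function names are hypothetical, chosen only to show the ring-buffer mechanics.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define RECENT_SLOTS 4      /* N: how many messages are remembered */
#define RECENT_MSG_LEN 128  /* longer messages are truncated */

/* Hypothetical severity levels, ordered from least to most severe. */
enum log_level { LVL_DEBUG = 0, LVL_INFO, LVL_WARN, LVL_ERROR };

struct recent_log {
    enum log_level threshold;                 /* remember this level and above */
    char msgs[RECENT_SLOTS][RECENT_MSG_LEN];  /* ring buffer of messages */
    size_t next;                              /* slot the next message goes to */
    size_t count;                             /* valid entries, <= RECENT_SLOTS */
};

void recent_log_init(struct recent_log *rl, enum log_level threshold)
{
    memset(rl, 0, sizeof(*rl));
    rl->threshold = threshold;
}

/* Log a message as usual and, if it is severe enough, also remember it. */
void recent_log_msg(struct recent_log *rl, enum log_level level, const char *msg)
{
    assert(rl != NULL && msg != NULL);
    fprintf(stderr, "level %d: %s\n", (int)level, msg);  /* normal log path */

    if (level < rl->threshold)
        return;
    snprintf(rl->msgs[rl->next], RECENT_MSG_LEN, "%s", msg);
    rl->next = (rl->next + 1) % RECENT_SLOTS;  /* overwrite the oldest when full */
    if (rl->count < RECENT_SLOTS)
        rl->count++;
}

/* Return the i-th remembered message, oldest first, or NULL if out of range. */
const char *recent_log_get(const struct recent_log *rl, size_t i)
{
    if (i >= rl->count)
        return NULL;
    size_t oldest = (rl->next + RECENT_SLOTS - rl->count) % RECENT_SLOTS;
    return rl->msgs[(oldest + i) % RECENT_SLOTS];
}
```

A caller in the position of libvirt could, after a failed call, walk recent_log_get() from index 0 until it returns NULL and append those messages to its own error report. Per-call buffering, as suggested for the GC_INIT idea, would need one such structure per call context, which this sketch does not attempt.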
* Re: [libvirt test] 55257: regressions - FAIL
2015-05-13 8:46 ` Ian Campbell
@ 2015-05-13 17:46 ` Anthony PERARD
2015-05-14 10:47 ` Ian Campbell
0 siblings, 1 reply; 15+ messages in thread
From: Anthony PERARD @ 2015-05-13 17:46 UTC (permalink / raw)
To: Ian Campbell; +Cc: Jim Fehlig, xen-devel, ian.jackson

On Wed, May 13, 2015 at 09:46:28AM +0100, Ian Campbell wrote:
> On Mon, 2015-05-11 at 10:36 -0600, Jim Fehlig wrote:
> [...]
> > > The qemu log is sadly empty so I've no clue why this timed out.
> > >
> >
> > I guess qemu didn't run at all...
> >
> > > Perhaps there is something in
> > > http://logs.test-lab.xenproject.org/osstest/logs/55257/test-amd64-amd64-libvirt/merlot1---var-log-libvirt-libvirtd.log.gz
> > > I can't make heads nor tail though.
> > >
> >
> > Nothing interesting. Only the unhelpful
> >
> > 2015-05-11 12:42:17.451+0000: 4280: error : libxlDomainStart:1032 :
> > internal error: libxenlight failed to create new domain
> > 'debian.guest.osstest'
>
> This happened again in
> http://logs.test-lab.xenproject.org/osstest/logs/55349/test-amd64-amd64-libvirt/info.html
>
> Is there anything we could tweak in osstest to produce more helpful
> logging?

Well we can find in var-log-libvirt-libvirtd.log.gz this:
2015-05-12 17:39:35.180+0000: 4329: error : libxlDomainStart:1032 : internal error: libxenlight failed to create new domain 'debian.guest.osstest'

And for more information we need to look into the driver specific log,
libxl logs in var-log-libvirt-libxl-libxl-driver.log:
libxl: error: libxl_exec.c:393:spawn_watch_event: domain 1 device model: startup timed out

I'm seeing this error a lot on our OpenStack CI loop, I thought the error
was due to the "host" being very busy, but if osstest is having the same
issue, then there is probably something wrong with libxl+libvirt :(.

--
Anthony PERARD

^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [libvirt test] 55257: regressions - FAIL
2015-05-13 17:46 ` Anthony PERARD
@ 2015-05-14 10:47 ` Ian Campbell
2015-05-14 11:07 ` Anthony PERARD
2015-05-14 21:21 ` Jim Fehlig
0 siblings, 2 replies; 15+ messages in thread
From: Ian Campbell @ 2015-05-14 10:47 UTC (permalink / raw)
To: Anthony PERARD; +Cc: Jim Fehlig, xen-devel, ian.jackson

On Wed, 2015-05-13 at 18:46 +0100, Anthony PERARD wrote:
> On Wed, May 13, 2015 at 09:46:28AM +0100, Ian Campbell wrote:
> > On Mon, 2015-05-11 at 10:36 -0600, Jim Fehlig wrote:
> > [...]
> > > > The qemu log is sadly empty so I've no clue why this timed out.
> > > >
> > >
> > > I guess qemu didn't run at all...
> > >
> > > > Perhaps there is something in
> > > > http://logs.test-lab.xenproject.org/osstest/logs/55257/test-amd64-amd64-libvirt/merlot1---var-log-libvirt-libvirtd.log.gz
> > > > I can't make heads nor tail though.
> > > >
> > >
> > > Nothing interesting. Only the unhelpful
> > >
> > > 2015-05-11 12:42:17.451+0000: 4280: error : libxlDomainStart:1032 :
> > > internal error: libxenlight failed to create new domain
> > > 'debian.guest.osstest'
> >
> > This happened again in
> > http://logs.test-lab.xenproject.org/osstest/logs/55349/test-amd64-amd64-libvirt/info.html
> >
> > Is there anything we could tweak in osstest to produce more helpful
> > logging?
>
> Well we can find in var-log-libvirt-libvirtd.log.gz this:
> 2015-05-12 17:39:35.180+0000: 4329: error : libxlDomainStart:1032 : internal error: libxenlight failed to create new domain 'debian.guest.osstest'
>
> And for more information we need to look into the driver specific log,
> libxl logs in var-log-libvirt-libxl-libxl-driver.log:
> libxl: error: libxl_exec.c:393:spawn_watch_event: domain 1 device model: startup timed out

Thanks, all of that was mentioned earlier in the thread too, I was
looking for ways to get more info.

> I'm seeing this error a lot on our OpenStack CI loop, I thought the error
> was due to the "host" being very busy, but if osstest is having the same
> issue, then there is probably something wrong with libxl+libvirt :(.

Are you able to reproduce at will or is it like osstest and just a
sporadic failure?

I suppose the openstack CI loop doesn't capture anything more
interesting than osstest does?

FWIW http://logs.test-lab.xenproject.org/osstest/logs/55443/ seems to
have two more instances of this (amd64 and i386), but with no
interesting logs still and a different one on ARM:

http://logs.test-lab.xenproject.org/osstest/logs/55443/test-armhf-armhf-libvirt/11.ts-guest-start.log:
2015-05-13 09:23:32.193+0000: 16389: info : libvirt version: 1.2.16
2015-05-13 09:23:32.193+0000: 16389: warning : virKeepAliveTimerInternal:143 : No response from client 0xb7000c38 after 6 keepalive messages in 35 seconds
2015-05-13 09:23:32.193+0000: 16390: warning : virKeepAliveTimerInternal:143 : No response from client 0xb7000c38 after 6 keepalive messages in 35 seconds
error: Failed to create domain from /etc/xen/debian.guest.osstest.cfg.xml
error: internal error: received hangup / error event on socket

In that case the libxl-driver log ends with:
libxl: debug: libxl_dm.c:1495:libxl__spawn_local_dm: Spawning device-model /usr/local/lib/xen/bin/qemu-system-i386 with arguments:
[...]
libxl: debug: libxl_event.c:600:libxl__ev_xswatch_register: watch w=0xb2e07bcc wpath=/local/domain/0/device-model/1/state token=3/0: register slotnum=3
libxl: debug: libxl_create.c:1560:do_domain_create: ao 0xb2e044f0: inprogress: poller=0xb2e07590, flags=i
libxl: debug: libxl_event.c:537:watchfd_callback: watch w=0xb2e07bcc wpath=/local/domain/0/device-model/1/state token=3/0: event epath=/local/domain/0/device-model/1/state

Which I don't think is complete, i.e. there should be more? Not sure if
this gives a hint for the x86 case too?

I don't see anything useful in
http://logs.test-lab.xenproject.org/osstest/logs/55443/test-armhf-armhf-libvirt/arndale-lakeside---var-log-libvirt-libvirtd.log.gz

Ian.

^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [libvirt test] 55257: regressions - FAIL
2015-05-14 10:47 ` Ian Campbell
@ 2015-05-14 11:07 ` Anthony PERARD
2015-05-14 21:27 ` Jim Fehlig
2015-05-14 21:21 ` Jim Fehlig
1 sibling, 1 reply; 15+ messages in thread
From: Anthony PERARD @ 2015-05-14 11:07 UTC (permalink / raw)
To: Ian Campbell; +Cc: Jim Fehlig, xen-devel, ian.jackson

On Thu, May 14, 2015 at 11:47:18AM +0100, Ian Campbell wrote:
> On Wed, 2015-05-13 at 18:46 +0100, Anthony PERARD wrote:
> > On Wed, May 13, 2015 at 09:46:28AM +0100, Ian Campbell wrote:
> > > On Mon, 2015-05-11 at 10:36 -0600, Jim Fehlig wrote:
> > > [...]
> > > > > The qemu log is sadly empty so I've no clue why this timed out.
> > > > >
> > > >
> > > > I guess qemu didn't run at all...
> > > >
> > > > > Perhaps there is something in
> > > > > http://logs.test-lab.xenproject.org/osstest/logs/55257/test-amd64-amd64-libvirt/merlot1---var-log-libvirt-libvirtd.log.gz
> > > > > I can't make heads nor tail though.
> > > > >
> > > >
> > > > Nothing interesting. Only the unhelpful
> > > >
> > > > 2015-05-11 12:42:17.451+0000: 4280: error : libxlDomainStart:1032 :
> > > > internal error: libxenlight failed to create new domain
> > > > 'debian.guest.osstest'
> > >
> > > This happened again in
> > > http://logs.test-lab.xenproject.org/osstest/logs/55349/test-amd64-amd64-libvirt/info.html
> > >
> > > Is there anything we could tweak in osstest to produce more helpful
> > > logging?
> >
> > Well we can find in var-log-libvirt-libvirtd.log.gz this:
> > 2015-05-12 17:39:35.180+0000: 4329: error : libxlDomainStart:1032 : internal error: libxenlight failed to create new domain 'debian.guest.osstest'
> >
> > And for more information we need to look into the driver specific log,
> > libxl logs in var-log-libvirt-libxl-libxl-driver.log:
> > libxl: error: libxl_exec.c:393:spawn_watch_event: domain 1 device model: startup timed out
>
> Thanks, all of that was mentioned earlier in the thread too, I was
> looking for ways to get more info.
>
> > I'm seeing this error a lot on our OpenStack CI loop, I thought the error
> > was due to the "host" being very busy, but if osstest is having the same
> > issue, then there is probably something wrong with libxl+libvirt :(.
>
> Are you able to reproduce at will or is it like osstest and just a
> sporadic failure?

Like osstest. I haven't spent much time on it yet, and I have not tried
to reproduce it yet.

> I suppose the openstack CI loop doesn't capture anything more
> interesting than osstest does?

No, nothing else interesting. The next step would be to enable more debug
output from libvirtd by playing with "log_level" and "log_filters" in
/etc/libvirtd.conf, but I don't know which filter would be interesting.

--
Anthony PERARD

^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [libvirt test] 55257: regressions - FAIL
2015-05-14 11:07 ` Anthony PERARD
@ 2015-05-14 21:27 ` Jim Fehlig
0 siblings, 0 replies; 15+ messages in thread
From: Jim Fehlig @ 2015-05-14 21:27 UTC (permalink / raw)
To: Anthony PERARD; +Cc: xen-devel, ian.jackson, Ian Campbell

Anthony PERARD wrote:
> On Thu, May 14, 2015 at 11:47:18AM +0100, Ian Campbell wrote:
>
>> I suppose the openstack CI loop doesn't capture anything more
>> interesting than osstest does?
>>
>
> No, nothing else interesting. The next step would be to enable more debug
> output from libvirtd by playing with "log_level" and "log_filters" in
> /etc/libvirtd.conf, but I don't know which filter would be interesting.
>

log_level is already set to DEBUG. And the xen tool logger used by the
libxl driver is also set to XTL_DEBUG. I'm not aware of any more debug
or logging to enable.

Regards,
Jim

^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [libvirt test] 55257: regressions - FAIL
2015-05-14 10:47 ` Ian Campbell
2015-05-14 11:07 ` Anthony PERARD
@ 2015-05-14 21:21 ` Jim Fehlig
2015-05-14 21:31 ` Jim Fehlig
` (2 more replies)
1 sibling, 3 replies; 15+ messages in thread
From: Jim Fehlig @ 2015-05-14 21:21 UTC (permalink / raw)
To: Ian Campbell; +Cc: Anthony PERARD, xen-devel, ian.jackson

Ian Campbell wrote:
> On Wed, 2015-05-13 at 18:46 +0100, Anthony PERARD wrote:
>
>> On Wed, May 13, 2015 at 09:46:28AM +0100, Ian Campbell wrote:
>>
>>> On Mon, 2015-05-11 at 10:36 -0600, Jim Fehlig wrote:
>>> [...]
>>>
>>>>> The qemu log is sadly empty so I've no clue why this timed out.
>>>>>
>>>>>
>>>> I guess qemu didn't run at all...
>>>>
>>>>
>>>>> Perhaps there is something in
>>>>> http://logs.test-lab.xenproject.org/osstest/logs/55257/test-amd64-amd64-libvirt/merlot1---var-log-libvirt-libvirtd.log.gz
>>>>> I can't make heads nor tail though.
>>>>>
>>>>>
>>>> Nothing interesting. Only the unhelpful
>>>>
>>>> 2015-05-11 12:42:17.451+0000: 4280: error : libxlDomainStart:1032 :
>>>> internal error: libxenlight failed to create new domain
>>>> 'debian.guest.osstest'
>>>>
>>> This happened again in
>>> http://logs.test-lab.xenproject.org/osstest/logs/55349/test-amd64-amd64-libvirt/info.html
>>>
>>> Is there anything we could tweak in osstest to produce more helpful
>>> logging?
>>>
>> Well we can find in var-log-libvirt-libvirtd.log.gz this:
>> 2015-05-12 17:39:35.180+0000: 4329: error : libxlDomainStart:1032 : internal error: libxenlight failed to create new domain 'debian.guest.osstest'
>>
>> And for more information we need to look into the driver specific log,
>> libxl logs in var-log-libvirt-libxl-libxl-driver.log:
>> libxl: error: libxl_exec.c:393:spawn_watch_event: domain 1 device model: startup timed out
>>
>
> Thanks, all of that was mentioned earlier in the thread too, I was
> looking for ways to get more info.
>
>
>> I'm seeing this error a lot on our OpenStack CI loop, I thought the error
>> was due to the "host" being very busy, but if osstest is having the same
>> issue, then there is probably something wrong with libxl+libvirt :(.
>>
>
> Are you able to reproduce at will or is it like osstest and just a
> sporadic failure?
>
> I suppose the openstack CI loop doesn't capture anything more
> interesting than osstest does?
>
> FWIW http://logs.test-lab.xenproject.org/osstest/logs/55443/ seems to
> have two more instances of this (amd64 and i386)

More cases of qemu not starting. I'm not sure how we can get more
details about that.

> but with no
> interesting logs still and a different one on ARM:
>
> http://logs.test-lab.xenproject.org/osstest/logs/55443/test-armhf-armhf-libvirt/11.ts-guest-start.log:
> 2015-05-13 09:23:32.193+0000: 16389: info : libvirt version: 1.2.16
> 2015-05-13 09:23:32.193+0000: 16389: warning : virKeepAliveTimerInternal:143 : No response from client 0xb7000c38 after 6 keepalive messages in 35 seconds
> 2015-05-13 09:23:32.193+0000: 16390: warning : virKeepAliveTimerInternal:143 : No response from client 0xb7000c38 after 6 keepalive messages in 35 seconds
> error: Failed to create domain from /etc/xen/debian.guest.osstest.cfg.xml
> error: internal error: received hangup / error event on socket
>

In this case it seems libvirtd crashed.

> In that case the libxl-driver log ends with:
> libxl: debug: libxl_dm.c:1495:libxl__spawn_local_dm: Spawning device-model /usr/local/lib/xen/bin/qemu-system-i386 with arguments:
> [...]
> libxl: debug: libxl_event.c:600:libxl__ev_xswatch_register: watch w=0xb2e07bcc wpath=/local/domain/0/device-model/1/state token=3/0: register slotnum=3
> libxl: debug: libxl_create.c:1560:do_domain_create: ao 0xb2e044f0: inprogress: poller=0xb2e07590, flags=i
> libxl: debug: libxl_event.c:537:watchfd_callback: watch w=0xb2e07bcc wpath=/local/domain/0/device-model/1/state token=3/0: event epath=/local/domain/0/device-model/1/state
>
> Which I don't think is complete, i.e. there should be more? Not sure if
> this gives a hint for the x86 case too?
>

More hints that libvirtd crashed. Have there been any attempts to
reproduce this outside of the test rig? Or capture a core dump?

Regards,
Jim

^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [libvirt test] 55257: regressions - FAIL
2015-05-14 21:21 ` Jim Fehlig
@ 2015-05-14 21:31 ` Jim Fehlig
2015-05-15 8:44 ` Ian Campbell
2015-05-15 10:39 ` Anthony PERARD
2 siblings, 0 replies; 15+ messages in thread
From: Jim Fehlig @ 2015-05-14 21:31 UTC (permalink / raw)
To: Ian Campbell; +Cc: Anthony PERARD, xen-devel, ian.jackson
Jim Fehlig wrote:
> More hint that libvirtd crashed. Have there been any attempts to
> reproduce this outside of the test rig? Or capture a core dump?
>

FYI, I've unsuccessfully tried to reproduce this using config similar to
debian.guest.osstest.cfg.xml.

Regards,
Jim
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [libvirt test] 55257: regressions - FAIL
2015-05-14 21:21 ` Jim Fehlig
2015-05-14 21:31 ` Jim Fehlig
@ 2015-05-15 8:44 ` Ian Campbell
2015-05-15 10:39 ` Anthony PERARD
2 siblings, 0 replies; 15+ messages in thread
From: Ian Campbell @ 2015-05-15 8:44 UTC (permalink / raw)
To: Jim Fehlig; +Cc: Anthony PERARD, xen-devel, ian.jackson
On Thu, 2015-05-14 at 15:21 -0600, Jim Fehlig wrote:
> > FWIW http://logs.test-lab.xenproject.org/osstest/logs/55443/ seems to
> > have two more instances of this (amd64 and i386)
>
> More cases of qemu not starting. I'm not sure how we can get more
> details about that.

FWIW I dug into this a bit more yesterday having discussed this with Ian
and others a bit.

We wondered if qemu had crashed, but the logs show a time out and libxl
has code in the parent process which receives SIGCHLD and logs + errors
out, so I think it probably isn't that, unless the monitoring code is
buggy somehow (not out of the question, it's probably not exercised
much). Also we expect that a crash would produce a segfault message on
the kernel console, which didn't appear.

We also considered where stderr was going. libxl redirects std{out,err}
for the qemu to the qemu-dm-debian.guest.osstest.log file, which is
captured and empty. There was some question about where libvirt's own
stderr was going (/dev/null or perhaps the console) but it doesn't
appear as if anything is going wrong in libvirt itself and as above we
capture the std* for processes which we spawn ourselves.

Lastly libvirtd is still running and is shown in the ps logs captured.
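
The crash-versus-timeout distinction drawn above can be illustrated with a small, generic sketch. This is not libxl's code (libxl is C and, per the message above, reacts to SIGCHLD in the parent); `watch_child` and its return strings are hypothetical names used only to show how a supervising parent can tell the outcomes apart.

```python
import subprocess


def watch_child(argv, timeout_s):
    """Start argv and report how it ended within timeout_s seconds.

    Hypothetical helper for illustration only; it does not reproduce
    libxl's spawn machinery.
    """
    proc = subprocess.Popen(argv)
    try:
        rc = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The child is still alive: this is the "startup timed out" shape.
        proc.kill()
        proc.wait()
        return "timed out"
    if rc < 0:
        # A negative return code means the child died from a signal,
        # e.g. -11 for a segfault on Linux.
        return "killed by signal %d" % -rc
    return "exited with status %d" % rc
```

With a supervisor of this shape, a crashed child is reported as a signal death as soon as it is reaped, so seeing only a timeout suggests the child was still alive when the timer fired.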
>
> > but with no
> > interesting logs still and a different one on ARM:
> >
> > http://logs.test-lab.xenproject.org/osstest/logs/55443/test-armhf-armhf-libvirt/11.ts-guest-start.log:
> > 2015-05-13 09:23:32.193+0000: 16389: info : libvirt version: 1.2.16
> > 2015-05-13 09:23:32.193+0000: 16389: warning : virKeepAliveTimerInternal:143 : No response from client 0xb7000c38 after 6 keepalive messages in 35 seconds
> > 2015-05-13 09:23:32.193+0000: 16390: warning : virKeepAliveTimerInternal:143 : No response from client 0xb7000c38 after 6 keepalive messages in 35 seconds
> > error: Failed to create domain from /etc/xen/debian.guest.osstest.cfg.xml
> > error: internal error: received hangup / error event on socket
> >
>
> In this case it seems libvirtd crashed.

http://logs.test-lab.xenproject.org/osstest/logs/55443/test-armhf-armhf-libvirt/arndale-lakeside-output-ps_wwwaxf_-eo_pid%2Ctty%2Cstat%2Ctime%2Cnice%2Cpsr%2Cpcpu%2Cpmem%2Cnwchan%2Cwchan%2325%2Cargs
includes:
2301 ? DLl 00:00:00 0 0 0.0 1.6 ffffff fdget_pos /usr/local/sbin/libvirtd -d
16395 ? S 00:00:00 0 0 0.0 0.5 24b6dc wait \_ /usr/local/sbin/libvirtd -d
16396 ? Ssl 00:00:00 0 0 0.0 1.9 ffffff poll_schedule_timeout \_ /usr/local/lib/xen/bin/qemu-system-i386 -xen-domid 1 -chardev socket,id=libxl-cmd,path=/var/run/xen/qmp-libxl-1,server,nowait -no-shutdown -mon chardev=libxl-cmd,mode=control -chardev socket,id=libxenstat-cmd,path=/var/run/xen/qmp-libxenstat-1,server,nowait -mon chardev=libxenstat-cmd,mode=control -nodefaults -xen-attach -name debian.guest.osstest -vnc none -display none -nographic -machine xenpv -m 512

So I don't think it has crashed, it's even successfully spawned a qemu
it seems.

Comparing the libxl-driver.log here with the amd64 case:
libxl: debug: libxl_event.c:537:watchfd_callback: watch w=0x7ff4d70595e0 wpath=/local/domain/0/device-model/1/state token=3/0: event epath=/local/domain/0/device-model/1/state
[arm stops here, amd64 continues with the remainder]
libxl: debug: libxl_aoutils.c:87:xswait_timeout_callback: domain 1 device model startup: xswait timeout (path=/local/domain/0/device-model/1/state)
libxl: debug: libxl_event.c:638:libxl__ev_xswatch_deregister: watch w=0x7ff4d70595e0 wpath=/local/domain/0/device-model/1/state token=3/0: deregister slotnum=3
libxl: error: libxl_exec.c:393:spawn_watch_event: domain 1 device model: startup timed out
libxl: debug: libxl_event.c:652:libxl__ev_xswatch_deregister: watch w=0x7ff4d70595e0: deregister unregistered
libxl: debug: libxl_event.c:652:libxl__ev_xswatch_deregister: watch w=0x7ff4d70595e0: deregister unregistered
libxl: error: libxl_dm.c:1565:device_model_spawn_outcome: domain 1 device model: spawn failed (rc=-3)
libxl: error: libxl_create.c:1362:domcreate_devmodel_started: device model did not start: -3
libxl: debug: libxl_dm.c:1678:kill_device_model: Device Model signaled
libxl: debug: libxl_event.c:652:libxl__ev_xswatch_deregister: watch w=0x7ff4d702f3c0: deregister unregistered
libxl: debug: libxl_event.c:652:libxl__ev_xswatch_deregister: watch w=0x7ff4d7031290: deregister unregistered
libxl: debug: libxl.c:1701:devices_destroy_cb: forked pid 18588 for destroy of domain 1
libxl: debug: libxl_event.c:1768:libxl__ao_complete: ao 0x7ff4d702ed60: complete, rc=-3
libxl: debug: libxl_event.c:1740:libxl__ao__destroy: ao 0x7ff4d702ed60: destroy

I wonder if we are somehow losing an event or getting the event loop
screwed up. Perhaps in the amd64 case we are somehow losing the xenstore
watch, in the armhf case we are losing some other fd which interferes
with libvirt's own event loop?

So I think we are looking at either a hang or an event processing SNAFU
rather than a crash.
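
The log comparison above (armhf stops right after the watch event, amd64 runs on to the timeout) could be automated along these lines. This is only a sketch: `classify_dm_startup` and its labels are hypothetical, and the patterns are taken from the log lines quoted in this thread, so they may not match other libxl versions.

```python
import re

# Patterns copied from the libxl-driver.log excerpts quoted in this thread.
WATCH_EVENT = re.compile(r"watchfd_callback: .*device-model/\d+/state")
TIMED_OUT = re.compile(r"device model: startup timed out")
AO_COMPLETE = re.compile(r"libxl__ao_complete: .*complete, rc=(-?\d+)")


def classify_dm_startup(log_text):
    """Roughly classify how a device-model startup ended (hypothetical helper)."""
    lines = [l.strip() for l in log_text.splitlines()]
    lines = [l for l in lines if l.startswith("libxl:")]
    # amd64 shape: the xswait timer fired and libxl reported a timeout.
    if any(TIMED_OUT.search(l) for l in lines):
        return "timed-out"
    # The async operation finished without a timeout being logged.
    for l in lines:
        m = AO_COMPLETE.search(l)
        if m:
            return "completed rc=" + m.group(1)
    # armhf shape: the log simply stops after the xenstore watch fired.
    if lines and WATCH_EVENT.search(lines[-1]):
        return "stalled-after-watch-event"
    return "unknown"
```

Run over the captured driver logs of several flights, a classifier like this would show whether the two shapes line up with architecture or are just two outcomes of the same underlying problem.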

BTW, in the above there is "Device Model signaled", which indicates that
kill(pid, SIGHUP) returned 0 and not e.g. ESRCH (when it would log
"Device Model already exited") or anything else (when it would log
"failed to kill..."). So the qemu process was actually present.

The host is doing nothing other than running this one test case, so it
doesn't seem likely that we are really hitting the 30s qemu startup
timeout.

Ian.
^ permalink raw reply [flat|nested] 15+ messages in thread
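
The three kill() outcomes described in the message above can be sketched as follows. This is an illustration rather than libxl's kill_device_model; `signal_device_model` is a hypothetical name, and the third message is kept elided exactly as it is quoted in the thread.

```python
import errno
import os
import signal


def signal_device_model(pid):
    """Send SIGHUP to pid and return the message matching the outcome.

    Hypothetical helper; the outcome-to-message mapping follows the
    description in the message above.
    """
    try:
        os.kill(pid, signal.SIGHUP)
    except OSError as e:
        if e.errno == errno.ESRCH:
            # No process with this pid exists any more.
            return "Device Model already exited"
        # Any other errno, e.g. EPERM; exact wording is elided in the thread.
        return "failed to kill... (%s)" % os.strerror(e.errno)
    # kill() returned 0: a process with this pid was there to be signalled.
    return "Device Model signaled"
```

One caveat, stated as a general POSIX fact rather than anything taken from the logs: kill() also returns 0 for a process that has exited but has not yet been reaped, so "signaled" shows that the pid still existed, not necessarily that the process was still executing.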
* Re: [libvirt test] 55257: regressions - FAIL
2015-05-14 21:21 ` Jim Fehlig
2015-05-14 21:31 ` Jim Fehlig
2015-05-15 8:44 ` Ian Campbell
@ 2015-05-15 10:39 ` Anthony PERARD
2015-05-15 11:54 ` Ian Campbell
2 siblings, 1 reply; 15+ messages in thread
From: Anthony PERARD @ 2015-05-15 10:39 UTC (permalink / raw)
To: Jim Fehlig; +Cc: xen-devel, ian.jackson, Ian Campbell
On Thu, May 14, 2015 at 03:21:41PM -0600, Jim Fehlig wrote:
> More hint that libvirtd crashed. Have there been any attempts to
> reproduce this outside of the test rig? Or capture a core dump?

Here are two from the OpenStack CI loop:
http://logs.openstack.xenproject.org/10/181110/5/check/dsvm-tempest-xen/6005c68
http://logs.openstack.xenproject.org/21/183221/2/check/dsvm-tempest-xen/56324b0

In logs/libvirt/libxl/libxl-driver.txt.gz, you will find:
libxl: error: libxl_exec.c:396:spawn_timeout: domain 108 device model: startup timed out
libxl: error: libxl_dm.c:1388:device_model_spawn_outcome: domain 108 device model: spawn failed (rc=-3)
libxl: error: libxl_create.c:1186:domcreate_devmodel_started: device model did not start: -3

Weird, it's the same domain number for both logs :).

Other useful logs from openstack can be found in logs/screen-n-cpu.txt.gz,
which is the service that talks to libvirtd.

It's running libvirt 1.2.14 with:
f86ae40 libxl: Move job acquisition in libxlDomainStart to callers
894d2ff libxl: acquire a job when destroying a domain
6dfec1e libxl: drop virDomainObj lock when destroying a domain
and xen 4.4.1 with:
9369988 libxl: event handling: Break out ao_work_outstanding
f1335f0 libxl: event handling: ao_inprogress does waits while reports outstanding
4783c99 libxl: In domain death search, start search at first domid we want
188e9c5 libxl: Domain destroy: fork
http://wiki.xenproject.org/wiki/OpenStack_CI_Loop_for_Xen-Libvirt#Baseline

No libvirtd crash.

--
Anthony PERARD
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [libvirt test] 55257: regressions - FAIL
2015-05-15 10:39 ` Anthony PERARD
@ 2015-05-15 11:54 ` Ian Campbell
2015-05-15 15:33 ` Anthony PERARD
0 siblings, 1 reply; 15+ messages in thread
From: Ian Campbell @ 2015-05-15 11:54 UTC (permalink / raw)
To: Anthony PERARD; +Cc: Jim Fehlig, xen-devel, ian.jackson
On Fri, 2015-05-15 at 11:39 +0100, Anthony PERARD wrote:
> On Thu, May 14, 2015 at 03:21:41PM -0600, Jim Fehlig wrote:
> > More hint that libvirtd crashed. Have there been any attempts to
> > reproduce this outside of the test rig? Or capture a core dump?
>
> Here are two from the OpenStack CI loop:
> http://logs.openstack.xenproject.org/10/181110/5/check/dsvm-tempest-xen/6005c68
> http://logs.openstack.xenproject.org/21/183221/2/check/dsvm-tempest-xen/56324b0
>
> in logs/libvirt/libxl/libxl-driver.txt.gz, you will find:
> libxl: error: libxl_exec.c:396:spawn_timeout: domain 108 device model: startup timed out
> libxl: error: libxl_dm.c:1388:device_model_spawn_outcome: domain 108 device model: spawn failed (rc=-3)
> libxl: error: libxl_create.c:1186:domcreate_devmodel_started: device model did not start: -3
>
> Weird, it's the same domain number for both logs :).
>
> Other useful logs from openstack can be found in logs/screen-n-cpu.txt.gz,
> which is the service that talks to libvirtd.
>
> It's running libvirt 1.2.14 with:
> f86ae40 libxl: Move job acquisition in libxlDomainStart to callers
> 894d2ff libxl: acquire a job when destroying a domain
> 6dfec1e libxl: drop virDomainObj lock when destroying a domain
> and xen 4.4.1 with:
> 9369988 libxl: event handling: Break out ao_work_outstanding
> f1335f0 libxl: event handling: ao_inprogress does waits while reports outstanding
> 4783c99 libxl: In domain death search, start search at first domid we want
> 188e9c5 libxl: Domain destroy: fork
> http://wiki.xenproject.org/wiki/OpenStack_CI_Loop_for_Xen-Libvirt#Baseline

Interesting.

We didn't used to see these issues, but there has been a rather large
gap where we didn't get useful results due to upheaval from the colo
move and there were other issues (e.g. the crashing issue) which make it
hard to pinpoint a point in time where this didn't happen.

Did you have a previous baseline which didn't exhibit these problems? Or
did it exhibit enough other problems not to be usable?

If we can find some plausible sounding baseline to try (i.e. commit id,
not a commit id + patch queue) then I could try and run some ad hoc tests
to establish a baseline. Perhaps I should try xen.git#stable-4.5 and
libvirt.git#1.2.14 in the first instance? Or I could pick a xen-unstable
flight pass from, say, Easter-ish and try with that?

This seems to be an intermittent bug, so it's not clear that the
bisector is going to be all that useful. However we do do multiple
domain starts now so perhaps the chances of sneaking past are reduced.

Ian.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [libvirt test] 55257: regressions - FAIL
2015-05-15 11:54 ` Ian Campbell
@ 2015-05-15 15:33 ` Anthony PERARD
0 siblings, 0 replies; 15+ messages in thread
From: Anthony PERARD @ 2015-05-15 15:33 UTC (permalink / raw)
To: Ian Campbell; +Cc: Jim Fehlig, xen-devel, ian.jackson
On Fri, May 15, 2015 at 12:54:32PM +0100, Ian Campbell wrote:
> On Fri, 2015-05-15 at 11:39 +0100, Anthony PERARD wrote:
> > On Thu, May 14, 2015 at 03:21:41PM -0600, Jim Fehlig wrote:
> > > More hint that libvirtd crashed. Have there been any attempts to
> > > reproduce this outside of the test rig? Or capture a core dump?
> >
> > Here are two from the OpenStack CI loop:
> > http://logs.openstack.xenproject.org/10/181110/5/check/dsvm-tempest-xen/6005c68
> > http://logs.openstack.xenproject.org/21/183221/2/check/dsvm-tempest-xen/56324b0
> >
> > in logs/libvirt/libxl/libxl-driver.txt.gz, you will find:
> > libxl: error: libxl_exec.c:396:spawn_timeout: domain 108 device model: startup timed out
> > libxl: error: libxl_dm.c:1388:device_model_spawn_outcome: domain 108 device model: spawn failed (rc=-3)
> > libxl: error: libxl_create.c:1186:domcreate_devmodel_started: device model did not start: -3
> >
> > Weird, it's the same domain number for both logs :).
> >
> > Other useful logs from openstack can be found in logs/screen-n-cpu.txt.gz,
> > which is the service that talks to libvirtd.
> >
> > It's running libvirt 1.2.14 with:
> > f86ae40 libxl: Move job acquisition in libxlDomainStart to callers
> > 894d2ff libxl: acquire a job when destroying a domain
> > 6dfec1e libxl: drop virDomainObj lock when destroying a domain
> > and xen 4.4.1 with:
> > 9369988 libxl: event handling: Break out ao_work_outstanding
> > f1335f0 libxl: event handling: ao_inprogress does waits while reports outstanding
> > 4783c99 libxl: In domain death search, start search at first domid we want
> > 188e9c5 libxl: Domain destroy: fork
> > http://wiki.xenproject.org/wiki/OpenStack_CI_Loop_for_Xen-Libvirt#Baseline
>
> Interesting.
>
> We didn't used to see these issues, but there has been a rather large
> gap where we didn't get useful results due to upheaval from the colo
> move and there were other issues (e.g. the crashing issue) which make it
> hard to pinpoint a point in time where this didn't happen.
>
> Did you have a previous baseline which didn't exhibit these problems? Or
> did it exhibit enough other problems not to be usable?

This is the first baseline to be useful with the CI loop; previous
libvirt releases had other issues. So in short, no.

--
Anthony PERARD
^ permalink raw reply [flat|nested] 15+ messages in thread