From: Jim Fehlig <jfehlig@suse.com>
To: Jan Beulich <JBeulich@suse.com>,
osstest service owner <osstest-admin@xenproject.org>
Cc: xen-devel <xen-devel@lists.xenproject.org>
Subject: Re: [xen-4.9-testing test] 126201: regressions - FAIL
Date: Wed, 22 Aug 2018 16:52:27 -0600
Message-ID: <5ec84362-88e6-c9d8-6f17-578616ab1a76@suse.com>
In-Reply-To: <5B7BF42E02000078001E06A7@suse.com>
On 08/21/2018 05:14 AM, Jan Beulich wrote:
>>>> On 21.08.18 at 03:11, <osstest-admin@xenproject.org> wrote:
>> flight 126201 xen-4.9-testing real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/126201/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>> test-amd64-amd64-libvirt-pair 22 guest-migrate/src_host/dst_host fail REGR. vs. 124328
>
> Something needs to be done about this, as this continued failure is
> blocking the 4.9.3 release. I did mail about this on Aug 2nd already
> for flight 125710, I've got back from Wei:
>
>> This is libvirtd's error message.
>>
>> The remote host can't obtain the state change log due to it is already
>> held by another task/thread. It could be a libvirt / libxl bug.
>>
>> 2018-08-01 16:12:13.433+0000: 3491: warning : libxlDomainObjBeginJob:151 :
>> Cannot start job (modify) for domain debian.guest.osstest; current job is (modify) owned by (24975)

I took a closer look at the logs, and it appears the finish phase of migration
fails to acquire the domain job lock because it is still held by the perform
phase. In the perform phase, after the VM has been transferred to the dst, the
qemu process associated with the VM is started. For whatever reason that takes
about 32 seconds on this host:

2018-08-19 17:05:19.182+0000: libxl: libxl_dm.c:2235:libxl__spawn_local_dm:
Domain 1:Spawning device-model /usr/local/lib/xen/bin/qemu-system-i386 with
arguments: ...
2018-08-19 17:05:19.188+0000: libxl: libxl_exec.c:398:spawn_watch_event: domain
1 device model: spawn watch p=(null)
...
2018-08-19 17:05:51.529+0000: libxl: libxl_event.c:573:watchfd_callback: watch
w=0x7f84a0047ee8 wpath=/local/domain/0/device-model/1/state token=2/1: event
epath=/local/domain/0/device-model/1/state
2018-08-19 17:05:51.529+0000: libxl: libxl_exec.c:398:spawn_watch_event: domain
1 device model: spawn watch p=running

In the meantime we move to the finish phase and, after about 30 seconds, time
out waiting for the above perform phase to complete:

2018-08-19 17:05:19.096+0000: 3492: debug : virThreadJobSet:96 : Thread 3492
(virNetServerHandleJob) is now running job remoteDispatchDomainMigrateFinish3Params
...
2018-08-19 17:05:49.253+0000: 3492: warning : libxlDomainObjBeginJob:151 :
Cannot start job (modify) for domain debian.guest.osstest; current job is
(modify) owned by (24982)
2018-08-19 17:05:49.253+0000: 3492: error : libxlDomainObjBeginJob:155 : Timed
out during operation: cannot acquire state change lock

What could be causing the long startup time of qemu on these hosts? Does dom0
have enough CPU and memory? As you noticed, the libvirt commit used for this
test has not changed since well before the failures appeared. Perhaps a subtle
change in libxl is exposing the bug?

Regardless, I'm happy to have looked at the issue since I think libvirt can be
improved to cope with the problem. The thread on the dst that receives the VM
via libxl_domain_create_restore() can be created with the joinable flag, then
joined in the finish phase before attempting to acquire the job lock. I'll look
into making such an improvement in libvirt's libxl driver.
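
To make the idea concrete, here is a minimal sketch using plain pthreads. It is
not the libxl driver code: struct job, job_begin(), job_end(), restore_thread()
and migrate_dst() are names made up for illustration, the sleep merely stands
in for the slow libxl_domain_create_restore() call, and the timeouts are
scaled down from the ~30 seconds seen in the logs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the per-domain job lock. */
struct job {
    pthread_mutex_t mu;
    pthread_cond_t cond;
    bool active;
};

/* Become job owner, waiting at most timeout_ms; returns 0 or ETIMEDOUT. */
static int job_begin(struct job *j, long timeout_ms)
{
    struct timespec deadline;
    int rc = 0;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&j->mu);
    while (j->active && rc == 0)
        rc = pthread_cond_timedwait(&j->cond, &j->mu, &deadline);
    if (!j->active) {
        j->active = true;
        rc = 0;
    }
    pthread_mutex_unlock(&j->mu);
    return rc;
}

/* Release the job and wake one waiter. */
static void job_end(struct job *j)
{
    pthread_mutex_lock(&j->mu);
    j->active = false;
    pthread_cond_signal(&j->cond);
    pthread_mutex_unlock(&j->mu);
}

struct restore_args {
    struct job *job;
    long startup_ms;
};

/* Perform phase on the dst: releases the job only after the slow startup. */
static void *restore_thread(void *opaque)
{
    struct restore_args *a = opaque;
    struct timespec ts;

    ts.tv_sec = a->startup_ms / 1000;
    ts.tv_nsec = (a->startup_ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);   /* stands in for libxl_domain_create_restore() */
    job_end(a->job);
    return NULL;
}

/* Returns 0 if the finish phase obtained the job, ETIMEDOUT otherwise. */
static int migrate_dst(long startup_ms, long wait_ms, bool join_first)
{
    struct job job;
    struct restore_args args = { &job, startup_ms };
    pthread_t tid;
    int rc;

    pthread_mutex_init(&job.mu, NULL);
    pthread_cond_init(&job.cond, NULL);
    job.active = false;

    rc = job_begin(&job, 0);    /* perform phase takes the job first */
    assert(rc == 0);
    /* NULL attributes: the thread is joinable by default. */
    rc = pthread_create(&tid, NULL, restore_thread, &args);
    assert(rc == 0);

    if (join_first)
        pthread_join(tid, NULL);    /* proposed: wait for the restore thread */
    rc = job_begin(&job, wait_ms);  /* finish phase asks for the job */
    if (!join_first)
        pthread_join(tid, NULL);    /* only to clean up the sketch */
    if (rc == 0)
        job_end(&job);

    pthread_cond_destroy(&job.cond);
    pthread_mutex_destroy(&job.mu);
    return rc;
}
```

Calling migrate_dst(200, 50, false) models the failing flight in miniature: the
startup outlasts the wait, so the finish phase gets ETIMEDOUT. With
migrate_dst(200, 50, true) the finish phase joins the restore thread first and
then obtains the job. In libvirt itself I would expect to use the existing
virThreadCreate()/virThreadJoin() helpers rather than raw pthreads.
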
Regards,
Jim
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel