From: Wei Liu <wei.liu2@citrix.com>
To: Jim Fehlig <jfehlig@suse.com>
Cc: xen-devel <xen-devel@lists.xenproject.org>,
Wei Liu <wei.liu2@citrix.com>,
osstest service owner <osstest-admin@xenproject.org>,
Jan Beulich <JBeulich@suse.com>
Subject: Re: [xen-4.9-testing test] 126201: regressions - FAIL
Date: Fri, 24 Aug 2018 09:58:02 +0100 [thread overview]
Message-ID: <20180824085802.5gfltbaywpfffytw@citrix.com> (raw)
In-Reply-To: <5ec84362-88e6-c9d8-6f17-578616ab1a76@suse.com>
On Wed, Aug 22, 2018 at 04:52:27PM -0600, Jim Fehlig wrote:
> On 08/21/2018 05:14 AM, Jan Beulich wrote:
> > > > > On 21.08.18 at 03:11, <osstest-admin@xenproject.org> wrote:
> > > flight 126201 xen-4.9-testing real [real]
> > > http://logs.test-lab.xenproject.org/osstest/logs/126201/
> > >
> > > Regressions :-(
> > >
> > > Tests which did not succeed and are blocking,
> > > including tests which could not be run:
> > > test-amd64-amd64-libvirt-pair 22 guest-migrate/src_host/dst_host fail REGR. vs. 124328
> >
> > Something needs to be done about this, as this continued failure is
> > blocking the 4.9.3 release. I did mail about this on Aug 2nd already
> > for flight 125710, I've got back from Wei:
> >
> > > This is libvirtd's error message.
> > >
> > > The remote host can't obtain the state change log due to it is already
> > > held by another task/thread. It could be a libvirt / libxl bug.
> > >
> > > 2018-08-01 16:12:13.433+0000: 3491: warning : libxlDomainObjBeginJob:151 :
> > > Cannot start job (modify) for domain debian.guest.osstest; current job is (modify) owned by (24975)
>
> I took a closer look at the logs and it appears the finish phase of
> migration fails to acquire the domain job lock since it is already held by
> the perform phase. In the perform phase, after the vm has been transferred
> to the dst, the qemu process associated with the vm is started. For whatever
> reason that takes a long time on this host:
>
> 2018-08-19 17:05:19.182+0000: libxl: libxl_dm.c:2235:libxl__spawn_local_dm:
> Domain 1:Spawning device-model /usr/local/lib/xen/bin/qemu-system-i386 with
> arguments: ...
> 2018-08-19 17:05:19.188+0000: libxl: libxl_exec.c:398:spawn_watch_event:
> domain 1 device model: spawn watch p=(null)
This is a spurious event after the watch has been set up.
> ...
> 2018-08-19 17:05:51.529+0000: libxl: libxl_event.c:573:watchfd_callback:
> watch w=0x7f84a0047ee8 wpath=/local/domain/0/device-model/1/state token=2/1:
> event epath=/local/domain/0/device-model/1/state
> 2018-08-19 17:05:51.529+0000: libxl: libxl_exec.c:398:spawn_watch_event:
> domain 1 device model: spawn watch p=running
So it has taken 32s for QEMU to write "running" in xenstore. This,
however, is still within the timeout limit set by libxl (60s).
>
> In the meantime we move to the finish phase and timeout waiting for the
> above perform phase to complete
>
> 2018-08-19 17:05:19.096+0000: 3492: debug : virThreadJobSet:96 : Thread 3492
> (virNetServerHandleJob) is now running job
> remoteDispatchDomainMigrateFinish3Params
> ...
> 2018-08-19 17:05:49.253+0000: 3492: warning : libxlDomainObjBeginJob:151 :
> Cannot start job (modify) for domain debian.guest.osstest; current job is
> (modify) owned by (24982)
> 2018-08-19 17:05:49.253+0000: 3492: error : libxlDomainObjBeginJob:155 :
> Timed out during operation: cannot acquire state change lock
>
> What could be causing the long startup time of qemu on these hosts? Does
> dom0 have enough cpu/memory? As you noticed, the libvirt commit used for
> this test has not changed in a long time, well before the failures appeared.
> Perhaps a subtle change in libxl is exposing the bug?
There have only been two changes to libxl in the range of changesets
being tested.
c257e35a libxl: qemu_disk_scsi_drive_string: Break out common parts of disk config
5d92007c libxl: restore passing "readonly=" to qemu for SCSI disks
They wouldn't change how libxl interact with libvirt. QEMU tag didn't
change.
Wei.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
next prev parent reply other threads:[~2018-08-24 8:58 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-08-21 1:11 [xen-4.9-testing test] 126201: regressions - FAIL osstest service owner
2018-08-21 11:14 ` Jan Beulich
2018-08-21 11:44 ` Roger Pau Monné
2018-08-21 11:58 ` Jan Beulich
[not found] ` <5B7BF42E02000078001E06A7@suse.com>
2018-08-22 22:52 ` Jim Fehlig
2018-08-24 8:58 ` Wei Liu [this message]
2018-08-27 7:50 ` Jan Beulich
2018-08-30 10:57 ` Wei Liu
2018-09-05 21:37 ` Jim Fehlig
2018-09-11 22:18 ` Jim Fehlig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180824085802.5gfltbaywpfffytw@citrix.com \
--to=wei.liu2@citrix.com \
--cc=JBeulich@suse.com \
--cc=jfehlig@suse.com \
--cc=osstest-admin@xenproject.org \
--cc=xen-devel@lists.xenproject.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).