From: Atom2 <ariel.atom2@web2web.at>
To: Doug Goldstein <cardoe@cardoe.com>, xen-devel@lists.xen.org
Subject: Re: HVM domains crash after upgrade from XEN 4.5.1 to 4.5.2
Date: Mon, 16 Nov 2015 02:05:35 +0100
Message-ID: <56492BDF.5030208@web2web.at>
In-Reply-To: <5648E727.6080204@cardoe.com>
On 15.11.15 at 21:12, Doug Goldstein wrote:
> On 11/14/15 6:14 PM, Atom2 wrote:
>> On 14.11.15 at 21:32, Andrew Cooper wrote:
>>> On 14/11/2015 00:16, Atom2 wrote:
>>>> On 13.11.15 at 11:09, Andrew Cooper wrote:
>>>>> On 13/11/15 07:25, Jan Beulich wrote:
>>>>>>>>> On 13.11.15 at 00:00, <ariel.atom2@web2web.at> wrote:
>>>>>>> On 12.11.15 at 17:43, Andrew Cooper wrote:
>>>>>>>> On 12/11/15 14:29, Atom2 wrote:
>>>>>>>>> Hi Andrew,
>>>>>>>>> thanks for your reply. Answers are inline further down.
>>>>>>>>>
>>>>>>>>> On 12.11.15 at 14:01, Andrew Cooper wrote:
>>>>>>>>>> On 12/11/15 12:52, Jan Beulich wrote:
>>>>>>>>>>>>>> On 12.11.15 at 02:08, <ariel.atom2@web2web.at> wrote:
>>>>>>>>>>>> After the upgrade HVM domUs appear to no longer work - regardless
>>>>>>>>>>>> of the
>>>>>>>>>>>> dom0 kernel (tested with both 3.18.9 and 4.1.7 as the dom0 kernel); PV
>>>>>>>>>>>> domUs, however, work just fine as before on both dom0 kernels.
>>>>>>>>>>>>
>>>>>>>>>>>> xl dmesg shows the following information after the first crashed HVM
>>>>>>>>>>>> domU which is started as part of the machine booting up:
>>>>>>>>>>>> [...]
>>>>>>>>>>>> (XEN) Failed vm entry (exit reason 0x80000021) caused by invalid guest
>>>>>>>>>>>> state (0).
>>>>>>>>>>>> (XEN) ************* VMCS Area **************
>>>>>>>>>>>> (XEN) *** Guest State ***
>>>>>>>>>>>> (XEN) CR0: actual=0x0000000000000039, shadow=0x0000000000000011,
>>>>>>>>>>>> gh_mask=ffffffffffffffff
>>>>>>>>>>>> (XEN) CR4: actual=0x0000000000002050, shadow=0x0000000000000000,
>>>>>>>>>>>> gh_mask=ffffffffffffffff
>>>>>>>>>>>> (XEN) CR3: actual=0x0000000000800000, target_count=0
>>>>>>>>>>>> (XEN) target0=0000000000000000, target1=0000000000000000
>>>>>>>>>>>> (XEN) target2=0000000000000000, target3=0000000000000000
>>>>>>>>>>>> (XEN) RSP = 0x0000000000006fdc (0x0000000000006fdc) RIP =
>>>>>>>>>>>> 0x0000000100000000 (0x0000000100000000)
>>>>>>>>>>> Other than RIP looking odd for a guest still in non-paged protected
>>>>>>>>>>> mode I can't seem to spot anything wrong with guest state.
>>>>>>>>>> odd? That will be the source of the failure.
>>>>>>>>>>
>>>>>>>>>> Out of long mode, the upper 32bit of %rip should all be zero, and it
>>>>>>>>>> should not be possible to set any of them.
>>>>>>>>>>
>>>>>>>>>> I suspect that the guest has exited for emulation, and there has been a
>>>>>>>>>> bad update to %rip. The alternative (which I hope is not the case) is
>>>>>>>>>> that there is a hardware erratum which allows the guest to accidentally
>>>>>>>>>> get itself into this condition.
>>>>>>>>>>
>>>>>>>>>> Are you able to rerun with a debug build of the hypervisor?
>> [big snip]
>>>>>>>>> Now _without_ the debug USE flag, but with debug information in
>>>>>>>>> the binary (I used splitdebug), all is back to where the problem
>>>>>>>>> started off (i.e. the system boots without issues until such
>>>>>>>>> time it starts a HVM domU which then crashes; PV domUs are
>>>>>>>>> working). I have attached the latest "xl dmesg" output with the
>>>>>>>>> timing information included.
>>>>>>>>>
>>>> I hope any of this makes sense to you.
>>>>
>>>> Again many thanks and best regards
>>>>
>>> Right - it would appear that the USE flag is definitely not what you
>>> wanted, and causes bad compilation for Xen. The do_IRQ disassembly
>>> you sent is the result of disassembling a whole block of zeroes.
>>> Sorry for leading you on a goose chase - the double faults will be the
>>> product of bad compilation, rather than anything to do with your
>>> specific problem.
>> Hi Andrew,
>> there's absolutely no need to apologize as it is me who asked for help
>> and you who generously stepped in and provided it. I really do
>> appreciate your help and it is for me, as the one seeking help, to
>> provide all the information you deem necessary and you ask for.
>>> However, the final log you sent (dmesg) is using a debug Xen, which is
>>> what I was attempting to get you to do originally.
>> Next time I know better how to arrive at a debug XEN. It's all about
>> learning.
>>> We still observe that the VM ends up in 32bit non-paged mode but with
>>> an RIP with bit 32 set, which is an invalid state to be in. However,
>>> there was nothing particularly interesting in the extra log information.
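As a side note on the observation quoted above, the check itself fits in a few lines. This is only a sketch in Python, not Xen code: the function name is made up for illustration, and it uses just the CR0 and RIP values from the VMCS dump earlier in the thread. It assumes the guest cannot be in long mode while CR0.PG is clear, since long mode requires paging.

```python
# Illustrative sketch only; not taken from the Xen sources.
CR0_PE = 1 << 0   # protected mode enable
CR0_PG = 1 << 31  # paging enable


def rip_is_plausible(cr0: int, rip: int) -> bool:
    """Return False when paging is off but RIP has bits set above bit 31."""
    if cr0 & CR0_PG:
        # With paging on the guest may be in long mode; this sketch does not
        # look at EFER, so it makes no judgement in that case.
        return True
    return (rip >> 32) == 0


# Values from the "Guest State" dump quoted earlier in the thread.
cr0 = 0x0000000000000039
rip = 0x0000000100000000

print("protected mode:", bool(cr0 & CR0_PE))
print("paging:", bool(cr0 & CR0_PG))
print("RIP plausible:", rip_is_plausible(cr0, rip))
```

With the values from the dump, paging is off and bit 32 of RIP is set, so the check fails, which matches the "invalid guest state" reported by the failed VM entry.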
>>>
>>> Please can you rerun with "hvm_debug=0xc3f", which will cause far more
>>> logging to occur to the console while the HVM guest is running. That
>>> might show some hints.
>> I haven't done that yet - but please see my next paragraph. If you are
>> still interested in this, for whatever reason, I am clearly more than
>> happy to rerun with your suggested option and provide that information
>> as well.
>>> Also, the fact that this occurs just after starting SeaBIOS is
>>> interesting. As you have switched versions of Xen, you have also
>>> switched hvmloader, which contains the SeaBIOS binary embedded in it.
>>> Would you be able to compile both 4.5.1 and 4.5.2 and switch the
>>> hvmloader binaries in use. It would be very interesting to see
>>> whether the failure is caused by the hvmloader binary or the
>>> hypervisor. (With `xl`, you can use
>>> firmware_override="/full/path/to/firmware" to override the default
>>> hvmloader).
>> Your analysis was absolutely spot on. After re-thinking this for a
>> moment, I thought going down that route first would make a lot of sense
>> as PV guests still do work and one of the differences to HVM domUs is
>> that the former do _not_ require SeaBIOS. Looking at my log files of
>> installed packages confirmed an upgrade from SeaBIOS 1.7.5 to 1.8.2 in
>> the relevant timeframe which obviously had not made it to the hvmloader
>> of xen-4.5.1 as I did not re-compile xen after the upgrade of SeaBIOS.
>>
>> So I re-compiled xen-4.5.1 (obviously now using the installed SeaBIOS
>> 1.8.2) and the same error as with xen-4.5.2 popped up - and that seemed
>> to strongly indicate that there indeed might be an issue with SeaBIOS as
>> this probably was the only variable that had changed from the original
>> install of xen-4.5.1.
>>
>> My next step was to downgrade SeaBIOS to 1.7.5 and to re-compile
>> xen-4.5.1. Voila, the system was again up and running. While still
>> having SeaBIOS 1.7.5 installed, I also re-compiled xen-4.5.2 and ... you
>> probably guessed it ... the problem was gone: The system boots up with
>> no issues and everything is fine again.
>>
>> So in a nutshell: There seems to be a problem with SeaBIOS 1.8.2
>> preventing HVM domains from successfully starting up. I don't know what
>> triggers this, whether it is specific to my hardware or whether
>> something else in my environment is to blame.
>>
>> In any case, I am again more than happy to provide data / run a few
>> tests should you wish to get to the bottom of this.
>>
>> I do owe you a beer (or any other drink) should you ever be at my
>> location (i.e. Vienna, Austria).
>>
>> Many thanks again for your analysis and your first class support. Xen
>> and their people absolutely rock!
>>
>> Atom2
> I'm a little late to the thread but can you send me (you can do it
> off-list if you'd like) the USE flags you used for xen, xen-tools and
> seabios? Also emerge --info. You can kill two birds with one stone by
> using emerge --info xen.
Hi Doug,
here you go:
USE flags:
app-emulation/xen-4.5.2-r1::gentoo USE="-custom-cflags -debug -efi
-flask -xsm"
app-emulation/xen-tools-4.5.2::gentoo USE="hvm pam pygrub python qemu
screen system-seabios -api -custom-cflags -debug -doc -flask (-ocaml)
-ovmf -static-libs -system-qemu" PYTHON_TARGETS="python2_7"
sys-firmware/seabios-1.7.5::gentoo USE="binary"
emerge --info: Please see the attached file
> I'm not too familiar with the xen ebuilds but I was pretty sure that
> xen-tools is what builds hvmloader and it downloads a copy of SeaBIOS
> and builds it so that it remains consistent. But obviously your
> experience shows otherwise.
You are right, it's xen-tools that builds hvmloader. If I remember
correctly, the "system-seabios" USE flag (for xen-tools) specifies
whether sys-firmware/seabios is used and the latter downloads SeaBIOS in
its binary form provided its "binary" USE flag is set. At least that's
my understanding.
> I'm looking at some ideas to improve SeaBIOS packaging on Gentoo and
> your info would be helpful.
Great. Whatever makes Gentoo and Xen stronger will be awesome. What
immediately springs to mind is to create a separate hvmloader package
and slot that (that's just an idea and probably not fully thought
through, but as far as I understood Andrew, it would then be possible to
specify the specific firmware version [i.e. hvmloader] to use on xl's
command line by using firmware_override="/full/path/to/firmware").
I also found out that an upgrade to sys-firmware/seabios obviously does
not trigger an automatic re-emerge of xen-tools and thus hvmloader.
Shouldn't this also happen automatically as xen-tools depends on seabios?
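To spot this situation by hand, something like the following rough Python sketch could help. It only compares file modification times, so it is a heuristic rather than a real dependency check, and both paths are assumptions that will differ between installs.

```python
import os


def is_stale(built_file: str, input_file: str) -> bool:
    """Return True if input_file was modified after built_file was written."""
    return os.path.getmtime(input_file) > os.path.getmtime(built_file)


if __name__ == "__main__":
    # Hypothetical locations; adjust both to the local install.
    hvmloader = "/usr/libexec/xen/boot/hvmloader"
    seabios = "/usr/share/seabios/bios.bin"
    if os.path.exists(hvmloader) and os.path.exists(seabios):
        if is_stale(hvmloader, seabios):
            print("hvmloader is older than the SeaBIOS binary; "
                  "it may still embed a previous SeaBIOS")
        else:
            print("hvmloader is newer than the SeaBIOS binary")
    else:
        print("paths not found; edit them for this system")
```

A result of "older" would only be a hint that xen-tools should be re-emerged; it says nothing about which SeaBIOS version is actually embedded.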
Thanks and best regards
Atom2
P.S. If you prefer to take this off-list, just reply to my mail address.
[-- Attachment #2: info --]
[-- Type: text/plain, Size: 3773 bytes --]
Portage 2.2.20.1 (python 2.7.10-final-0, hardened/linux/amd64, gcc-4.9.3, glibc-2.21-r1, 4.1.7-hardened-r1 x86_64)
=================================================================
System uname: Linux-4.1.7-hardened-r1-x86_64-Intel-R-_Xeon-R-_CPU_E31260L_@_2.40GHz-with-gentoo-2.2
KiB Mem: 4032716 total, 3678784 free
KiB Swap: 16777148 total, 16777148 free
Timestamp of repository gentoo: Sun, 15 Nov 2015 00:45:01 +0000
sh bash 4.3_p39
ld GNU ld (Gentoo 2.25.1 p1.1) 2.25.1
app-shells/bash: 4.3_p39::gentoo
dev-lang/perl: 5.20.2::gentoo
dev-lang/python: 2.7.10::gentoo, 3.4.3::gentoo
dev-util/cmake: 3.3.1-r1::gentoo
dev-util/pkgconfig: 0.28-r2::gentoo
sys-apps/baselayout: 2.2::gentoo
sys-apps/openrc: 0.17::gentoo
sys-apps/sandbox: 2.6-r1::gentoo
sys-devel/autoconf: 2.69::gentoo
sys-devel/automake: 1.13.4::gentoo, 1.14.1::gentoo, 1.15::gentoo
sys-devel/binutils: 2.25.1-r1::gentoo
sys-devel/gcc: 4.8.5::gentoo, 4.9.3::gentoo
sys-devel/gcc-config: 1.7.3::gentoo
sys-devel/libtool: 2.4.6::gentoo
sys-devel/make: 4.1-r1::gentoo
sys-kernel/linux-headers: 3.18::gentoo (virtual/os-headers)
sys-libs/glibc: 2.21-r1::gentoo
Repositories:
gentoo
location: /usr/portage
sync-type: rsync
sync-uri: rsync://rsync.europe.gentoo.org/gentoo-portage/
priority: -1000
x-portage
location: /usr/local/portage
masters: gentoo
priority: 0
ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="* -@EULA"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=native -O2 -pipe -fomit-frame-pointer"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-march=native -O2 -pipe -fomit-frame-pointer"
DISTDIR="/usr/portage/distfiles"
EMERGE_DEFAULT_OPTS="--quiet-build=y --buildpkg-exclude sys-kernel/hardened-sources"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-logs buildpkg config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync news parallel-fetch preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://gd.tuwien.ac.at/opsys/linux/gentoo/ ftp://gd.tuwien.ac.at/opsys/linux/gentoo/"
LANG="en_US.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
MAKEOPTS="-j9"
PKGDIR="/usr/portage/packages"
PORTAGE_COMPRESS=""
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_EXTRA_OPTS="--quiet --progress"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
USE="acl aes amd64 avx bash-completion berkdb bzip2 cli cracklib crypt cxx gdbm hardened iconv justify lm_sensors mmx mmxext modules multilib ncurses nls nptl openmp pam pax_kernel pcre pie popcnt readline seccomp session sse sse2 sse3 sse4.1 sse4_1 ssl ssp ssse3 tcpd unicode urandom vim-syntax xattr xtpax zlib" ABI_X86="64" CPU_FLAGS_X86="aes avx mmx mmxext popcnt sse sse2 sse3 sse4_1 sse4.1 ssse3" ELIBC="glibc" KERNEL="linux" LINGUAS="en" PHP_TARGETS="php5-5" PYTHON_SINGLE_TARGET="python2_7" PYTHON_TARGETS="python2_7" RUBY_TARGETS="ruby20" USERLAND="GNU" VIDEO_CARDS="intel i965" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
USE_PYTHON="2.7"
Unset: CC, CPPFLAGS, CTARGET, CXX, INSTALL_MASK, LC_ALL, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS_FLAGS
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel