From: Atom2 <ariel.atom2@web2web.at>
To: Doug Goldstein <cardoe@cardoe.com>, xen-devel@lists.xen.org
Subject: Re: HVM domains crash after upgrade from XEN 4.5.1 to 4.5.2
Date: Mon, 16 Nov 2015 02:05:35 +0100
Message-ID: <56492BDF.5030208@web2web.at>
In-Reply-To: <5648E727.6080204@cardoe.com>
On 15.11.15 at 21:12, Doug Goldstein wrote:
> On 11/14/15 6:14 PM, Atom2 wrote:
>> On 14.11.15 at 21:32, Andrew Cooper wrote:
>>> On 14/11/2015 00:16, Atom2 wrote:
>>>> On 13.11.15 at 11:09, Andrew Cooper wrote:
>>>>> On 13/11/15 07:25, Jan Beulich wrote:
>>>>>>>>> On 13.11.15 at 00:00, <ariel.atom2@web2web.at> wrote:
>>>>>>> On 12.11.15 at 17:43, Andrew Cooper wrote:
>>>>>>>> On 12/11/15 14:29, Atom2 wrote:
>>>>>>>>> Hi Andrew,
>>>>>>>>> thanks for your reply. Answers are inline further down.
>>>>>>>>>
>>>>>>>>> On 12.11.15 at 14:01, Andrew Cooper wrote:
>>>>>>>>>> On 12/11/15 12:52, Jan Beulich wrote:
>>>>>>>>>>>>>> On 12.11.15 at 02:08, <ariel.atom2@web2web.at> wrote:
>>>>>>>>>>>> After the upgrade HVM domUs appear to no longer work - regardless of the
>>>>>>>>>>>> dom0 kernel (tested with both 3.18.9 and 4.1.7 as the dom0 kernel); PV
>>>>>>>>>>>> domUs, however, work just fine as before on both dom0 kernels.
>>>>>>>>>>>>
>>>>>>>>>>>> xl dmesg shows the following information after the first crashed HVM
>>>>>>>>>>>> domU which is started as part of the machine booting up:
>>>>>>>>>>>> [...]
>>>>>>>>>>>> (XEN) Failed vm entry (exit reason 0x80000021) caused by invalid guest state (0).
>>>>>>>>>>>> (XEN) ************* VMCS Area **************
>>>>>>>>>>>> (XEN) *** Guest State ***
>>>>>>>>>>>> (XEN) CR0: actual=0x0000000000000039, shadow=0x0000000000000011, gh_mask=ffffffffffffffff
>>>>>>>>>>>> (XEN) CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=ffffffffffffffff
>>>>>>>>>>>> (XEN) CR3: actual=0x0000000000800000, target_count=0
>>>>>>>>>>>> (XEN) target0=0000000000000000, target1=0000000000000000
>>>>>>>>>>>> (XEN) target2=0000000000000000, target3=0000000000000000
>>>>>>>>>>>> (XEN) RSP = 0x0000000000006fdc (0x0000000000006fdc) RIP = 0x0000000100000000 (0x0000000100000000)
>>>>>>>>>>> Other than RIP looking odd for a guest still in non-paged protected
>>>>>>>>>>> mode I can't seem to spot anything wrong with guest state.
>>>>>>>>>> Odd? That will be the source of the failure.
>>>>>>>>>>
>>>>>>>>>> Outside of long mode, the upper 32 bits of %rip should all be zero, and it
>>>>>>>>>> should not be possible to set any of them.
>>>>>>>>>>
>>>>>>>>>> I suspect that the guest has exited for emulation, and there has been a
>>>>>>>>>> bad update to %rip. The alternative (which I hope is not the case) is
>>>>>>>>>> that there is a hardware erratum which allows the guest to accidentally
>>>>>>>>>> get itself into this condition.
>>>>>>>>>>
>>>>>>>>>> Are you able to rerun with a debug build of the hypervisor?
>> [big snip]
>>>>>>>>> Now _without_ the debug USE flag, but with debug information in
>>>>>>>>> the binary (I used splitdebug), all is back to where the problem
>>>>>>>>> started off (i.e. the system boots without issues until such
>>>>>>>>> time as it starts an HVM domU, which then crashes; PV domUs are
>>>>>>>>> working). I have attached the latest "xl dmesg" output with the
>>>>>>>>> timing information included.
>>>>>>>>>
>>>> I hope any of this makes sense to you.
>>>>
>>>> Again many thanks and best regards
>>>>
>>> Right - it would appear that the USE flag is definitely not what you
>>> wanted, and causes bad compilation for Xen. The do_IRQ disassembly
>>> you sent is the result of disassembling a whole block of zeroes.
>>> Sorry for leading you on a goose chase - the double faults will be the
>>> product of bad compilation, rather than anything to do with your
>>> specific problem.
>> Hi Andrew,
>> there's absolutely no need to apologize as it is me who asked for help
>> and you who generously stepped in and provided it. I really do
>> appreciate your help and it is for me, as the one seeking help, to
>> provide all the information you deem necessary and you ask for.
>>> However, the final log you sent (dmesg) is using a debug Xen, which is
>>> what I was attempting to get you to do originally.
>> Next time I will know better how to arrive at a debug Xen. It's all
>> about learning.
>>> We still observe that the VM ends up in 32bit non-paged mode but with
>>> an RIP with bit 32 set, which is an invalid state to be in. However,
>>> there was nothing particularly interesting in the extra log information.
>>>
>>> Please can you rerun with "hvm_debug=0xc3f", which will cause far more
>>> logging to occur to the console while the HVM guest is running. That
>>> might show some hints.
>> I haven't done that yet - but please see my next paragraph. If you are
>> still interested in this, for whatever reason, I am clearly more than
>> happy to rerun with your suggested option and provide that information
>> as well.
>>> Also, the fact that this occurs just after starting SeaBIOS is
>>> interesting. As you have switched versions of Xen, you have also
>>> switched hvmloader, which contains the SeaBIOS binary embedded in it.
>>> Would you be able to compile both 4.5.1 and 4.5.2 and switch the
>>> hvmloader binaries in use. It would be very interesting to see
>>> whether the failure is caused by the hvmloader binary or the
>>> hypervisor. (With `xl`, you can use
>>> firmware_override="/full/path/to/firmware" to override the default
>>> hvmloader).
>> Your analysis was absolutely spot on. After re-thinking this for a
>> moment, I thought going down that route first would make a lot of sense
>> as PV guests still do work and one of the differences to HVM domUs is
>> that the former do _not_ require SeaBIOS. Looking at my log files of
>> installed packages confirmed an upgrade from SeaBIOS 1.7.5 to 1.8.2 in
>> the relevant timeframe which obviously had not made it to the hvmloader
>> of xen-4.5.1 as I did not re-compile xen after the upgrade of SeaBIOS.
>>
>> So I re-compiled xen-4.5.1 (obviously now using the installed SeaBIOS
>> 1.8.2) and the same error as with xen-4.5.2 popped up - and that seemed
>> to strongly indicate that there indeed might be an issue with SeaBIOS as
>> this probably was the only variable that had changed from the original
>> install of xen-4.5.1.
>>
>> My next step was to downgrade SeaBIOS to 1.7.5 and to re-compile
>> xen-4.5.1. Voila, the system was again up and running. While still
>> having SeaBIOS 1.7.5 installed, I also re-compiled xen-4.5.2 and ... you
>> probably guessed it ... the problem was gone: The system boots up with
>> no issues and everything is fine again.
>>
>> So in a nutshell: There seems to be a problem with SeaBIOS 1.8.2
>> preventing HVM domains from successfully starting up. I don't know what
>> triggers this, whether it is specific to my hardware, or whether
>> something else in my environment is to blame.
>>
>> In any case, I am again more than happy to provide data / run a few
>> tests should you wish to get to the bottom of this.
>>
>> I do owe you a beer (or any other drink) should you ever be at my
>> location (i.e. Vienna, Austria).
>>
>> Many thanks again for your analysis and your first class support. Xen
>> and its people absolutely rock!
>>
>> Atom2
> I'm a little late to the thread but can you send me (you can do it
> off-list if you'd like) the USE flags you used for xen, xen-tools and
> seabios? Also emerge --info. You can kill two birds with one stone by
> using emerge --info xen.
Hi Doug,
here you go:
USE flags:
app-emulation/xen-4.5.2-r1::gentoo USE="-custom-cflags -debug -efi
-flask -xsm"
app-emulation/xen-tools-4.5.2::gentoo USE="hvm pam pygrub python qemu
screen system-seabios -api -custom-cflags -debug -doc -flask (-ocaml)
-ovmf -static-libs -system-qemu" PYTHON_TARGETS="python2_7"
sys-firmware/seabios-1.7.5::gentoo USE="binary"
emerge --info: Please see the attached file
> I'm not too familiar with the xen ebuilds but I was pretty sure that
> xen-tools is what builds hvmloader and it downloads a copy of SeaBIOS
> and builds it so that it remains consistent. But obviously your
> experience shows otherwise.
You are right, it's xen-tools that builds hvmloader. If I remember
correctly, the "system-seabios" USE flag (for xen-tools) specifies
whether sys-firmware/seabios is used and the latter downloads SeaBIOS in
its binary form provided its "binary" USE flag is set. At least that's
my understanding.
> I'm looking at some ideas to improve SeaBIOS packaging on Gentoo and
> your info would be helpful.
Great. Whatever makes Gentoo and Xen stronger will be awesome. What
immediately springs to mind is to create a separate hvmloader package
and slot that (that's just an idea and probably not fully thought
through, but as far as I understood Andrew, it would then be possible to
specify the specific firmware version [i.e. hvmloader] to use on xl's
command line by using firmware_override="/full/path/to/firmware").
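To illustrate the slotting idea, here is a minimal Python sketch, not a
working ebuild. It assumes a hypothetical install layout in which each
slot ships its own binary as /usr/lib/xen/boot/hvmloader-<version>; that
directory and the helper name firmware_override_line are made up for
illustration. Only the firmware_override key itself comes from Andrew's
suggestion.

```python
from pathlib import PurePosixPath

# Hypothetical directory where slotted hvmloader builds would be installed.
# No such layout exists today; it is assumed purely for illustration.
SLOT_DIR = PurePosixPath("/usr/lib/xen/boot")


def firmware_override_line(slot: str) -> str:
    """Return the xl config line that selects the hvmloader of one slot."""
    path = SLOT_DIR / f"hvmloader-{slot}"
    return f'firmware_override="{path}"'


# Example: pin a guest to the hvmloader built alongside Xen 4.5.1.
print(firmware_override_line("4.5.1"))
```

The resulting line could then be added to the domU config file to pin a
guest to a known-good hvmloader while a newer one is being tested.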
I also found out that an upgrade to sys-firmware/seabios obviously does
not trigger an automatic re-emerge of xen-tools and thus hvmloader.
Shouldn't this also happen automatically as xen-tools depends on seabios?
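Until that is sorted out, a staleness check along these lines might at
least flag the situation. This is only a sketch: it assumes that Portage
records a BUILD_TIME file (epoch seconds) for each installed package
under /var/db/pkg, and the function names are my own invention.

```python
from pathlib import Path

# Portage's installed-package database (assumed default location).
VDB = Path("/var/db/pkg")


def build_time(cpv: str, vdb: Path = VDB) -> int:
    """Read the build timestamp recorded for an installed package.

    cpv is category/package-version, e.g. "sys-firmware/seabios-1.8.2".
    Assumes the entry contains a BUILD_TIME file holding epoch seconds.
    """
    return int((vdb / cpv / "BUILD_TIME").read_text().strip())


def is_stale(consumer_time: int, dependency_time: int) -> bool:
    """True if the dependency was built after the package that embeds it."""
    return dependency_time > consumer_time
```

Calling is_stale(build_time("app-emulation/xen-tools-4.5.2"),
build_time("sys-firmware/seabios-1.8.2")) would return True whenever
seabios was built after xen-tools, i.e. when hvmloader still embeds
the older SeaBIOS.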
Thanks and best regards
Atom2
P.S. If you prefer to take this off-list, just reply to my mail address.
[-- Attachment #2: info --]
[-- Type: text/plain, Size: 3773 bytes --]
Portage 2.2.20.1 (python 2.7.10-final-0, hardened/linux/amd64, gcc-4.9.3, glibc-2.21-r1, 4.1.7-hardened-r1 x86_64)
=================================================================
System uname: Linux-4.1.7-hardened-r1-x86_64-Intel-R-_Xeon-R-_CPU_E31260L_@_2.40GHz-with-gentoo-2.2
KiB Mem: 4032716 total, 3678784 free
KiB Swap: 16777148 total, 16777148 free
Timestamp of repository gentoo: Sun, 15 Nov 2015 00:45:01 +0000
sh bash 4.3_p39
ld GNU ld (Gentoo 2.25.1 p1.1) 2.25.1
app-shells/bash: 4.3_p39::gentoo
dev-lang/perl: 5.20.2::gentoo
dev-lang/python: 2.7.10::gentoo, 3.4.3::gentoo
dev-util/cmake: 3.3.1-r1::gentoo
dev-util/pkgconfig: 0.28-r2::gentoo
sys-apps/baselayout: 2.2::gentoo
sys-apps/openrc: 0.17::gentoo
sys-apps/sandbox: 2.6-r1::gentoo
sys-devel/autoconf: 2.69::gentoo
sys-devel/automake: 1.13.4::gentoo, 1.14.1::gentoo, 1.15::gentoo
sys-devel/binutils: 2.25.1-r1::gentoo
sys-devel/gcc: 4.8.5::gentoo, 4.9.3::gentoo
sys-devel/gcc-config: 1.7.3::gentoo
sys-devel/libtool: 2.4.6::gentoo
sys-devel/make: 4.1-r1::gentoo
sys-kernel/linux-headers: 3.18::gentoo (virtual/os-headers)
sys-libs/glibc: 2.21-r1::gentoo
Repositories:
gentoo
location: /usr/portage
sync-type: rsync
sync-uri: rsync://rsync.europe.gentoo.org/gentoo-portage/
priority: -1000
x-portage
location: /usr/local/portage
masters: gentoo
priority: 0
ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="* -@EULA"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=native -O2 -pipe -fomit-frame-pointer"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-march=native -O2 -pipe -fomit-frame-pointer"
DISTDIR="/usr/portage/distfiles"
EMERGE_DEFAULT_OPTS="--quiet-build=y --buildpkg-exclude sys-kernel/hardened-sources"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-logs buildpkg config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync news parallel-fetch preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://gd.tuwien.ac.at/opsys/linux/gentoo/ ftp://gd.tuwien.ac.at/opsys/linux/gentoo/"
LANG="en_US.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
MAKEOPTS="-j9"
PKGDIR="/usr/portage/packages"
PORTAGE_COMPRESS=""
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_EXTRA_OPTS="--quiet --progress"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
USE="acl aes amd64 avx bash-completion berkdb bzip2 cli cracklib crypt cxx gdbm hardened iconv justify lm_sensors mmx mmxext modules multilib ncurses nls nptl openmp pam pax_kernel pcre pie popcnt readline seccomp session sse sse2 sse3 sse4.1 sse4_1 ssl ssp ssse3 tcpd unicode urandom vim-syntax xattr xtpax zlib" ABI_X86="64" CPU_FLAGS_X86="aes avx mmx mmxext popcnt sse sse2 sse3 sse4_1 sse4.1 ssse3" ELIBC="glibc" KERNEL="linux" LINGUAS="en" PHP_TARGETS="php5-5" PYTHON_SINGLE_TARGET="python2_7" PYTHON_TARGETS="python2_7" RUBY_TARGETS="ruby20" USERLAND="GNU" VIDEO_CARDS="intel i965" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
USE_PYTHON="2.7"
Unset: CC, CPPFLAGS, CTARGET, CXX, INSTALL_MASK, LC_ALL, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS_FLAGS