From mboxrd@z Thu Jan 1 00:00:00 1970
From: Doug Goldstein
Subject: Re: HVM domains crash after upgrade from XEN 4.5.1 to 4.5.2
Date: Mon, 16 Nov 2015 13:47:33 -0600
Message-ID: <564A32D5.6070704@cardoe.com>
References: <5643E68C.8090406@web2web.at>
 <564499B002000078000B43EE@prv-mh.provo.novell.com>
 <56448D9B.4090007@citrix.com> <5644A248.1060505@web2web.at>
 <5644C1CD.3020202@citrix.com> <56451A2B.9090706@web2web.at>
 <56459E5F02000078000B4944@prv-mh.provo.novell.com>
 <5645B6BC.6030603@citrix.com> <56467D44.5040205@web2web.at>
 <56479A6B.6080102@citrix.com> <5647CE57.50209@web2web.at>
 <5648E727.6080204@cardoe.com> <56492BDF.5030208@web2web.at>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============3151212031025705598=="
In-Reply-To: <56492BDF.5030208@web2web.at>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Atom2 , xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--===============3151212031025705598==
Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="6JawJUp4GWuefQOg81IIvxscFCxSX4SwE"

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--6JawJUp4GWuefQOg81IIvxscFCxSX4SwE
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

On 11/15/15 7:05 PM, Atom2 wrote:
> On 15.11.15 at 21:12, Doug Goldstein wrote:
>> On 11/14/15 6:14 PM, Atom2 wrote:
>>> On 14.11.15 at 21:32, Andrew Cooper wrote:
>>>> On 14/11/2015 00:16, Atom2 wrote:
>>>>> On 13.11.15 at 11:09, Andrew Cooper wrote:
>>>>>> On 13/11/15 07:25, Jan Beulich wrote:
>>>>>>>>>> On 13.11.15 at 00:00, wrote:
>>>>>>>> On 12.11.15 at 17:43, Andrew Cooper wrote:
>>>>>>>>> On 12/11/15 14:29, Atom2 wrote:
>>>>>>>>>> Hi Andrew,
>>>>>>>>>> thanks for your reply. Answers are inline further down.
>>>>>>>>>>
>>>>>>>>>> On 12.11.15 at 14:01, Andrew Cooper wrote:
>>>>>>>>>>> On 12/11/15 12:52, Jan Beulich wrote:
>>>>>>>>>>>>>>> On 12.11.15 at 02:08, wrote:
>>>>>>>>>>>>> After the upgrade HVM domUs appear to no longer work -
>>>>>>>>>>>>> regardless of the dom0 kernel (tested with both 3.18.9 and
>>>>>>>>>>>>> 4.1.7 as the dom0 kernel); PV domUs, however, work just fine
>>>>>>>>>>>>> as before on both dom0 kernels.
>>>>>>>>>>>>>
>>>>>>>>>>>>> xl dmesg shows the following information after the first
>>>>>>>>>>>>> crashed HVM domU which is started as part of the machine
>>>>>>>>>>>>> booting up:
>>>>>>>>>>>>> [...]
>>>>>>>>>>>>> (XEN) Failed vm entry (exit reason 0x80000021) caused by invalid guest state (0).
>>>>>>>>>>>>> (XEN) ************* VMCS Area **************
>>>>>>>>>>>>> (XEN) *** Guest State ***
>>>>>>>>>>>>> (XEN) CR0: actual=0x0000000000000039, shadow=0x0000000000000011, gh_mask=ffffffffffffffff
>>>>>>>>>>>>> (XEN) CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=ffffffffffffffff
>>>>>>>>>>>>> (XEN) CR3: actual=0x0000000000800000, target_count=0
>>>>>>>>>>>>> (XEN) target0=0000000000000000, target1=0000000000000000
>>>>>>>>>>>>> (XEN) target2=0000000000000000, target3=0000000000000000
>>>>>>>>>>>>> (XEN) RSP = 0x0000000000006fdc (0x0000000000006fdc) RIP = 0x0000000100000000 (0x0000000100000000)
>>>>>>>>>>>> Other than RIP looking odd for a guest still in non-paged
>>>>>>>>>>>> protected mode I can't seem to spot anything wrong with guest
>>>>>>>>>>>> state.
>>>>>>>>>>> Odd? That will be the source of the failure.
>>>>>>>>>>>
>>>>>>>>>>> Out of long mode, the upper 32 bits of %rip should all be zero,
>>>>>>>>>>> and it should not be possible to set any of them.
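To make the point about %rip concrete, here is a minimal Python sketch (not Xen code; the helper name rip_valid_outside_long_mode is made up for this illustration) that applies the rule quoted above to the RIP value from the VMCS dump:

```python
def rip_valid_outside_long_mode(rip: int) -> bool:
    """Return True if the upper 32 bits of a 64-bit RIP value are all zero.

    Hypothetical helper for illustration only; it mirrors the rule quoted
    above and is not taken from the Xen sources.
    """
    return (rip >> 32) == 0


# RIP value as reported in the "xl dmesg" VMCS dump in this thread.
rip_from_dump = 0x0000000100000000

print(hex(rip_from_dump >> 32))                    # upper half of RIP: 0x1
print(rip_valid_outside_long_mode(rip_from_dump))  # False, bit 32 is set
```

The RSP value from the same dump (0x6fdc) would pass this check, which matches the observation that only RIP looks wrong.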
>>>>>>>>>>>
>>>>>>>>>>> I suspect that the guest has exited for emulation, and there
>>>>>>>>>>> has been a bad update to %rip. The alternative (which I hope
>>>>>>>>>>> is not the case) is that there is a hardware erratum which
>>>>>>>>>>> allows the guest to accidentally get itself into this
>>>>>>>>>>> condition.
>>>>>>>>>>>
>>>>>>>>>>> Are you able to rerun with a debug build of the hypervisor?
>>> [big snip]
>>>>>>>>>> Now _without_ the debug USE flag, but with debug information in
>>>>>>>>>> the binary (I used splitdebug), all is back to where the problem
>>>>>>>>>> started off (i.e. the system boots without issues until such
>>>>>>>>>> time it starts a HVM domU which then crashes; PV domUs are
>>>>>>>>>> working). I have attached the latest "xl dmesg" output with the
>>>>>>>>>> timing information included.
>>>>>>>>>>
>>>>> I hope any of this makes sense to you.
>>>>>
>>>>> Again many thanks and best regards
>>>>>
>>>> Right - it would appear that the USE flag is definitely not what you
>>>> wanted, and causes bad compilation for Xen. The do_IRQ disassembly
>>>> you sent is the result of disassembling a whole block of zeroes.
>>>> Sorry for leading you on a goose chase - the double faults will be the
>>>> product of bad compilation, rather than anything to do with your
>>>> specific problem.
>>> Hi Andrew,
>>> there's absolutely no need to apologize as it is me who asked for help
>>> and you who generously stepped in and provided it. I really do
>>> appreciate your help and it is for me, as the one seeking help, to
>>> provide all the information you deem necessary and you ask for.
>>>> However, the final log you sent (dmesg) is using a debug Xen, which is
>>>> what I was attempting to get you to do originally.
>>> Next time I know better how to arrive at a debug XEN. It's all about
>>> learning.
>>>> We still observe that the VM ends up in 32bit non-paged mode but with
>>>> an RIP with bit 32 set, which is an invalid state to be in. However,
>>>> there was nothing particularly interesting in the extra log
>>>> information.
>>>>
>>>> Please can you rerun with "hvm_debug=0xc3f", which will cause far more
>>>> logging to occur to the console while the HVM guest is running. That
>>>> might show some hints.
>>> I haven't done that yet - but please see my next paragraph. If you are
>>> still interested in this, for whatever reason, I am clearly more than
>>> happy to rerun with your suggested option and provide that information
>>> as well.
>>>> Also, the fact that this occurs just after starting SeaBIOS is
>>>> interesting. As you have switched versions of Xen, you have also
>>>> switched hvmloader, which contains the SeaBIOS binary embedded in it.
>>>> Would you be able to compile both 4.5.1 and 4.5.2 and switch the
>>>> hvmloader binaries in use? It would be very interesting to see
>>>> whether the failure is caused by the hvmloader binary or the
>>>> hypervisor. (With `xl`, you can use
>>>> firmware_override="/full/path/to/firmware" to override the default
>>>> hvmloader).
>>> Your analysis was absolutely spot on. After re-thinking this for a
>>> moment, I thought going down that route first would make a lot of sense
>>> as PV guests still do work and one of the differences to HVM domUs is
>>> that the former do _not_ require SeaBIOS.
>>> Looking at my log files of installed packages confirmed an upgrade
>>> from SeaBIOS 1.7.5 to 1.8.2 in the relevant timeframe which obviously
>>> had not made it to the hvmloader of xen-4.5.1 as I did not re-compile
>>> xen after the upgrade of SeaBIOS.
>>>
>>> So I re-compiled xen-4.5.1 (obviously now using the installed SeaBIOS
>>> 1.8.2) and the same error as with xen-4.5.2 popped up - and that seemed
>>> to strongly indicate that there indeed might be an issue with SeaBIOS as
>>> this probably was the only variable that had changed from the original
>>> install of xen-4.5.1.
>>>
>>> My next step was to downgrade SeaBIOS to 1.7.5 and to re-compile
>>> xen-4.5.1. Voila, the system was again up and running. While still
>>> having SeaBIOS 1.7.5 installed, I also re-compiled xen-4.5.2 and ... you
>>> probably guessed it ... the problem was gone: The system boots up with
>>> no issues and everything is fine again.
>>>
>>> So in a nutshell: There seems to be a problem with SeaBIOS 1.8.2
>>> preventing HVM domains from successfully starting up. I don't know what
>>> triggers this, whether it is specific to my hardware or whether
>>> something else in my environment is to blame.
>>>
>>> In any case, I am again more than happy to provide data / run a few
>>> tests should you wish to get to the bottom of this.
>>>
>>> I do owe you a beer (or any other drink) should you ever be at my
>>> location (i.e. Vienna, Austria).
>>>
>>> Many thanks again for your analysis and your first class support. Xen
>>> and their people absolutely rock!
>>>
>>> Atom2
>> I'm a little late to the thread but can you send me (you can do it
>> off-list if you'd like) the USE flags you used for xen, xen-tools and
>> seabios? Also emerge --info. You can kill two birds with one stone by
>> using emerge --info xen.
> Hi Doug,
> here you go:

Thanks. I'll use your configuration as a test point to update a few
things with regard to the Gentoo ebuilds.
I'm not the maintainer of Xen and SeaBIOS but I don't think the
maintainers will have much issue with the changes.

> USE flags:
> app-emulation/xen-4.5.2-r1::gentoo USE="-custom-cflags -debug -efi -flask -xsm"
> app-emulation/xen-tools-4.5.2::gentoo USE="hvm pam pygrub python qemu screen system-seabios -api -custom-cflags -debug -doc -flask (-ocaml) -ovmf -static-libs -system-qemu" PYTHON_TARGETS="python2_7"
> sys-firmware/seabios-1.7.5::gentoo USE="binary"

So looking at how SeaBIOS and friends are built I think we have an issue
that needs to be addressed. That being said, you wouldn't have this
issue if you did USE="-system-seabios -system-qemu". I believe you would
also be ok if you had done USE="system-seabios system-qemu". But after a
quick look at everything USE="system-seabios -system-qemu" will
definitely do the wrong thing.

> emerge --info: Please see the attached file
>> I'm not too familiar with the xen ebuilds but I was pretty sure that
>> xen-tools is what builds hvmloader and it downloads a copy of SeaBIOS
>> and builds it so that it remains consistent. But obviously your
>> experience shows otherwise.
> You are right, it's xen-tools that builds hvmloader. If I remember
> correctly, the "system-seabios" USE flag (for xen-tools) specifies
> whether sys-firmware/seabios is used and the latter downloads SeaBIOS in
> its binary form provided its "binary" USE flag is set. At least that's
> my understanding.
>> I'm looking at some ideas to improve SeaBIOS packaging on Gentoo and
>> your info would be helpful.
> Great. Whatever makes gentoo and xen stronger will be awesome. What
> immediately springs to mind is to create a separate hvmloader package
> and slot that (that's just an idea and probably not fully thought
> through, but as far as I understood Andrew, it would then be possible to
> specify the specific firmware version [i.e.
> hvmloader] to use on xl's command line by using
> firmware_override="full/path/to/firmware").
>
> I also found out that an upgrade to sys-firmware/seabios obviously does
> not trigger an automatic re-emerge of xen-tools and thus hvmloader.
> Shouldn't this also happen automatically as xen-tools depends on seabios?
>
> Thanks and best regards Atom2
>
>
> P.S. If you prefer to take this off-list, just reply to my mail address.

--
Doug Goldstein

--6JawJUp4GWuefQOg81IIvxscFCxSX4SwE
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0

iQJ8BAEBCgBmBQJWSjLWXxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w
ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXRBNTM5MEQ2RTNFMTkyNzlCNzVDMzIwOTVB
MkJDMDNEQzg3RUQxQkQ0AAoJEKK8A9yH7RvUgaUP/0Ku1GwvqSI+dHxb+uhsHU2i
y7mcVQSVu50Il1I6r1EnInaGBer4ThONpuYx1F9m97qo92KhJYutNRLpXb9hy/Eu
EFSpZWE6uDIfFwEGpB6Tdbyk+AXt+hOBTjY55fV4YuQGpU7d5bPTAI83J9gUYAjk
Mi5u1eJi1TkZRd7QW74s6mGHsBNHaUDXd/P/llovGhf29tP7wwUsNrOngJsL/JtT
66y7YA1t3Il914IGf2VFzipWFFwbFwLKT5z1UeNpus6qySO5Wb7e2EPRAoJYnorw
8A6SsUbVp4w0GXhkmtBvYceBfbIAtyUCoPQQQMv6cj5c+5LLs+Wl8oew2Jngn1xl
XiwHi8WZB40kR3fI+0XIAoUk3I0gPsUb+DfJsx2T7CMXauBxK4r1v0Unahrma3JA
0W2Kg/CGiMTm56t5q9YCGEQKu+I/tq25Fsf2FdTn/OlANqkSVdVO2av/Aa0E+y69
cRji3xD+CU2dXC2v1XYIKySyakH4eMBGI5e2LXt7/kW0UNxdlYPaRQD0NaLlqIU7
OYK00QXIBFRQ5EonVC9ef5yOBxTdsq+n9paAIZ6d4jT/78Q30YCz5OtdqfIAoEMy
0cGi1ljjwVV5RXapxlPOPcRATf3HofCJqPy/58OGcaF7+qJPGL1eAWwbjGb8ay3x
D60Yf/0Qrhh9HYqAu0ZS
=eNcK
-----END PGP SIGNATURE-----

--6JawJUp4GWuefQOg81IIvxscFCxSX4SwE--

--===============3151212031025705598==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

--===============3151212031025705598==--
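
The system-seabios / system-qemu combinations discussed in this message can be summarised in a small Python sketch. It is only an illustration of the three statements made above, not part of Portage or of the Xen ebuilds: the function name and the status strings are made up, and treating a flag that is absent from the string as disabled is an assumption.

```python
def seabios_qemu_combo_status(use_flags: str) -> str:
    """Classify a xen-tools USE string according to the statements in this mail.

    Hypothetical helper: a flag counts as enabled only if it appears without
    a leading "-"; a flag that is missing is assumed to be disabled.
    """
    flags = use_flags.split()
    system_seabios = "system-seabios" in flags
    system_qemu = "system-qemu" in flags

    if system_seabios and not system_qemu:
        # "system-seabios -system-qemu": said to do the wrong thing.
        return "problematic"
    if system_seabios == system_qemu:
        # Both disabled, or both enabled: expected (or believed) to be ok.
        return "expected ok"
    # "-system-seabios system-qemu" is not covered by this mail.
    return "not discussed"


# xen-tools USE flags as reported earlier in the thread.
reported_use = (
    "hvm pam pygrub python qemu screen system-seabios -api -custom-cflags "
    "-debug -doc -flask (-ocaml) -ovmf -static-libs -system-qemu"
)
print(seabios_qemu_combo_status(reported_use))  # problematic
```

The reported configuration falls into the one combination flagged above as doing the wrong thing.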