On 15.11.15 at 21:12, Doug Goldstein wrote:
> On 11/14/15 6:14 PM, Atom2 wrote:
>> On 14.11.15 at 21:32, Andrew Cooper wrote:
>>> On 14/11/2015 00:16, Atom2 wrote:
>>>> On 13.11.15 at 11:09, Andrew Cooper wrote:
>>>>> On 13/11/15 07:25, Jan Beulich wrote:
>>>>>>>>> On 13.11.15 at 00:00, wrote:
>>>>>>> On 12.11.15 at 17:43, Andrew Cooper wrote:
>>>>>>>> On 12/11/15 14:29, Atom2 wrote:
>>>>>>>>> Hi Andrew,
>>>>>>>>> thanks for your reply. Answers are inline further down.
>>>>>>>>>
>>>>>>>>> On 12.11.15 at 14:01, Andrew Cooper wrote:
>>>>>>>>>> On 12/11/15 12:52, Jan Beulich wrote:
>>>>>>>>>>>>>> On 12.11.15 at 02:08, wrote:
>>>>>>>>>>>> After the upgrade HVM domUs appear to no longer work - regardless
>>>>>>>>>>>> of the dom0 kernel (tested with both 3.18.9 and 4.1.7 as the dom0
>>>>>>>>>>>> kernel); PV domUs, however, work just fine as before on both dom0
>>>>>>>>>>>> kernels.
>>>>>>>>>>>>
>>>>>>>>>>>> xl dmesg shows the following information after the first crashed HVM
>>>>>>>>>>>> domU which is started as part of the machine booting up:
>>>>>>>>>>>> [...]
>>>>>>>>>>>> (XEN) Failed vm entry (exit reason 0x80000021) caused by invalid guest state (0).
>>>>>>>>>>>> (XEN) ************* VMCS Area **************
>>>>>>>>>>>> (XEN) *** Guest State ***
>>>>>>>>>>>> (XEN) CR0: actual=0x0000000000000039, shadow=0x0000000000000011, gh_mask=ffffffffffffffff
>>>>>>>>>>>> (XEN) CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=ffffffffffffffff
>>>>>>>>>>>> (XEN) CR3: actual=0x0000000000800000, target_count=0
>>>>>>>>>>>> (XEN) target0=0000000000000000, target1=0000000000000000
>>>>>>>>>>>> (XEN) target2=0000000000000000, target3=0000000000000000
>>>>>>>>>>>> (XEN) RSP = 0x0000000000006fdc (0x0000000000006fdc) RIP = 0x0000000100000000 (0x0000000100000000)
>>>>>>>>>>> Other than RIP looking odd for a guest still in non-paged protected
>>>>>>>>>>> mode I can't seem to spot anything wrong with guest state.
>>>>>>>>>> odd? That will be the source of the failure.
>>>>>>>>>>
>>>>>>>>>> Outside long mode, the upper 32 bits of %rip should all be zero, and
>>>>>>>>>> it should not be possible to set any of them.
>>>>>>>>>>
>>>>>>>>>> I suspect that the guest has exited for emulation, and there has been a
>>>>>>>>>> bad update to %rip. The alternative (which I hope is not the case) is
>>>>>>>>>> that there is a hardware erratum which allows the guest to accidentally
>>>>>>>>>> get itself into this condition.
>>>>>>>>>>
>>>>>>>>>> Are you able to rerun with a debug build of the hypervisor?
>> [big snip]
>>>>>>>>> Now _without_ the debug USE flag, but with debug information in
>>>>>>>>> the binary (I used splitdebug), all is back to where the problem
>>>>>>>>> started off (i.e. the system boots without issues until such
>>>>>>>>> time it starts a HVM domU which then crashes; PV domUs are
>>>>>>>>> working). I have attached the latest "xl dmesg" output with the
>>>>>>>>> timing information included.
>>>>>>>>>
>>>> I hope any of this makes sense to you.
>>>>
>>>> Again many thanks and best regards
>>>>
>>> Right - it would appear that the USE flag is definitely not what you
>>> wanted, and causes bad compilation for Xen. The do_IRQ disassembly
>>> you sent is the result of disassembling a whole block of zeroes.
>>> Sorry for leading you on a goose chase - the double faults will be the
>>> product of bad compilation, rather than anything to do with your
>>> specific problem.
>> Hi Andrew,
>> there's absolutely no need to apologize as it is me who asked for help
>> and you who generously stepped in and provided it. I really do
>> appreciate your help and it is for me, as the one seeking help, to
>> provide all the information you deem necessary and you ask for.
>>> However, the final log you sent (dmesg) is using a debug Xen, which is
>>> what I was attempting to get you to do originally.
>> Next time I know better how to arrive at a debug Xen. It's all about
>> learning.
>>> We still observe that the VM ends up in 32-bit non-paged mode but with
>>> an RIP with bit 32 set, which is an invalid state to be in. However,
>>> there was nothing particularly interesting in the extra log information.
>>>
>>> Please can you rerun with "hvm_debug=0xc3f", which will cause far more
>>> logging to occur to the console while the HVM guest is running. That
>>> might show some hints.
>> I haven't done that yet - but please see my next paragraph. If you are
>> still interested in this, for whatever reason, I am clearly more than
>> happy to rerun with your suggested option and provide that information
>> as well.
>>> Also, the fact that this occurs just after starting SeaBIOS is
>>> interesting. As you have switched versions of Xen, you have also
>>> switched hvmloader, which contains the SeaBIOS binary embedded in it.
>>> Would you be able to compile both 4.5.1 and 4.5.2 and switch the
>>> hvmloader binaries in use. It would be very interesting to see
>>> whether the failure is caused by the hvmloader binary or the
>>> hypervisor. (With `xl`, you can use
>>> firmware_override="/full/path/to/firmware" to override the default
>>> hvmloader).
>> Your analysis was absolutely spot on. After re-thinking this for a
>> moment, I thought going down that route first would make a lot of sense
>> as PV guests still do work and one of the differences to HVM domUs is
>> that the former do _not_ require SeaBIOS. Looking at my log files of
>> installed packages confirmed an upgrade from SeaBIOS 1.7.5 to 1.8.2 in
>> the relevant timeframe which obviously had not made it to the hvmloader
>> of xen-4.5.1 as I did not re-compile xen after the upgrade of SeaBIOS.
>>
>> So I re-compiled xen-4.5.1 (obviously now using the installed SeaBIOS
>> 1.8.2) and the same error as with xen-4.5.2 popped up - and that seemed
>> to strongly indicate that there indeed might be an issue with SeaBIOS as
>> this probably was the only variable that had changed from the original
>> install of xen-4.5.1.
>>
>> My next step was to downgrade SeaBIOS to 1.7.5 and to re-compile
>> xen-4.5.1. Voila, the system was again up and running. While still
>> having SeaBIOS 1.7.5 installed, I also re-compiled xen-4.5.2 and ... you
>> probably guessed it ... the problem was gone: The system boots up with
>> no issues and everything is fine again.
>>
>> So in a nutshell: There seems to be a problem with SeaBIOS 1.8.2
>> preventing HVM domains from successfully starting up. I don't know what
>> triggers this, whether it is specific to my hardware or whether
>> something else in my environment is to blame.
>>
>> In any case, I am again more than happy to provide data / run a few
>> tests should you wish to get to the bottom of this.
>>
>> I do owe you a beer (or any other drink) should you ever be at my
>> location (i.e. Vienna, Austria).
>>
>> Many thanks again for your analysis and your first class support. Xen
>> and their people absolutely rock!
>>
>> Atom2
> I'm a little late to the thread but can you send me (you can do it
> off-list if you'd like) the USE flags you used for xen, xen-tools and
> seabios? Also emerge --info. You can kill two birds with one stone by
> using emerge --info xen.
Hi Doug,
here you go:

USE flags:
app-emulation/xen-4.5.2-r1::gentoo USE="-custom-cflags -debug -efi -flask -xsm"
app-emulation/xen-tools-4.5.2::gentoo USE="hvm pam pygrub python qemu screen system-seabios -api -custom-cflags -debug -doc -flask (-ocaml) -ovmf -static-libs -system-qemu" PYTHON_TARGETS="python2_7"
sys-firmware/seabios-1.7.5::gentoo USE="binary"

emerge --info: Please see the attached file

> I'm not too familiar with the xen ebuilds but I was pretty sure that
> xen-tools is what builds hvmloader and it downloads a copy of SeaBIOS
> and builds it so that it remains consistent. But obviously your
> experience shows otherwise.
You are right, it's xen-tools that builds hvmloader. If I remember
correctly, the "system-seabios" USE flag (for xen-tools) specifies
whether sys-firmware/seabios is used, and the latter downloads SeaBIOS in
its binary form provided its "binary" USE flag is set. At least that's
my understanding.

> I'm looking at some ideas to improve SeaBIOS packaging on Gentoo and
> your info would be helpful.
Great. Whatever makes Gentoo and Xen stronger will be awesome. What
immediately springs to mind is to create a separate hvmloader package
and slot that (that's just an idea and probably not fully thought
through, but as far as I understood Andrew, it would then be possible to
specify the specific firmware version [i.e. hvmloader] to use on xl's
command line by using firmware_override="/full/path/to/firmware").

I also found out that an upgrade to sys-firmware/seabios obviously does
not trigger an automatic re-emerge of xen-tools and thus hvmloader.
Shouldn't this also happen automatically as xen-tools depends on
seabios?

Thanks and best regards
Atom2

P.S. If you prefer to take this off-list, just reply to my mail address.
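
For the archive, a footnote on the register dump quoted further up: the point Jan and Andrew make about RIP can be checked by hand. The sketch below is only an illustration in Python, not anything taken from Xen. The CR0 and RIP values are copied from the quoted "xl dmesg" output; the helper names are made up for this mail, and the test is simplified to look at CR0 only (as far as I understand, the real VM entry check keys on the "IA-32e mode guest" control and CS.L, neither of which appears in the excerpt quoted here).

```python
# Values copied from the quoted "xl dmesg" VMCS dump.
CR0 = 0x0000000000000039
RIP = 0x0000000100000000

CR0_PE = 1 << 0    # protected mode enable
CR0_PG = 1 << 31   # paging enable


def describe_mode(cr0: int) -> str:
    """Classify the guest mode from CR0 alone (simplified)."""
    if not cr0 & CR0_PE:
        return "real mode"
    if not cr0 & CR0_PG:
        return "protected mode, paging off"
    return "paging on (long mode possible)"


def rip_upper_bits(rip: int) -> int:
    """Return bits 63:32 of RIP."""
    return (rip >> 32) & 0xFFFFFFFF


def rip_is_plausible(cr0: int, rip: int) -> bool:
    """Hypothetical helper: with paging off the guest cannot be in long
    mode, so bits 63:32 of RIP have to be zero."""
    if cr0 & CR0_PG:
        return True  # cannot decide from CR0 alone
    return rip_upper_bits(rip) == 0


print(describe_mode(CR0))
print(hex(rip_upper_bits(RIP)))
print(rip_is_plausible(CR0, RIP))
```

With the values from the log this reports protected mode with paging off, upper RIP bits of 0x1, and an implausible RIP, which matches the "bit 32 set" observation in the quoted analysis.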