From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: HVM domains crash after upgrade from XEN 4.5.1 to 4.5.2
Date: Thu, 12 Nov 2015 13:01:15 +0000
Message-ID: <56448D9B.4090007@citrix.com>
References: <5643E68C.8090406@web2web.at> <564499B002000078000B43EE@prv-mh.provo.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <564499B002000078000B43EE@prv-mh.provo.novell.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich , Atom2
Cc: xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On 12/11/15 12:52, Jan Beulich wrote:
>>>> On 12.11.15 at 02:08, wrote:
>> After the upgrade HVM domUs appear to no longer work - regardless of the
>> dom0 kernel (tested with both 3.18.9 and 4.1.7 as the dom0 kernel); PV
>> domUs, however, work just fine as before on both dom0 kernels.
>>
>> xl dmesg shows the following information after the first crashed HVM
>> domU which is started as part of the machine booting up:
>> [...]
>> (XEN) Failed vm entry (exit reason 0x80000021) caused by invalid guest
>> state (0).
>> (XEN) ************* VMCS Area **************
>> (XEN) *** Guest State ***
>> (XEN) CR0: actual=0x0000000000000039, shadow=0x0000000000000011,
>> gh_mask=ffffffffffffffff
>> (XEN) CR4: actual=0x0000000000002050, shadow=0x0000000000000000,
>> gh_mask=ffffffffffffffff
>> (XEN) CR3: actual=0x0000000000800000, target_count=0
>> (XEN) target0=0000000000000000, target1=0000000000000000
>> (XEN) target2=0000000000000000, target3=0000000000000000
>> (XEN) RSP = 0x0000000000006fdc (0x0000000000006fdc) RIP =
>> 0x0000000100000000 (0x0000000100000000)
> Other than RIP looking odd for a guest still in non-paged protected
> mode I can't seem to spot anything wrong with guest state.

odd? That will be the source of the failure.
Outside of long mode, the upper 32 bits of %rip should all be zero, and it
should not be possible to set any of them.

I suspect that the guest has exited for emulation, and there has been a
bad update to %rip. The alternative (which I hope is not the case) is
that there is a hardware erratum which allows the guest to accidentally
get itself into this condition.

Are you able to rerun with a debug build of the hypervisor?

~Andrew
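
For illustration only, the condition described above can be sketched as a small C check. The helper name and its parameters are hypothetical and are not taken from the Xen source. It assumes the VM-entry rule that bits 63:32 of the guest RIP must be zero unless the guest is in IA-32e mode and CS.L is set; canonical-address checks for 64-bit code are not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper for illustration; not a Xen function.
 *
 * rip:              guest RIP as stored in the VMCS
 * ia32e_mode_guest: assumed to mirror the "IA-32e mode guest" VM-entry control
 * cs_l:             assumed to mirror the L bit of the guest CS access rights
 *
 * Returns true if the upper 32 bits of RIP are acceptable for that mode.
 */
static bool guest_rip_upper_bits_ok(uint64_t rip, bool ia32e_mode_guest,
                                    bool cs_l)
{
    /* 64-bit code may use the upper bits (canonical check not modelled). */
    if (ia32e_mode_guest && cs_l)
        return true;

    /* Everywhere else, bits 63:32 must be clear. */
    return (rip >> 32) == 0;
}
```

With the values from the dump (RIP = 0x0000000100000000, guest in non-paged protected mode), this check would report the state as invalid, which is consistent with the failed VM entry.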