From mboxrd@z Thu Jan 1 00:00:00 1970
From: Atom2
Subject: Re: HVM domains crash after upgrade from XEN 4.5.1 to 4.5.2
Date: Thu, 12 Nov 2015 15:29:28 +0100
Message-ID: <5644A248.1060505@web2web.at>
References: <5643E68C.8090406@web2web.at> <564499B002000078000B43EE@prv-mh.provo.novell.com> <56448D9B.4090007@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
In-Reply-To: <56448D9B.4090007@citrix.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Andrew Cooper, Jan Beulich
Cc: xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

Hi Andrew,
thanks for your reply. Answers are inline further down.

On 12.11.15 at 14:01, Andrew Cooper wrote:
> On 12/11/15 12:52, Jan Beulich wrote:
>>>>> On 12.11.15 at 02:08, wrote:
>>> After the upgrade HVM domUs appear to no longer work - regardless of the
>>> dom0 kernel (tested with both 3.18.9 and 4.1.7 as the dom0 kernel); PV
>>> domUs, however, work just fine as before on both dom0 kernels.
>>>
>>> xl dmesg shows the following information after the first crashed HVM
>>> domU which is started as part of the machine booting up:
>>> [...]
>>> (XEN) Failed vm entry (exit reason 0x80000021) caused by invalid guest
>>> state (0).
>>> (XEN) ************* VMCS Area **************
>>> (XEN) *** Guest State ***
>>> (XEN) CR0: actual=0x0000000000000039, shadow=0x0000000000000011,
>>> gh_mask=ffffffffffffffff
>>> (XEN) CR4: actual=0x0000000000002050, shadow=0x0000000000000000,
>>> gh_mask=ffffffffffffffff
>>> (XEN) CR3: actual=0x0000000000800000, target_count=0
>>> (XEN) target0=0000000000000000, target1=0000000000000000
>>> (XEN) target2=0000000000000000, target3=0000000000000000
>>> (XEN) RSP = 0x0000000000006fdc (0x0000000000006fdc) RIP =
>>> 0x0000000100000000 (0x0000000100000000)
>> Other than RIP looking odd for a guest still in non-paged protected
>> mode I can't seem to spot anything wrong with guest state.
> odd? That will be the source of the failure.
>
> Out of long mode, the upper 32bit of %rip should all be zero, and it
> should not be possible to set any of them.
>
> I suspect that the guest has exited for emulation, and there has been a
> bad update to %rip. The alternative (which I hope is not the case) is
> that there is a hardware errata which allows the guest to accidentally
> get it self into this condition.
>
> Are you able to rerun with a debug build of the hypervisor?

Given that I am compiling from source under gentoo and provided you lend
me a helping hand in case I get stuck, I am confident that this is
possible.

gentoo has three xen packages (they call those ebuilds) as follows

app-emulation/xen
app-emulation/xen-tools
app-emulation/pvgrub

all of which are installed on my system. The former two offer a debug
USE-flag and I assume that debug code for the latter is not required as
this is for (the still working) PV domUs only. Furthermore as you are
talking about the hypervisor, I guess it is safe to assume that it is
app-emulation/xen and not xen-tools. Right?

BTW: The description of the debug USE flag reads as follows:

Enable extra debug codepaths, like asserts and extra output.
If you want to get meaningful backtraces see
https://wiki.gentoo.org/wiki/Project:Quality_Assurance/Backtraces

I assume that backtraces are probably not required to get things moving.

Another question is whether prior to enabling the debug USE flag it
might make sense to re-compile with gcc-4.8.5 (please see my previous
list reply) to rule out any compiler related issues. Jan, Andrew - what
are your thoughts?

Many thanks
Atom2
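
For illustration only, the consistency check Andrew describes in the
quoted text (outside long mode the upper 32 bits of %rip must be zero)
can be sketched in Python against the CR0 and RIP values from the quoted
VMCS dump. This is a minimal sketch, not Xen code: the function names
are made up for this example, and it relies only on the fact that long
mode requires paging (CR0.PG set), since EFER is not part of the quoted
dump.

```python
# Values copied from the quoted VMCS dump.
CR0 = 0x0000000000000039
RIP = 0x0000000100000000

CR0_PE = 1 << 0   # protected mode enable
CR0_PG = 1 << 31  # paging enable; long mode cannot be active without it


def describe_mode(cr0):
    """Classify the guest mode from CR0 alone (hypothetical helper)."""
    if not cr0 & CR0_PE:
        return "real mode"
    if not cr0 & CR0_PG:
        return "non-paged protected mode"
    return "paged mode (long mode not ruled out without EFER)"


def rip_upper_bits(rip):
    """Return bits 63:32 of RIP (hypothetical helper)."""
    return rip >> 32


def rip_plausible(cr0, rip):
    """With paging off the guest cannot be in long mode, so bits 63:32 must be 0."""
    if not cr0 & CR0_PG:
        return rip_upper_bits(rip) == 0
    return True  # cannot judge from CR0 alone once paging is enabled


if __name__ == "__main__":
    print("mode:", describe_mode(CR0))
    print("RIP upper 32 bits: %#x" % rip_upper_bits(RIP))
    print("RIP plausible for this mode:", rip_plausible(CR0, RIP))
```

With the values from the dump, CR0 has PE set and PG clear, so the guest
is in non-paged protected mode, yet bit 32 of RIP is set. That matches
the reading in the quoted replies that RIP is the inconsistent part of
the guest state.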