From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: EPT: Misconfiguration
Date: Wed, 26 Jan 2011 11:52:27 +0200
Message-ID: <4D3FEEDB.5070407@redhat.com>
References: <20110121132247.GA3097@amt.cnet> <4D3F0AD1.9010903@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Marcelo Tosatti , kvm
To: Ruben Kerkhof
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:63527 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751239Ab1AZJwd (ORCPT ); Wed, 26 Jan 2011 04:52:33 -0500
In-Reply-To:
Sender: kvm-owner@vger.kernel.org
List-ID:

On 01/25/2011 08:29 PM, Ruben Kerkhof wrote:
> > When you say "suddenly", this was with no changes to software and hardware?
>
> The host software and hardware hasn't changed in the two months since
> the machine has been running. 2.6.34.7 kernel and qemu-kvm 0.13.
>
> We host customer vms on it though, so virtual machines come and go.
> Various operating systems, a mixture of Linux, FreeBSD and Windows
> 2008 R2. We have other machines with the same config without these
> problems though.

Are those other machines running a similar workload?

The traces look awfully like bad hardware, though that can also be
explained by random memory corruption due to a bug.

> This time I have a few different messages though:
>
> 2011-01-25T11:58:50.001208+01:00 phy005 kernel: general protection fault: 0000 [#1] SMP
>
> RSI: 0000000000000000 RDI: 1603a07305001568
>
> 2011-01-25T11:58:50.001486+01:00 phy005 kernel: Code: ff ff 41 8b 46
> 08 41 29 06 4c 89 e7 57 9d 0f 1f 44 00 00 48 83 c4 18 5b 41 5c 41 5d
> 41 5e 41 5f c9 c3 55 48 89 e5 0f 1f 44 00 00 ff 4f 08 0f 94 c0 84
> c0 74 10 85 f6 75 07 e8 63 fe ff ff eb

lock decl 0x8(%rdi)

%rdi is complete crap; this looks like corruption again. Strangely, it is
similar to the bad spte from the previous trace: 0x1603a0730500d277.
The upper 48 bits are identical; only the lower 16 bits differ.

> 2011-01-25T12:06:32.673937+01:00 phy005 kernel: qemu-kvm: Corrupted
> page table at address 7f37b37ff000
> 2011-01-25T12:06:32.673959+01:00 phy005 kernel: PGD c201d1067 PUD
> 94e538067 PMD 61e5bf067 PTE 1603a0730500e067

Here are those magic 48 bits again, in the PTE.

> 2011-01-25T12:38:49.416943+01:00 phy005 kernel: EPT: Misconfiguration.
> 2011-01-25T12:38:49.417518+01:00 phy005 kernel: EPT: GPA: 0x2abff038
> 2011-01-25T12:38:49.417526+01:00 phy005 kernel:
> ept_misconfig_inspect_spte: spte 0x5f49e9007 level 4
> 2011-01-25T12:38:49.417532+01:00 phy005 kernel:
> ept_misconfig_inspect_spte: spte 0x5db595007 level 3
> 2011-01-25T12:38:49.417553+01:00 phy005 kernel:
> ept_misconfig_inspect_spte: spte 0x5d5da7007 level 2
> 2011-01-25T12:38:49.417558+01:00 phy005 kernel:
> ept_misconfig_inspect_spte: spte 0x1603a07305006277 level 1

Again.

> 2011-01-25T13:16:58.192440+01:00 phy005 kernel: BUG: Bad page map in
> process qemu-kvm pte:1603a0730500d067 pmd:61059f067

Again.

However, these all came from a single boot, yes? If so, they may all be
the result of the same corruption. Please collect more traces, with
reboots in between.

-- 
error compiling committee.c: too many arguments to function
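
The comparison of the upper 48 bits made in this message can be checked mechanically. The following is only a minimal sketch, not a tool used in this thread: the values are copied from the traces quoted here, while the helper names `split_48_16` and `shared_upper_bits` and the labels in the table are hypothetical, chosen for illustration.

```python
# Suspect 64-bit values copied from the traces quoted in this thread.
# The labels are informal descriptions, not kernel terminology.
SUSPECT_VALUES = {
    "GPF %rdi": 0x1603A07305001568,
    "bad spte (previous trace)": 0x1603A0730500D277,
    "corrupted page table PTE": 0x1603A0730500E067,
    "EPT misconfig spte level 1": 0x1603A07305006277,
    "bad page map pte": 0x1603A0730500D067,
}


def split_48_16(value):
    """Split a 64-bit value into (upper 48 bits, lower 16 bits)."""
    return value >> 16, value & 0xFFFF


def shared_upper_bits(values):
    """Return the common upper 48 bits, or None if the values disagree."""
    uppers = {split_48_16(v)[0] for v in values}
    return uppers.pop() if len(uppers) == 1 else None


if __name__ == "__main__":
    for name, value in SUSPECT_VALUES.items():
        upper, lower = split_48_16(value)
        print(f"{name:28s} upper48=0x{upper:012x} lower16=0x{lower:04x}")

    common = shared_upper_bits(SUSPECT_VALUES.values())
    if common is None:
        print("upper 48 bits differ between values")
    else:
        print(f"all values share upper 48 bits 0x{common:012x}")
```

Feeding values from later boots into the same table would show whether the pattern survives a reboot, which is the point of asking for more traces.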