From mboxrd@z Thu Jan  1 00:00:00 1970
From: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Subject: Re: [edk2] KVM: MTRR: fix memory type handling if MTRR is completely
 disabled
Date: Tue, 13 Oct 2015 02:20:28 +0800
Message-ID: <561BF9EC.5060605@linux.intel.com>
References: <55FBDB6D.4040207@gmail.com> <55FBE248.4010809@redhat.com>
 <55FC4E6F.8030104@gmail.com> <55FF7095.5060106@linux.intel.com>
 <BLU437-SMTP104744D03206E82EA10655C80460@phx.gbl>
 <55FF7C41.7070400@linux.intel.com> <560D3F31.5000703@gmail.com>
 <560D40C2.5080205@redhat.com> <560E96D8.9080007@gmail.com>
 <56196FF1.8060902@linux.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: edk2-devel@ml01.01.org
To: Janusz <januszmk6@gmail.com>, Paolo Bonzini <pbonzini@redhat.com>,
	Wanpeng Li <wanpeng.li@hotmail.com>,
	Laszlo Ersek <lersek@redhat.com>, kvm@vger.kernel.org
Return-path: <kvm-owner@vger.kernel.org>
Received: from mga14.intel.com ([192.55.52.115]:34670 "EHLO mga14.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751373AbbJLS0i (ORCPT <rfc822;kvm@vger.kernel.org>);
	Mon, 12 Oct 2015 14:26:38 -0400
In-Reply-To: <56196FF1.8060902@linux.intel.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>


On 10/11/2015 04:07 AM, Xiao Guangrong wrote:
>
>
> On 10/02/2015 10:38 PM, Janusz wrote:
>> On 01.10.2015 at 16:18, Paolo Bonzini wrote:
>>>
>>> On 01/10/2015 16:12, Janusz wrote:
>>>> Now, I can also add, that the problem is only when I allow VM to use
>>>> more than one core, so with option  for example:
>>>> -smp 8,cores=4,threads=2,sockets=1 and other combinations like -smp
>>>> 4,threads=1 its not working, and without it I am always running VM
>>>> without problems
>>>>
>>>> Any ideas what can it be? or any idea what would help to find out what
>>>> is causing this?
>>> I am going to send a revert of the patch tomorrow.
>>>
>>> Paolo
>> Thanks, but revert patch doesn't help, so something else is wrong here
>>
>
> It seems i can reproduce it now ... and finally i get little free time now :(
> I will dig into it and fix it asap.
>
> Thank you, Janusz and Paolo!

I think I have figured out the root cause. I got these traces:
            <...>-47935 [052] d... 20017.763244: kvm_exit: reason EPT_VIOLATION rip 0xa0000 info 184 0
            <...>-47935 [052] .... 20017.763244: kvm_page_fault: address a0000 error_code 184
            <...>-47935 [052] .... 20017.763269: mark_mmio_spte: sptep:ffff880841c3d500 gfn a0 access 6 gen fff94
            <...>-47935 [052] .... 20017.763272: kvm_mmu_pagetable_walk: addr a0000 pferr 10 F
            <...>-47935 [052] .... 20017.763272: kvm_mmu_paging_element: pte bfeff023 level 4
            <...>-47935 [052] .... 20017.763273: kvm_mmu_paging_element: pte bff00023 level 3
            <...>-47935 [052] .... 20017.763273: kvm_mmu_paging_element: pte e3 level 2
            <...>-47935 [052] .... 20017.763274: kvm_emulate_insn: 0:a0000: (prot32)
            <...>-47935 [052] .... 20017.763274: kvm_emulate_insn: 0:a0000: (prot32) failed
            <...>-
They show that the guest is executing at address 0xa0000, but that is an MMIO address, so
KVM cannot emulate the instruction and reports an internal error.
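
For anyone who wants to double-check the reading of the trace, here is a small sketch that
decodes the "info 184" value. It assumes that field is the EPT violation exit qualification
printed in hex, and it uses the bit layout from the Intel SDM table for EPT violation exit
qualifications; the helper names are mine, not something from KVM.

```python
# Sketch only: decode an EPT violation exit qualification.
# Assumption: "info 184" in the kvm_exit trace is the exit qualification in hex.
# Bit meanings follow the Intel SDM table for EPT violations.
EPT_QUAL_BITS = {
    0: "data read",
    1: "data write",
    2: "instruction fetch",
    3: "EPT entry readable",
    4: "EPT entry writable",
    5: "EPT entry executable",
    7: "guest linear address valid",
    8: "access to the translated address (not a page-table walk)",
}

PAGE_SHIFT = 12  # 4 KiB pages


def decode_ept_qualification(qual):
    """Return the names of the bits set in the exit qualification."""
    return [name for bit, name in sorted(EPT_QUAL_BITS.items())
            if qual & (1 << bit)]


def gfn_to_gpa(gfn):
    """Convert a guest frame number to a guest physical address."""
    return gfn << PAGE_SHIFT


if __name__ == "__main__":
    for name in decode_ept_qualification(0x184):
        print(name)
    print(hex(gfn_to_gpa(0xa0)))
```

With 0x184 only bits 2, 7 and 8 are set, so the fault is an instruction fetch and none of the
EPT permission bits (3-5) are set for that page; gfn a0 from mark_mmio_spte corresponds to
guest physical address 0xa0000.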

Actually, 0xa0000 belongs to SMRAM (it is the start of the legacy SMRAM window,
0xa0000-0xbffff). However, from QEMU's dump:
EAX=bfefe000 EBX=00000002 ECX=00000000 EDX=00000600
ESI=00000000 EDI=00003eb8 EBP=00000000 ESP=00000000
EIP=000a0000 EFL=00010086 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0

we see that the VCPU is not in SMM (SMM=0).
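
The same check can be scripted if somebody wants to run it over many dumps. This is only a
sketch: the parser is a hypothetical helper of mine, it assumes the CS base is 0 (as in the
"0:a0000" shown by kvm_emulate_insn) so that EIP can be compared directly, and it takes
0xa0000-0xbffff as the legacy SMRAM window.

```python
import re

# Register dump copied from the QEMU output above.
DUMP = """\
EAX=bfefe000 EBX=00000002 ECX=00000000 EDX=00000600
ESI=00000000 EDI=00003eb8 EBP=00000000 ESP=00000000
EIP=000a0000 EFL=00010086 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
"""

# Assumption: legacy SMRAM window, 0xa0000 up to and including 0xbffff.
LEGACY_SMRAM = range(0xA0000, 0xC0000)


def parse_dump(text):
    """Parse NAME=hexvalue pairs from a QEMU register dump into a dict."""
    return {name: int(value, 16)
            for name, value in re.findall(r"(\w+)=([0-9a-fA-F]+)", text)}


regs = parse_dump(DUMP)
eip_in_smram_window = regs["EIP"] in LEGACY_SMRAM
vcpu_in_smm = bool(regs["SMM"])

if __name__ == "__main__":
    print("EIP in legacy SMRAM window:", eip_in_smram_window)
    print("VCPU in SMM:", vcpu_in_smm)
```

A dump where EIP falls in that window while SMM=0 is the suspicious combination described
here.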

I dropped some patches (the MTRR patches); after that the bug is triggered less frequently,
but it cannot be avoided completely :(

I think we need to check OVMF's code to see if there is a rare case where the SMM handler is
called but KVM has not received an SMI at that time...