From mboxrd@z Thu Jan  1 00:00:00 1970
From: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Subject: Re: [edk2] KVM: MTRR: fix memory type handling if MTRR is completely
 disabled
Date: Thu, 15 Oct 2015 15:10:37 +0800
Message-ID: <561F516D.7070504@linux.intel.com>
References: <55FBDB6D.4040207@gmail.com> <55FBE248.4010809@redhat.com>
 <55FC4E6F.8030104@gmail.com> <55FF7095.5060106@linux.intel.com>
 <BLU437-SMTP104744D03206E82EA10655C80460@phx.gbl>
 <55FF7C41.7070400@linux.intel.com> <560D3F31.5000703@gmail.com>
 <560D40C2.5080205@redhat.com> <560E96D8.9080007@gmail.com>
 <561DD2EC.5040800@linux.intel.com> <561E0655.8080508@gmail.com>
 <561E1121.7030502@linux.intel.com> <561E1329.5080109@linux.intel.com>
 <561E9A36.3080302@gmail.com> <561F2952.5060300@linux.intel.com>
 <561F4589.5050609@gmail.com> <561F4AAE.3060204@linux.intel.com>
 <561F4E92.3090403@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: edk2-devel@ml01.01.org,
	Alex Williamson <alex.williamson@redhat.com>
To: Janusz <januszmk6@gmail.com>, Paolo Bonzini <pbonzini@redhat.com>,
	Wanpeng Li <wanpeng.li@hotmail.com>,
	Laszlo Ersek <lersek@redhat.com>, kvm@vger.kernel.org
Return-path: <kvm-owner@vger.kernel.org>
Received: from mga01.intel.com ([192.55.52.88]:65534 "EHLO mga01.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753794AbbJOHQv (ORCPT <rfc822;kvm@vger.kernel.org>);
	Thu, 15 Oct 2015 03:16:51 -0400
In-Reply-To: <561F4E92.3090403@gmail.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>


On 10/15/2015 02:58 PM, Janusz wrote:
> On 15.10.2015 at 08:41, Xiao Guangrong wrote:
>>
>>
>> On 10/15/2015 02:19 PM, Janusz wrote:
>>> On 15.10.2015 at 06:19, Xiao Guangrong wrote:
>>>>
>>>>
>>>>
>>>> Well, the bug may not be in KVM. When this bug happened, I saw that OVMF
>>>> detected only 1 CPU; here is the log from OVMF's debug output:
>>>>
>>>>     Flushing GCD
>>>>     Flushing GCD
>>>>     Flushing GCD
>>>>     Flushing GCD
>>>>     Flushing GCD
>>>>     Flushing GCD
>>>>     Flushing GCD
>>>>     Flushing GCD
>>>>     Flushing GCD
>>>>     Flushing GCDs
>>>> Detect CPU count: 1
>>>>
>>>> So the startup code has been freed while the APs are still running;
>>>> I think that is why we saw the vCPUs executing at unexpected addresses.
>>>>
>>>> After digging into OVMF's code, I noticed that the BSP waits for the APs
>>>> for a fixed period; however, recent KVM changes require zapping all
>>>> mappings whenever CR0.CD changes, which means the APs need more time to
>>>> start up.
>>>>
>>>> After the following change to OVMF, the bug is completely gone on my side:
>>>>
>>>> --- a/UefiCpuPkg/CpuDxe/ApStartup.c
>>>> +++ b/UefiCpuPkg/CpuDxe/ApStartup.c
>>>> @@ -454,7 +454,9 @@ StartApsStackless (
>>>>      //
>>>>      // Wait 100 milliseconds for APs to arrive at the ApEntryPoint routine
>>>>      //
>>>> -  MicroSecondDelay (100 * 1000);
>>>> +  MicroSecondDelay (10 * 100 * 1000);
>>>>
>>>>      return EFI_SUCCESS;
>>>>    }
>>>>
>>>> Janusz, could you please check this instead? You can switch to your
>>>> previous kernel to do this test.
>>>>
>>>>
>>> OK, now the first time I started the VM, the system booted
>>> successfully. When I turned it off and started it again, the VM restarted
>>> a couple of times during system boot. Sometimes I also get very high CPU
>>> usage for no reason. Also, I get fewer fps in GTA 5 than with kernel 4.1: I
>>> get something like 30-55, but on 4.1 I get a steady 60 fps. This is
>>> my new log: https://bpaste.net/show/61a122ad7fe5
>>>
>>
>> Just to confirm: the QEMU internal error no longer appears, right?
> Yes, when I reverted your first patch, switched from -vga none to -vga
> std, and didn't pass through my GPU (the case where I got this internal
> error), the VM started without a problem. I didn't even get any VM restarts
> like I do with passthrough.
>

Wow, it seems we have fixed the QEMU internal error now. :)

Recently, Paolo reverted some MTRR patches; was your test
based on these reverted patches?

The GPU passthrough issue may be related to vfio (not sure). Alex, do
you have any idea?

Laszlo, could you please check whether this root cause is reasonable and
fix it in OVMF if so?
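
To illustrate the suspected root cause, here is a minimal sketch of the BSP
polling an AP check-in counter up to a deadline instead of sleeping for one
fixed period. This is not OVMF code: wait_for_aps, read_count_fn, delay_us_fn
and the fake_* helpers are all hypothetical names, and the sketch assumes the
expected AP count is known in advance, which the current code may not have.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hooks: in firmware these would read an AP check-in counter
 * and call a delay primitive; here they are plain function pointers. */
typedef uint32_t (*read_count_fn)(void *ctx);
typedef void (*delay_us_fn)(uint32_t us);

/* Poll until `expected` APs have checked in or `timeout_us` has elapsed.
 * Returns the number of APs seen when the wait ends. */
static uint32_t wait_for_aps(read_count_fn read_count, void *ctx,
                             delay_us_fn delay_us, uint32_t expected,
                             uint32_t timeout_us, uint32_t poll_us)
{
    uint32_t waited = 0;
    uint32_t seen;

    assert(poll_us > 0);            /* a zero step would never time out */

    seen = read_count(ctx);
    while (seen < expected && waited < timeout_us) {
        delay_us(poll_us);
        waited += poll_us;
        seen = read_count(ctx);
    }
    return seen;
}

/* Simulation only: all APs "arrive" once the counter has been read
 * `arrive_after` times, standing in for slow AP startup. */
struct fake_aps {
    uint32_t polls;
    uint32_t arrive_after;
    uint32_t total;
};

static uint32_t fake_read(void *ctx)
{
    struct fake_aps *f = ctx;
    return (f->polls++ >= f->arrive_after) ? f->total : 0;
}

/* Simulation only: no real time passes. */
static void fake_delay(uint32_t us)
{
    (void)us;
}
```

With a 1 ms poll step, simulated APs that need about 500 ms are missed under
a 100 ms budget but found under a 1 s budget, and fast APs end the wait early
instead of always costing the full delay.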

BTW, OVMF handles #UD with no trace: nothing is killed, and there is no call
trace in the debug output...