From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:42345)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <w.link@proxmox.com>) id 1ZxFuN-0004vo-U1
	for qemu-devel@nongnu.org; Fri, 13 Nov 2015 10:05:01 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <w.link@proxmox.com>) id 1ZxFuJ-0007mA-S9
	for qemu-devel@nongnu.org; Fri, 13 Nov 2015 10:04:59 -0500
Received: from proxmox.maurer-it.com ([94.136.31.133]:33858)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <w.link@proxmox.com>) id 1ZxFuJ-0007lZ-GL
	for qemu-devel@nongnu.org; Fri, 13 Nov 2015 10:04:55 -0500
From: Wolfgang Link <w.link@proxmox.com>
Message-ID: <5645F9FD.2050807@proxmox.com>
Date: Fri, 13 Nov 2015 15:55:57 +0100
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: [Qemu-devel] Problems with VM after kernel commit
 b18d5431acc7a2fd22767925f3a6f597aa4bd29e
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: guangrong.xiao@linux.intel.com, pbonzini@redhat.com
Cc: qemu-devel@nongnu.org

Hi,

We have problems with an HP ProLiant DL380 Gen9 and VMs with a large 
amount of memory, greater than 30 GB.

The problem is that the VM needs about 10 seconds before it starts to boot,
and when it comes to syncing the CPU clock, that task takes a long time to 
finish.
In addition, the vCPUs sit at 100% CPU usage during startup for 
over 1 minute.

I was able to pin the problem down to the following kernel patch:
> commit b18d5431acc7a2fd22767925f3a6f597aa4bd29e
> Author: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> Date:   Mon Jun 15 16:55:21 2015 +0800
>
>     KVM: x86: fix CR0.CD virtualization
>
>     Currently, CR0.CD is not checked when we virtualize memory cache type for
>     noncoherent_dma guests, this patch fixes it by :
>
>     - setting UC for all memory if CR0.CD = 1
>     - zapping all the last sptes in MMU if CR0.CD is changed
>
>     Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 06a186b..2764381 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -8628,7 +8628,8 @@ static int get_ept_level(void)
>
>  static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
>  {
> -       u64 ret;
> +       u8 cache;
> +       u64 ipat = 0;
>
>         /* For VT-d and EPT combination
>          * 1. MMIO: always map as UC
> @@ -8641,16 +8642,27 @@ static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
>          * 3. EPT without VT-d: always map as WB and set IPAT=1 to keep
>          *    consistent with host MTRR
>          */
> -       if (is_mmio)
> -               ret = MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;
> -       else if (kvm_arch_has_noncoherent_dma(vcpu->kvm))
> -               ret = kvm_get_guest_memory_type(vcpu, gfn) <<
> -                     VMX_EPT_MT_EPTE_SHIFT;
> -       else
> -               ret = (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT)
> -                       | VMX_EPT_IPAT_BIT;
> +       if (is_mmio) {
> +               cache = MTRR_TYPE_UNCACHABLE;
> +               goto exit;
> +       }
>
> -       return ret;
> +       if (!kvm_arch_has_noncoherent_dma(vcpu->kvm)) {
> +               ipat = VMX_EPT_IPAT_BIT;
> +               cache = MTRR_TYPE_WRBACK;
> +               goto exit;
> +       }
> +
> +       if (kvm_read_cr0(vcpu) & X86_CR0_CD) {
> +               ipat = VMX_EPT_IPAT_BIT;
> +               cache = MTRR_TYPE_UNCACHABLE;
> +               goto exit;
> +       }
> +
> +       cache = kvm_get_guest_memory_type(vcpu, gfn);
> +
> +exit:
> +       return (cache << VMX_EPT_MT_EPTE_SHIFT) | ipat;
>  }
>
>  static int vmx_get_lpage_level(void)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 43f0df7..43fdb10 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -621,6 +621,10 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
>
>         if ((cr0 ^ old_cr0) & update_bits)
>                 kvm_mmu_reset_context(vcpu);
> +
> +       if ((cr0 ^ old_cr0) & X86_CR0_CD)
> +               kvm_zap_gfn_range(vcpu->kvm, 0, ~0ULL);
> +
>         return 0;
>  }
>  EXPORT_SYMBOL_GPL(kvm_set_cr0);
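
As far as I can read the vmx.c hunk, the memory type selection reduces to
the standalone userspace model below. This is only a sketch of my reading,
not kernel code: the names model_mt_mask, has_noncoherent_dma and
guest_type are made up for illustration, and the constant values are what
I assume the kernel headers define.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, meant to mirror the kernel's definitions. */
#define MODEL_MTRR_TYPE_UNCACHABLE 0u
#define MODEL_MTRR_TYPE_WRBACK     6u
#define MODEL_EPT_MT_SHIFT         3
#define MODEL_EPT_IPAT_BIT         (1ull << 6)
#define MODEL_CR0_CD               (1ul << 30)

/*
 * Hypothetical model of the patched vmx_get_mt_mask().
 * guest_type stands in for the result of kvm_get_guest_memory_type().
 */
static uint64_t model_mt_mask(bool is_mmio, bool has_noncoherent_dma,
                              unsigned long cr0, uint8_t guest_type)
{
    uint8_t cache;
    uint64_t ipat = 0;

    if (is_mmio) {
        /* MMIO is always mapped uncached. */
        cache = MODEL_MTRR_TYPE_UNCACHABLE;
    } else if (!has_noncoherent_dma) {
        /* No noncoherent DMA: write-back, guest PAT ignored. */
        ipat = MODEL_EPT_IPAT_BIT;
        cache = MODEL_MTRR_TYPE_WRBACK;
    } else if (cr0 & MODEL_CR0_CD) {
        /* New in this commit: CR0.CD=1 forces uncached for all memory. */
        ipat = MODEL_EPT_IPAT_BIT;
        cache = MODEL_MTRR_TYPE_UNCACHABLE;
    } else {
        /* Otherwise honor the guest's own memory type. */
        cache = guest_type;
    }

    return ((uint64_t)cache << MODEL_EPT_MT_SHIFT) | ipat;
}

/* Checks one input per branch of the model above. */
static void model_self_check(void)
{
    assert(model_mt_mask(true, true, MODEL_CR0_CD, 6) == 0x00);
    assert(model_mt_mask(false, false, 0, 6) == 0x70);
    assert(model_mt_mask(false, true, MODEL_CR0_CD, 6) == 0x40);
    assert(model_mt_mask(false, true, 0, 6) == 0x30);
}
```

The x86.c hunk is not modeled here. It calls kvm_zap_gfn_range() over the
whole guest address range whenever CR0.CD changes.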

The problem is that I do not understand exactly what the patch is doing.
How is this related to NUMA architecture and a large amount of memory in VMs?
I ask because when I test the same setup on a single-socket machine, there are no 
problems.

Thanks for any advice.