From: Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: VM boot failure on nodes not having DMA32 zone
Date: Tue, 24 Jul 2018 20:05:49 +0200
Message-ID: <036fd396-9b5f-a565-7aed-eec6ac5d5133@redhat.com>
To: Liang C, rkrcmar@redhat.com
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org

On 24/07/2018 09:53, Liang C wrote:
> Hi,
>
> We have a situation where our qemu processes need to be launched under
> cgroup cpuset.mems control. This introduces an issue similar to one that
> was discussed a few years ago. The difference is that in our case, not
> being able to allocate from the DMA32 zone is the result of a cgroup
> restriction, not mempolicy enforcement. Here are the steps to reproduce
> the failure:
>
> mkdir /sys/fs/cgroup/cpuset/nodeX   (where X is a node not having a DMA32 zone)
> echo X > /sys/fs/cgroup/cpuset/nodeX/cpuset.mems
> echo X > /sys/fs/cgroup/cpuset/nodeX/cpuset.cpus
> echo 1 > /sys/fs/cgroup/cpuset/nodeX/cpuset.mem_hardwall
> echo $$ > /sys/fs/cgroup/cpuset/nodeX/tasks
>
> # launch a virtual machine
> kvm_init_vcpu failed: Cannot allocate memory
>
> There are workarounds, such as always putting qemu processes onto a
> node that has a DMA32 zone, or not restricting a qemu process's memory
> allocation until the DMA32 allocation has finished (difficult to time
> precisely). But we would like to find a way to address the root cause.
>
> Considering that the pae_root shadow page should not be needed when
> EPT is in use, which is indeed our case (EPT is always available for
> us, and we guess the same holds for most other users), we made a patch
> roughly like this:
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index d594690..1d1b61e 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -5052,7 +5052,7 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
>  	vcpu->arch.mmu.translate_gpa = translate_gpa;
>  	vcpu->arch.nested_mmu.translate_gpa = translate_nested_gpa;
>
> -	return alloc_mmu_pages(vcpu);
> +	return tdp_enabled ? 0 : alloc_mmu_pages(vcpu);
>  }
>
>  void kvm_mmu_setup(struct kvm_vcpu *vcpu)
>
> It passes our test cases, but we would really like to have your insight
> on this patch before applying it in a production environment and
> contributing it back to the community. Thanks in advance for any help
> you may provide!

Yes, this looks good. However, I'd place the "if" in alloc_mmu_pages
itself.

Thanks,

Paolo
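
[Editor's note: a rough sketch of the placement Paolo suggests, i.e. an
early return inside alloc_mmu_pages() itself rather than at its call
site. The context lines below are approximate, not quoted from any
particular tree, and this fragment is untested:]

```diff
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ static int alloc_mmu_pages(struct kvm_vcpu *vcpu)
 static int alloc_mmu_pages(struct kvm_vcpu *vcpu)
 {
 	struct page *page;
 
+	/*
+	 * With TDP (EPT/NPT) the PAE root tables are never used, so the
+	 * below-4GB (__GFP_DMA32) allocation that fails under the cpuset
+	 * restriction described above can be skipped entirely.
+	 */
+	if (tdp_enabled)
+		return 0;
+
 	/* ... existing DMA32 allocation of the PAE root ... */
```

Keeping the check inside the helper means every caller of
alloc_mmu_pages() gets the same behavior, instead of each call site
having to know about tdp_enabled.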