From mboxrd@z Thu Jan 1 00:00:00 1970
From: Anthony Liguori
Subject: Re: [PATCH 0/3] v2: KVM-userspace: add NUMA support for guests
Date: Fri, 05 Dec 2008 09:34:20 -0600
Message-ID: <493949FC.4060700@codemonkey.ws>
References: <49392CB6.9000000@amd.com> <49393A78.5030601@codemonkey.ws> <49394862.4090306@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Andre Przywara, kvm@vger.kernel.org, "Daniel P. Berrange"
To: Avi Kivity
Return-path:
Received: from yx-out-2324.google.com ([74.125.44.28]:19690 "EHLO yx-out-2324.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752963AbYLEPec (ORCPT ); Fri, 5 Dec 2008 10:34:32 -0500
Received: by yx-out-2324.google.com with SMTP id 8so27663yxm.1 for ; Fri, 05 Dec 2008 07:34:30 -0800 (PST)
In-Reply-To: <49394862.4090306@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

Avi Kivity wrote:
> Anthony Liguori wrote:
>>
>> In the event that the VM is larger than a single node, if a user is
>> creating it via qemu-system-x86_64, they're going to either not care
>> at all about NUMA, or be familiar enough with the numactl tools that
>> they'll probably just want to use that. Once you've got your head
>> around the fact that VCPUs are just threads and the memory is just a
>> shared memory segment, any knowledgable sysadmin will have no problem
>> doing whatever sort of NUMA layout they want.
>>
>
> The vast majority of production VMs will be created by management tools.

I agree.

> We need libnuma integrated in qemu. Using numactl outside of qemu
> means we need to start exposing more and more qemu internals
> (vcpu->thread mapping, memory in /dev/shm, phys_addr->ram_addr
> mapping) and lose out on optimization opportunities (having multiple
> numa-aware iothreads, numa-aware kvm mmu). It also means we cause
> duplication of the numa logic in management tools instead of
> consolidation in qemu.

I think it's the opposite.
Integrating libnuma in QEMU means duplication of numactl functionality in QEMU. What you'd really want, I think, is to be able to use numactl but say -qemu-guest-memory-offset 1G -qemu-guest-memory-size 1G. The /dev/shm approach approximates that pretty well.

Also, the current patches don't do the most useful thing: they don't provide an interface for dynamically changing NUMA attributes.

But, as I said, if there's agreement that we should bake this into QEMU, then so be it. But let's make this a separate conversation from the rest of the patches.

Regards,

Anthony Liguori
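
A minimal sketch of the numactl-on-/dev/shm approach discussed above, assuming guest RAM is backed by a tmpfs file. The path /dev/shm/A and the helper name numactl_region_argv are hypothetical; the flags are numactl's shared-memory policy options (--offset, --length, --membind, --file, --touch). The helper only builds the command line and does not run it.

```python
from typing import List


def numactl_region_argv(tmpfs_file: str, offset: str, length: str, node: int) -> List[str]:
    """Build a numactl command line that binds one region of a tmpfs file to a node.

    tmpfs_file is a hypothetical path to the file backing guest RAM;
    offset and length use numactl's size syntax (for example "1G").
    """
    if node < 0:
        raise ValueError("node must be non-negative")
    return [
        "numactl",
        f"--offset={offset}",   # start of the region inside the file
        f"--length={length}",   # size of the region
        f"--membind={node}",    # allocate only from this NUMA node
        "--file", tmpfs_file,   # tmpfs file the policy applies to
        "--touch",              # fault the pages in so the policy takes effect now
    ]


if __name__ == "__main__":
    # Second gigabyte of the (hypothetical) backing file goes to node 1.
    print(" ".join(numactl_region_argv("/dev/shm/A", "1G", "1G", 1)))
```

A management tool could issue one such command per guest node, which is roughly what the hypothetical -qemu-guest-memory-offset and -qemu-guest-memory-size options would express.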