From: Avi Kivity
Subject: Re: [PATCH 0/3] v2: KVM-userspace: add NUMA support for guests
Date: Fri, 05 Dec 2008 17:27:30 +0200
To: Anthony Liguori
Cc: Andre Przywara, kvm@vger.kernel.org, "Daniel P. Berrange"
Message-ID: <49394862.4090306@redhat.com>
In-Reply-To: <49393A78.5030601@codemonkey.ws>
References: <49392CB6.9000000@amd.com> <49393A78.5030601@codemonkey.ws>

Anthony Liguori wrote:
> In the event that the VM is larger than a single node, if a user is
> creating it via qemu-system-x86_64, they're going to either not care
> at all about NUMA, or be familiar enough with the numactl tools that
> they'll probably just want to use those. Once you've got your head
> around the fact that VCPUs are just threads and the memory is just a
> shared memory segment, any knowledgeable sysadmin will have no problem
> doing whatever sort of NUMA layout they want.

The vast majority of production VMs will be created by management tools.

> The other case is where management tools are creating VMs. In this
> case, it's probably better to use numactl as an external tool, because
> then it keeps things consistent wrt CPU pinning.
>
> There's also a good argument for not introducing CPU pinning directly
> into QEMU. There are multiple ways to do CPU pinning effectively: you
> can use taskset, cpusets, or even something like libcgroup.
>
> If you refactor the series so that the libnuma patch is the very last
> one and submit it to qemu-devel, I'll review and apply the first
> patches. We can continue to discuss the last patch independently of
> the first three if needed.

We need libnuma integrated in qemu. Using numactl outside of qemu means
we need to start exposing more and more qemu internals (the vcpu->thread
mapping, guest memory in /dev/shm, the phys_addr->ram_addr mapping), and
we lose out on optimization opportunities (multiple numa-aware iothreads,
a numa-aware kvm mmu). It also means the numa logic gets duplicated
across management tools instead of consolidated in qemu.
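To make that concrete, here is a minimal sketch of what in-qemu libnuma
integration could look like. This is not the patch under discussion: the
structure and all the names (guest_node_desc, bind_guest_node,
vcpu_thread_fn) are invented for illustration, and a real implementation
would hook into qemu's ram allocation and vcpu creation paths rather
than main(). It binds one slice of guest RAM to a host node and keeps
the matching vcpu thread on that node, both through libnuma (build with
-lnuma -lpthread):

#define _GNU_SOURCE
#include <numa.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>

struct guest_node_desc {
    void   *ram;        /* start of this guest node's RAM slice */
    size_t  size;       /* length of the slice in bytes */
    int     host_node;  /* host NUMA node backing it */
};

/* Put one guest node's RAM on the chosen host node. */
static void bind_guest_node(struct guest_node_desc *d)
{
    numa_tonode_memory(d->ram, d->size, d->host_node);
}

/* Per-vcpu thread body: keep the thread on the same host node. */
static void *vcpu_thread_fn(void *opaque)
{
    struct guest_node_desc *d = opaque;

    numa_run_on_node(d->host_node);
    /* ... the usual kvm vcpu run loop would go here ... */
    return NULL;
}

int main(void)
{
    struct guest_node_desc d;
    pthread_t vcpu;

    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this host\n");
        return 1;
    }

    d.size = 256UL << 20;    /* one 256 MB guest node */
    d.host_node = 0;
    d.ram = mmap(NULL, d.size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (d.ram == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    bind_guest_node(&d);
    pthread_create(&vcpu, NULL, vcpu_thread_fn, &d);
    pthread_join(vcpu, NULL);
    return 0;
}

Note that both calls consume information qemu already has internally
(which vcpu belongs to which guest node, where that node's RAM lives);
doing the same from outside means exporting exactly that information.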
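For contrast, here is roughly what external pinning, which is what
taskset and numactl end up doing, looks like at the system call level.
Again just a sketch: the thread id is made up, because obtaining the
real one is precisely the vcpu->thread mapping qemu would have to start
exposing:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>

int main(void)
{
    cpu_set_t set;
    pid_t vcpu_tid = 12345;  /* hypothetical host thread id of vcpu 0 */

    CPU_ZERO(&set);
    CPU_SET(2, &set);        /* pin to host cpu 2, like taskset -p -c 2 */
    if (sched_setaffinity(vcpu_tid, sizeof(set), &set) < 0) {
        perror("sched_setaffinity");
        return 1;
    }
    return 0;
}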
-- 
I have a truly marvellous patch that fixes the bug which this signature
is too narrow to contain.