From: Avi Kivity
Subject: Re: [PATCH 0/3] v2: KVM-userspace: add NUMA support for guests
Date: Fri, 05 Dec 2008 17:27:30 +0200
To: Anthony Liguori
Cc: Andre Przywara, kvm@vger.kernel.org, "Daniel P. Berrange"
Message-ID: <49394862.4090306@redhat.com>
In-Reply-To: <49393A78.5030601@codemonkey.ws>
References: <49392CB6.9000000@amd.com> <49393A78.5030601@codemonkey.ws>

Anthony Liguori wrote:
> In the event that the VM is larger than a single node, if a user is
> creating it via qemu-system-x86_64, they're going to either not care
> at all about NUMA, or be familiar enough with the numactl tools that
> they'll probably just want to use those. Once you've got your head
> around the fact that VCPUs are just threads and the memory is just a
> shared memory segment, any knowledgeable sysadmin will have no problem
> doing whatever sort of NUMA layout they want.

The vast majority of production VMs will be created by management tools.

> The other case is where management tools are creating VMs. In this
> case, it's probably better to use numactl as an external tool, because
> then it keeps things consistent wrt CPU pinning.
>
> There's also a good argument for not introducing CPU pinning directly
> into QEMU. There are multiple ways to do CPU pinning effectively: you
> can use taskset, cpusets, or even something like libcgroup.
>
> If you refactor the series so that the libnuma patch is the very last
> one and submit it to qemu-devel, I'll review and apply the first
> patches. We can continue to discuss the last patch independently of
> the first three if needed.

We need libnuma integrated in qemu. Using numactl outside of qemu means
we need to start exposing more and more qemu internals (the vcpu->thread
mapping, guest memory in /dev/shm, the phys_addr->ram_addr mapping), and
we lose out on optimization opportunities (multiple numa-aware iothreads,
a numa-aware kvm mmu). It also means the numa logic gets duplicated
across management tools instead of consolidated in qemu.
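To make that concrete, here is a minimal sketch of what in-qemu libnuma
integration could look like. This is not the patch under discussion: the
structure and all the names (guest_node_desc, bind_guest_node,
vcpu_thread_fn) are invented for illustration, and a real implementation
would hook into qemu's ram allocation and vcpu creation paths rather
than main(). It binds one slice of guest RAM to a host node and keeps
the matching vcpu thread on that node, both through libnuma (build with
-lnuma -lpthread):

#define _GNU_SOURCE
#include <numa.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>

struct guest_node_desc {
    void   *ram;        /* start of this guest node's RAM slice */
    size_t  size;       /* length of the slice in bytes */
    int     host_node;  /* host NUMA node backing it */
};

/* Put one guest node's RAM on the chosen host node. */
static void bind_guest_node(struct guest_node_desc *d)
{
    numa_tonode_memory(d->ram, d->size, d->host_node);
}

/* Per-vcpu thread body: keep the thread on the same host node. */
static void *vcpu_thread_fn(void *opaque)
{
    struct guest_node_desc *d = opaque;

    numa_run_on_node(d->host_node);
    /* ... the usual kvm vcpu run loop would go here ... */
    return NULL;
}

int main(void)
{
    struct guest_node_desc d;
    pthread_t vcpu;

    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this host\n");
        return 1;
    }

    d.size = 256UL << 20;    /* one 256 MB guest node */
    d.host_node = 0;
    d.ram = mmap(NULL, d.size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (d.ram == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    bind_guest_node(&d);
    pthread_create(&vcpu, NULL, vcpu_thread_fn, &d);
    pthread_join(vcpu, NULL);
    return 0;
}

Note that both calls consume information qemu already has internally
(which vcpu belongs to which guest node, where that node's RAM lives);
doing the same from outside means exporting exactly that information.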
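For contrast, here is roughly what external pinning, which is what
taskset and numactl end up doing, looks like at the system call level.
Again just a sketch: the thread id is made up, because obtaining the
real one is precisely the vcpu->thread mapping qemu would have to start
exposing:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>

int main(void)
{
    cpu_set_t set;
    pid_t vcpu_tid = 12345;  /* hypothetical host thread id of vcpu 0 */

    CPU_ZERO(&set);
    CPU_SET(2, &set);        /* pin to host cpu 2, like taskset -p -c 2 */
    if (sched_setaffinity(vcpu_tid, sizeof(set), &set) < 0) {
        perror("sched_setaffinity");
        return 1;
    }
    return 0;
}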
-- 
I have a truly marvellous patch that fixes the bug which this signature
is too narrow to contain.