From: Andre Przywara
Subject: Re: [PATCH 0/3][RFC] NUMA: add host side pinning
Date: Thu, 24 Jun 2010 08:44:00 +0200
Message-ID: <4C22FEB0.2020002@amd.com>
References: <1277327377-29629-1-git-send-email-andre.przywara@amd.com> <4C2288DD.3020207@codemonkey.ws>
In-Reply-To: <4C2288DD.3020207@codemonkey.ws>
To: Anthony Liguori
Cc: kvm@vger.kernel.org, agraf@suse.de

Anthony Liguori wrote:
> On 06/23/2010 04:09 PM, Andre Przywara wrote:
>> Hi,
>>
>> these three patches add basic NUMA pinning to KVM. According to a
>> user-provided assignment, parts of the guest's memory will be bound
>> to different host nodes. This should increase performance in large
>> virtual machines and on loaded hosts.
>> These patches are quite basic (but work), and I send them as an RFC
>> to get some feedback before implementing things in vain.
>>
>> ....
>>
>> Please comment on the approach in general and the implementation.
>
> If we integrated -mem-path with -numa such that a different path
> could be used with each NUMA node (and we let an explicit file be
> specified instead of just a directory), then if I understand
> correctly, we could use numactl without any specific integration in
> qemu. Does this sound correct?

In general, yes. But I consider the whole hugetlbfs approach broken.
Since 2.6.32 or so you can pass MAP_HUGETLB together with MAP_ANONYMOUS
to mmap() to avoid hugetlbfs altogether, and I bet the future will
bring transparent huge pages anyway (RHEL6 already has them).
I am not sure whether you want to keep the -memfile option and extend
it with some pseudo-compat glue (faked directory names to be
interpreted by QEMU) to keep it working in the future. But in any of
these cases the external numactl approach would not work anymore.

> IOW:
>
> qemu -numa node,mem=1G,nodeid=0,cpus=0-1,memfile=/dev/shm/node0.mem
> -numa node,mem=2G,nodeid=1,cpus=1-2,memfile=/dev/shm/node1.mem
>
> It's then possible to say:
>
> numactl --file /dev/shm/node0.mem --interleave=0,1
> numactl --file /dev/shm/node1.mem --membind=2
>
> I think this approach is nicer because it gives the user a lot more
> flexibility without having us chase other tools like numactl. For
> instance, your patches only support pinning and not interleaving.

That's right. I have put it on the list ;-)
Thanks for the good hint on the huge-pages issue, as this is not
properly handled in the current implementation. I will think about a
proper way to handle it, but would still opt for an (at least
partially) QEMU-integrated solution. Still open for discussion,
though, as I see your point about avoiding a duplicate NUMA
implementation between numactl and QEMU.

Regards,
Andre.

--
Andre Przywara
AMD Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 488-3567-12