From: Andre Przywara
Subject: Re: [PATCH 0/3][RFC] NUMA: add host side pinning
Date: Thu, 24 Jun 2010 08:44:00 +0200
Message-ID: <4C22FEB0.2020002@amd.com>
References: <1277327377-29629-1-git-send-email-andre.przywara@amd.com> <4C2288DD.3020207@codemonkey.ws>
In-Reply-To: <4C2288DD.3020207@codemonkey.ws>
To: Anthony Liguori
Cc: kvm@vger.kernel.org, agraf@suse.de

Anthony Liguori wrote:
> On 06/23/2010 04:09 PM, Andre Przywara wrote:
>> Hi,
>>
>> these three patches add basic NUMA pinning to KVM. According to a
>> user-provided assignment, parts of the guest's memory will be bound
>> to different host nodes. This should increase performance in large
>> virtual machines and on loaded hosts.
>> These patches are quite basic (but work), and I send them as an RFC
>> to get some feedback before implementing things in vain.
>>
>> ....
>>
>> Please comment on the approach in general and the implementation.
>
> If we integrated -mem-path with -numa such that a different path
> could be used with each NUMA node (and we let an explicit file be
> specified instead of just a directory), then if I understand
> correctly, we could use numactl without any specific integration in
> qemu. Does this sound correct?

In general, yes. But I consider the whole hugetlbfs approach broken.
Since 2.6.32 or so you can pass MAP_HUGETLB together with MAP_ANONYMOUS
to mmap() to avoid hugetlbfs altogether, and I bet the future will
bring transparent huge pages anyway (RHEL6 already has them).
I am not sure whether you want to keep the -memfile option and extend
it with some pseudo-compat glue (faked directory names to be
interpreted by QEMU) to keep it working in the future. But in any of
these cases the external numactl approach would not work anymore.

> IOW:
>
> qemu -numa node,mem=1G,nodeid=0,cpus=0-1,memfile=/dev/shm/node0.mem
> -numa node,mem=2G,nodeid=1,cpus=1-2,memfile=/dev/shm/node1.mem
>
> It's then possible to say:
>
> numactl --file /dev/shm/node0.mem --interleave=0,1
> numactl --file /dev/shm/node1.mem --membind=2
>
> I think this approach is nicer because it gives the user a lot more
> flexibility without having us chase other tools like numactl. For
> instance, your patches only support pinning and not interleaving.

That's right. I have put it on the list ;-)
Thanks for the good hint on the huge-pages issue, as this is not
properly handled in the current implementation. I will think about a
proper way to handle it, but would still opt for an (at least
partially) QEMU-integrated solution. Still open for discussion,
though, as I see your point about avoiding a duplicate NUMA
implementation between numactl and QEMU.

Regards,
Andre.

--
Andre Przywara
AMD Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 488-3567-12