From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andre Przywara
Subject: Re: [PATCH 0/3][RFC] NUMA: add host side pinning
Date: Thu, 24 Jun 2010 13:34:41 +0200
Message-ID: <4C2342D1.4090103@amd.com>
References: <1277327377-29629-1-git-send-email-andre.przywara@amd.com> <4C2288DD.3020207@codemonkey.ws> <865764AB-4E51-4ED4-8832-AED6A237A9D3@suse.de> <4C233A6D.7030805@amd.com> <4C233DAB.60106@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Alexander Graf , Anthony Liguori , "kvm@vger.kernel.org"
To: Avi Kivity
Return-path:
Received: from tx2ehsobe003.messaging.microsoft.com ([65.55.88.13]:57201 "EHLO TX2EHSOBE006.bigfish.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755067Ab0FXLiR (ORCPT ); Thu, 24 Jun 2010 07:38:17 -0400
In-Reply-To: <4C233DAB.60106@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

Avi Kivity wrote:
> On 06/24/2010 01:58 PM, Andre Przywara wrote:
>>> So who would create the /dev/shm/nodeXX files?
>> Currently it is QEMU. It creates a somewhat unique filename, opens and
>> unlinks it. The difference would be to name the file after the option
>> and to not unlink it.
>>
>>> I can imagine starting numactl before qemu, even though that's
>>> cumbersome. I don't think it's feasible to start numactl after
>>> qemu is running. That'd involve way too much magic that I'd prefer
>>> qemu to call numactl itself.
>> Using the current code the files would not exist before QEMU allocated
>> RAM, and after that it could already touch pages before numactl set
>> the policy.
>
> Non-anonymous memory doesn't work well with ksm and transparent
> hugepages. Is it possible to use anonymous memory rather than file backed?

I'd prefer non-file backed, too. But that is how the current huge pages
implementation is done. We could use MAP_HUGETLB and declare NUMA _and_
huge pages as 2.6.32+ only.
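
A minimal sketch of the anonymous, MAP_HUGETLB-backed allocation suggested here, with a fallback to ordinary pages when the kernel rejects the request. It assumes a Linux host and a size that is a multiple of the huge page size (2 MiB by default on x86-64); `alloc_guest_ram` is a hypothetical name, not QEMU's actual allocator. The fallback cannot catch a kernel that silently ignores the flag, so `hugetlb_accepted` only reports that the huge-page request was not refused.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not QEMU code): allocate guest RAM as anonymous
 * memory, preferring huge pages via MAP_HUGETLB (Linux 2.6.32+).
 * Assumes size is a multiple of the huge page size. Returns NULL on failure.
 */
void *alloc_guest_ram(size_t size, int *hugetlb_accepted)
{
    assert(hugetlb_accepted != NULL);

    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED) {
        /* A kernel without MAP_HUGETLB support may ignore the flag,
         * so success here does not prove huge pages were used. */
        *hugetlb_accepted = 1;
        return p;
    }

    /* Typically ENOMEM when no huge pages are reserved: fall back to
     * ordinary anonymous pages. */
    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    *hugetlb_accepted = 0;
    return p == MAP_FAILED ? NULL : p;
}
```
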
Unfortunately I didn't find an easy way to detect the presence of the
MAP_HUGETLB flag. If the kernel does not support it, it seems that mmap
silently ignores it and uses 4KB pages instead.

>> To avoid this I'd like to see the pinning done from within QEMU. I am
>> not sure whether calling numactl via system() and friends is OK, I'd
>> prefer to run the syscalls directly (like in patch 3/3) and pull the
>> necessary options into the -numa pin,... command line. We could mimic
>> numactl's syntax here.
>
> Definitely not use system(), but IIRC numactl has a library interface?

Right, that is what I include in patch 3/3 and use.
I got the impression Anthony wanted to avoid reimplementing parts of
numactl, especially enabling the full flexibility of the command line
interface (like specifying nodes, policies and interleaving).
I want QEMU to use the library and pull the necessary options into the
-numa pin,... parsing, even if this means duplicating numactl
functionality.

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12
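
The pinning from within QEMU discussed in this thread, with the syscalls called directly, can be sketched as follows: a parser for a numactl-like node list and a thin wrapper around the raw mbind(2) system call. The actual `-numa pin,...` syntax is not specified in the thread, so the list format is an assumption; `parse_node_list`, `bind_range` and `MAX_NODES` are hypothetical names, not code from patch 3/3. The sketch assumes Linux with the kernel UAPI headers installed and deliberately avoids libnuma so it builds without numactl's development package; the library route the message settles on would replace `bind_range` with libnuma calls.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/mempolicy.h>   /* MPOL_BIND */

/* Hypothetical limit: one word of node bits is enough for this sketch. */
#define MAX_NODES ((long)(CHAR_BIT * sizeof(unsigned long)))

/*
 * Hypothetical parser for a numactl-style node list such as "0,2-3".
 * Sets one bit per node in *mask. Returns 0 on success, -1 on malformed
 * or out-of-range input.
 */
int parse_node_list(const char *s, unsigned long *mask)
{
    assert(s != NULL && mask != NULL);
    *mask = 0;
    if (*s == '\0')
        return -1;                          /* empty list */

    for (;;) {
        char *end;
        long lo = strtol(s, &end, 10);
        if (end == s || lo < 0 || lo >= MAX_NODES)
            return -1;

        long hi = lo;
        if (*end == '-') {                  /* range "lo-hi" */
            const char *p = end + 1;
            hi = strtol(p, &end, 10);
            if (end == p || hi < lo || hi >= MAX_NODES)
                return -1;
        }
        for (long n = lo; n <= hi; n++)
            *mask |= 1UL << n;

        if (*end == '\0')
            return 0;                       /* whole list consumed */
        if (*end != ',' || end[1] == '\0')
            return -1;                      /* junk or trailing comma */
        s = end + 1;
    }
}

/*
 * Bind [addr, addr + len) to the nodes in mask with MPOL_BIND by calling
 * mbind(2) directly. addr must be page aligned. Returns 0, or -1 with
 * errno set.
 */
int bind_range(void *addr, unsigned long len, unsigned long mask)
{
    /* The kernel decrements maxnode internally, so pass bits + 1 to make
     * every bit of the one-word mask count. */
    unsigned long maxnode = (unsigned long)MAX_NODES + 1;
    return (int)syscall(SYS_mbind, addr, len, (unsigned long)MPOL_BIND,
                        &mask, maxnode, 0UL);
}
```

A caller would parse the option string once and call `bind_range()` on the guest RAM region before any page is touched, which avoids the race described in the quoted text where QEMU faults in pages before an external numactl could set the policy.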