From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alexander Graf
Subject: Re: [PATCH 0/3][RFC] NUMA: add host side pinning
Date: Mon, 28 Jun 2010 18:26:01 +0200
Message-ID: <4C28CD19.9000001@suse.de>
References: <1277327377-29629-1-git-send-email-andre.przywara@amd.com>
 <4C2288DD.3020207@codemonkey.ws>
 <865764AB-4E51-4ED4-8832-AED6A237A9D3@suse.de>
 <4C233A6D.7030805@amd.com>
 <4C233DAB.60106@redhat.com>
 <4C2342D1.4090103@amd.com>
 <4C234493.2050408@redhat.com>
 <4C28CBC5.80109@codemonkey.ws>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: Avi Kivity, Andre Przywara, "kvm@vger.kernel.org"
To: Anthony Liguori
Return-path:
Received: from cantor2.suse.de ([195.135.220.15]:55595 "EHLO mx2.suse.de"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1751029Ab0F1Q0D (ORCPT ); Mon, 28 Jun 2010 12:26:03 -0400
In-Reply-To: <4C28CBC5.80109@codemonkey.ws>
Sender: kvm-owner@vger.kernel.org
List-ID:

Anthony Liguori wrote:
> On 06/24/2010 06:42 AM, Avi Kivity wrote:
>> On 06/24/2010 02:34 PM, Andre Przywara wrote:
>>>> Non-anonymous memory doesn't work well with ksm and transparent
>>>> hugepages. Is it possible to use anonymous memory rather than file
>>>> backed?
>>>
>>> I'd prefer non-file backed, too. But that is how the current huge
>>> pages implementation is done. We could use MAP_HUGETLB and declare
>>> NUMA _and_ huge pages as 2.6.32+ only. Unfortunately I didn't find
>>> an easy way to detect the presence of the MAP_HUGETLB flag. If the
>>> kernel does not support it, it seems that mmap silently ignores it
>>> and uses 4KB pages instead.
>>
>> That sucks, unfortunately it is normal practice. However it is a
>> soft failure, everything works just a bit slower. So it's probably
>> acceptable.
>>
>>>>> To avoid this I'd like to see the pinning done from within QEMU.
>>>>> I am not sure whether calling numactl via system() and friends is
>>>>> OK, I'd prefer to run the syscalls directly (like in patch 3/3)
>>>>> and pull the necessary options into the -numa pin,... command
>>>>> line. We could mimic numactl's syntax here.
>>>>
>>>> Definitely not use system(), but IIRC numactl has a library interface?
>>> Right, that is what I include in patch 3/3 and use. I got the
>>> impression Anthony wanted to avoid reimplementing parts of numactl,
>>> especially enabling the full flexibility of the command line
>>> interface (like specifying nodes, policies and interleaving).
>>> I want QEMU to use the library and pull the necessary options into
>>> the -numa pin,... parsing, even if this means duplicating numactl
>>> functionality.
>>>
>>
>> I agree with that. It's a lot easier to use a single tool than to
>> try to integrate things yourself, the unix tradition of grep | sort |
>> uniq -c | sort -n notwithstanding. Especially when one of the tools
>> is qemu.
>
> I couldn't disagree more here. This is why we don't support CPU pinning
> and instead provide PID information for each VCPU thread.
>
> The folks that want to use pinning are not novice users. They are not
> going to be happy unless you can make full use of existing tools.
> That means replicating all of numactl's functionality (which is not
> what the current patches do) or enable numactl to be used with a guest.

So how about some QMP plumbing that would allow numactl to create the
VMs at defined ranges? So you'd basically get

numactl --run-qemu -- qemu-kvm -blah -foo

Alex
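
For comparison, a minimal sketch of what the existing tools already allow
without any new plumbing. It is an illustration under stated assumptions,
not part of the patches: numactl_argv and pin_threads are hypothetical
helper names, the QEMU arguments are placeholders, and the VCPU thread IDs
are assumed to have been obtained from QEMU out of band. Only numactl's
existing --cpunodebind and --membind options are used, and they bind the
whole QEMU process to one host node rather than placing individual guest
nodes, which is the gap the proposal above is about.

```python
import os


def numactl_argv(node, qemu_argv):
    """Wrap a QEMU command line so the whole process is bound to one host node.

    Uses numactl's existing --cpunodebind/--membind options; this is
    coarse-grained and cannot place individual guest NUMA nodes.
    """
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}", "--"] + list(qemu_argv)


def pin_threads(thread_ids, cpus):
    """Pin each thread ID to the given set of host CPUs (Linux only).

    thread_ids is assumed to hold the per-VCPU thread IDs that QEMU
    reports; how they are collected is outside this sketch.
    """
    for tid in thread_ids:
        os.sched_setaffinity(tid, cpus)


if __name__ == "__main__":
    # Placeholder QEMU arguments, for illustration only.
    print(" ".join(numactl_argv(0, ["qemu-kvm", "-m", "1024"])))
```

The first helper covers the "run everything on one node" case, the second
the "pin VCPU threads after startup" case. Neither can place guest memory
ranges on different host nodes.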