From: Anthony Liguori
Subject: Re: [PATCH 0/3][RFC] NUMA: add host side pinning
Date: Mon, 28 Jun 2010 11:20:21 -0500
Message-ID: <4C28CBC5.80109@codemonkey.ws>
In-Reply-To: <4C234493.2050408@redhat.com>
To: Avi Kivity
Cc: Andre Przywara, Alexander Graf, "kvm@vger.kernel.org"

On 06/24/2010 06:42 AM, Avi Kivity wrote:
> On 06/24/2010 02:34 PM, Andre Przywara wrote:
>>> Non-anonymous memory doesn't work well with ksm and transparent
>>> hugepages. Is it possible to use anonymous memory rather than file
>>> backed?
>>
>> I'd prefer non-file backed, too. But that is how the current huge
>> pages implementation is done. We could use MAP_HUGETLB and declare
>> NUMA _and_ huge pages as 2.6.32+ only. Unfortunately I didn't find an
>> easy way to detect the presence of the MAP_HUGETLB flag. If the
>> kernel does not support it, it seems that mmap silently ignores it
>> and uses 4KB pages instead.
>
> That sucks, unfortunately it is normal practice. However it is a soft
> failure, everything works just a bit slower. So it's probably
> acceptable.
>
>>>> To avoid this I'd like to see the pinning done from within QEMU. I
>>>> am not sure whether calling numactl via system() and friends is OK,
>>>> I'd prefer to run the syscalls directly (like in patch 3/3) and
>>>> pull the necessary options into the -numa pin,... command line. We
>>>> could mimic numactl's syntax here.
>>>
>>> Definitely not use system(), but IIRC numactl has a library interface?
>> Right, that is what I include in patch 3/3 and use. I got the
>> impression Anthony wanted to avoid reimplementing parts of numactl,
>> especially enabling the full flexibility of the command line
>> interface (like specifying nodes, policies and interleaving).
>> I want QEMU to use the library and pull the necessary options into
>> the -numa pin,... parsing, even if this means duplicating numactl
>> functionality.
>>
>
> I agree with that. It's a lot easier to use a single tool than to try
> to integrate things yourself, the unix tradition of grep | sort | uniq -c
> | sort -n notwithstanding. Especially when one of the tools is qemu.

I couldn't disagree more here. This is why we don't support CPU pinning
and instead provide PID information for each VCPU thread. The folks that
want to use pinning are not novice users. They are not going to be happy
unless they can make full use of existing tools.

That means either replicating all of numactl's functionality (which is
not what the current patches do) or enabling numactl to be used with a
guest.

Regards,

Anthony Liguori