public inbox for kvm@vger.kernel.org
From: Andrew Theurer <habanero@linux.vnet.ibm.com>
To: Avi Kivity <avi@redhat.com>
Cc: Chegu Vinod <chegu_vinod@hp.com>, kvm@vger.kernel.org
Subject: Re: How to determine the backing host physical memory for a given guest ?
Date: Thu, 10 May 2012 10:34:28 -0500
Message-ID: <4FABE004.9010808@linux.vnet.ibm.com>
In-Reply-To: <4FAA752D.9020905@redhat.com>

On 05/09/2012 08:46 AM, Avi Kivity wrote:
> On 05/09/2012 04:05 PM, Chegu Vinod wrote:
>> Hello,
>>
>> On an 8 socket Westmere host I am attempting to run a single guest and
>> characterize the virtualization overhead for a system intensive
>> workload (AIM7-high_systime) as the size of the guest scales (10way/64G,
>> 20way/128G, ... 80way/512G).
>>
>> To do some comparisons between the native vs. guest runs, I have
>> been using "numactl" to control the cpu node & memory node bindings for
>> the qemu instance.  For larger guest sizes I end up binding across multiple
>> localities, e.g. for a 40 way guest:
>>
>> numactl --cpunodebind=0,1,2,3  --membind=0,1,2,3  \
>> qemu-system-x86_64 -smp 40 -m 262144 \
>> <....>
>>
>> I understand that actual mappings from a guest virtual address to host physical
>> address could change.
>>
>> Is there a way to determine [at a given instant] which host NUMA node is
>> providing the backing physical memory for the active guest's kernel and
>> also for the apps actively running in the guest?
>>
>> Guessing that there is a better way (some tool available?) than just
>> diff'ing the per-node memory usage... from the before and after output of
>> "numactl --hardware" on the host.
>>
>
> Not sure if that's what you want, but there's Documentation/vm/pagemap.txt.
>
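
For reference, pagemap maps a process virtual page to the host page frame
backing it.  A rough sketch of poking at it for the qemu process -- the
address below is made up, you would take one from the big guest-RAM mapping
in /proc/$PID/maps, and reading pagemap may require root:

  PID=$(pidof qemu-system-x86_64)
  VADDR=0x7f2c40000000    # illustrative; take a real address from /proc/$PID/maps
  dd if=/proc/$PID/pagemap bs=8 skip=$((VADDR / 4096)) count=1 2>/dev/null | od -A n -t x8
  # prints the 64-bit pagemap entry: bit 63 = page present, bits 0-54 = host page frame number

Turning that PFN into a host NUMA node takes one more step (e.g. comparing
it against the per-node start_pfn/spanned ranges in /proc/zoneinfo), which
is why numa_maps below is usually the more convenient view.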

You can look at /proc/<pid>/numa_maps to see all the mappings for the
qemu process.  There should be one really large mapping for the guest
memory, and in that line you will see the dirty page count and, potentially,
a page count for each NUMA node.  This will tell you how much memory comes
from each node, but not specifically "which page is mapped where".
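
For example (the pid lookup and the numbers are illustrative), the guest RAM
shows up as one very large anonymous mapping:

  PID=$(pidof qemu-system-x86_64)
  grep anon /proc/$PID/numa_maps
  # the big mapping looks roughly like:
  #   7f2c40000000 default anon=67108864 dirty=67108864 N0=16800000 N1=16700000 ... kernelpagesize_kB=4
  # the N<node>=<count> fields are the number of pages backed by each host NUMA node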

Keep in mind that with the numactl invocation you are using, you will likely
not get the benefit of the NUMA enhancements in the Linux kernel, in either
the guest or the host.  There are a couple of reasons: (1) your guest does
not have a NUMA topology defined (based on what I see from the qemu command
above), so it will not do anything special based on the host topology.
Also, things that are normally broken down per NUMA node, like some
spin-locks and the sched-domains, become system-wide/flat.  This is a big
deal for the scheduler and for things like kmem allocation.  With a single
80-way VM with no NUMA topology, you will likely see massive spin-lock
contention on some workloads.  (2) Even once the VM does have a NUMA
topology (via qemu -numa), one still cannot manually set a mempolicy for
the portion of the VM memory that represents each NUMA node in the VM (or
have this done automatically with something like autoNUMA).  Therefore, it
is difficult to forcefully map each of the VM's nodes' memory to the
corresponding host node.
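
For reference, the guest topology can be described on the qemu command line
roughly like this, sized to match the 40 way, -m 262144 example quoted above
(the cpu ranges are illustrative, and I am assuming mem= takes MB like -m
does):

  qemu-system-x86_64 -smp 40 -m 262144 \
      -numa node,nodeid=0,cpus=0-9,mem=65536 \
      -numa node,nodeid=1,cpus=10-19,mem=65536 \
      -numa node,nodeid=2,cpus=20-29,mem=65536 \
      -numa node,nodeid=3,cpus=30-39,mem=65536 \
      <....>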

There are some things you can do to mitigate some of this.  Definitely
define the VM to match the NUMA topology found on the host.  That will
at least allow good scaling of the locks and scheduler in the guest.  As
for getting memory placement close (a page in VM node X actually residing
in host node X), you have to rely on vcpu pinning plus the guest NUMA
topology, combined with the default mempolicy in the guest and host.  As
pages are faulted in the guest, the hope is that the vcpu which did the
faulting is running on the right node (guest and host), that the guest OS
mempolicy ensures the page is allocated in the guest-local node, and that
the allocation causes a fault in qemu, which is -also- running on host
node X.  The vcpu pinning is critical to get qemu to fault that memory in
on the correct node.  Make sure you do not use numactl for any of this.
I would suggest using libvirt and defining the vcpu pinning and the NUMA
topology in the XML.
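
As a rough illustration (assuming a libvirt new enough to support guest NUMA
cells; the pinning and cell sizes are made up for a 4-node, 40-vcpu guest
matching the example above, with cell memory given in KiB):

  <vcpu>40</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='0-9'/>
    <vcpupin vcpu='1' cpuset='0-9'/>
    <!-- ...one vcpupin per vcpu: vcpus 0-9 on host node 0's cpus, 10-19 on node 1's, etc... -->
  </cputune>
  <cpu>
    <numa>
      <cell cpus='0-9' memory='67108864'/>
      <cell cpus='10-19' memory='67108864'/>
      <cell cpus='20-29' memory='67108864'/>
      <cell cpus='30-39' memory='67108864'/>
    </numa>
  </cpu>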

-Andrew Theurer

