From: Wanlong Gao
Date: Thu, 16 May 2013 17:50:53 +0800
Subject: [Qemu-devel] QEMU NUMA and memory allocation problem
Message-ID: <5194ABFD.8040200@cn.fujitsu.com>
Reply-To: gaowanlong@cn.fujitsu.com
To: qemu-devel
Cc: Paolo Bonzini, ehabkost@redhat.com

Hi,

We have run into a problem with QEMU memory allocation. Here is the description:

My host has two NUMA nodes:

# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 2
node 0 size: 4010 MB
node 0 free: 3021 MB
node 1 cpus: 1 3
node 1 size: 4030 MB
node 1 free: 2881 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

I created a guest using the following XML:

... 1048576 1048576 2 ...

As you can see, I assigned 1G of memory to this guest and pinned vcpu0 to host CPU 2, which is in host node0, and vcpu1 to host CPU 3, which is in host node1. The guest also has two nodes, each with 512M of memory.

I then started the guest and printed the host NUMA state:

# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 2
node 0 size: 4010 MB
node 0 free: 2647 MB <=== freecell of node0
node 1 cpus: 1 3
node 1 size: 4030 MB
node 1 free: 2746 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

Next, inside the guest, I allocated memory from guest node0 using the following code:

> #include <numa.h>
> #include <string.h>
> #include <unistd.h>
>
> #define MEM (1024*1024*300)
>
> int main(void)
> {
>     /* Allocate 300M on node 0 and touch every page so it is really backed. */
>     char *p = numa_alloc_onnode(MEM, 0);
>     memset(p, 0, MEM);
>     /* Hold the memory long enough to inspect the host. */
>     sleep(1000);
>     numa_free(p, MEM);
>     return 0;
> }

The host NUMA state then shows that this 300M was allocated from host node0:

# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 2
node 0 size: 4010 MB
node 0 free: 2345 MB <===== reduced ~300M
node 1 cpus: 1 3
node 1 size: 4030 MB
node 1 free: 2767 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

Then I used the same method to allocate 300M from guest node1 and printed the host NUMA state again:

# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 2
node 0 size: 4010 MB
node 0 free: 2059 MB <=== reduced ~300M
node 1 cpus: 1 3
node 1 size: 4030 MB
node 1 free: 2767 MB <=== no change
node distances:
node   0   1
  0:  10  20
  1:  20  10

This shows that the 300M was again allocated from host node0, not from host node1 as I expected.

We think QEMU does not handle this NUMA memory allocation well, and that it will cause a performance regression from cross-node memory access.

Any thoughts? Or am I missing something?

Thanks,
Wanlong Gao
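
P.S. The per-node comparison above can also be done mechanically rather than by eye. What follows is only a minimal sketch in C, assuming the `numactl -H` text has been captured into strings before and after each allocation. The helper names node_free_mb and node_consumed_mb are hypothetical and are not part of numactl or libnuma.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the "free" figure (in MB) that `numactl -H` reports for `node`,
 * or -1 if the captured text has no "node <N> free:" entry. */
static long node_free_mb(const char *numactl_out, int node)
{
    char key[32];

    assert(numactl_out != NULL);
    snprintf(key, sizeof key, "node %d free:", node);

    const char *hit = strstr(numactl_out, key);
    if (hit == NULL)
        return -1;

    /* strtol skips the blanks after the colon and stops at " MB". */
    return strtol(hit + strlen(key), NULL, 10);
}

/* MB that left the free pool of `node` between two snapshots.
 * A negative result means free memory grew. The caller is assumed to
 * have checked that both snapshots contain the node. */
static long node_consumed_mb(const char *before, const char *after, int node)
{
    return node_free_mb(before, node) - node_free_mb(after, node);
}
```

Fed the three snapshots taken after the guest was started, this gives 302 MB consumed on host node0 for the first allocation and 286 MB for the second, while host node1 comes out at -21 MB and 0 MB. That matches the reading above: both allocations came from host node0.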