Date: Mon, 20 May 2013 11:03:43 +0800
From: Wanpeng Li
To: Wanlong Gao
Cc: aarcange@redhat.com, a.p.zijlstra@chello.nl,
 qemu-devel <qemu-devel@nongnu.org>, linux-mm <linux-mm@kvack.org>,
 mgorman@suse.de, Paolo Bonzini, mingo@kernel.org, ehabkost@redhat.com
Subject: Re: [Qemu-devel] QEMU NUMA and memory allocation problem
Message-ID: <20130520030343.GA22424@hacker.(null)>
In-Reply-To: <51998489.804@cn.fujitsu.com>
References: <5194ABFD.8040200@cn.fujitsu.com> <51998489.804@cn.fujitsu.com>

On Mon, May 20, 2013 at 10:03:53AM +0800, Wanlong Gao wrote:
>Adding CC AutoNUMA folks:
>
>Paolo said that:
>
>> Pinning memory to host NUMA nodes is not implemented. Something like
>> AutoNUMA would be able to balance the memory the right way.
>>
>> Paolo
>
>And Eduardo said that:
>
>> I had plans to implement a mechanism to allow external tools to
>> implement manual pinning, but it is not one of my top priorities. It's
>> the kind of mechanism that may be obsolete since birth, if we have
>> AutoNUMA working and doing the right thing.
>>
>> -- Eduardo
>

Hi Wanlong,

>But I didn't see any change when I enabled AutoNUMA on my host.
>Can the AutoNUMA folks tell me why?
>Or are there any plans to handle this problem in AutoNUMA?
>

AutoNUMA itself has not been merged yet. However, its foundation, the
automatic NUMA balancing work implemented by Mel, on which either the
schednuma or the autonuma policy can be rebased, has already been merged.

Regards,
Wanpeng Li

>
>Thanks,
>Wanlong Gao
>
>
>> Hi,
>>
>> We just hit a problem with QEMU memory allocation.
>> Here is the description:
>>
>> On my host, I have two nodes:
>>
>> # numactl -H
>> available: 2 nodes (0-1)
>> node 0 cpus: 0 2
>> node 0 size: 4010 MB
>> node 0 free: 3021 MB
>> node 1 cpus: 1 3
>> node 1 size: 4030 MB
>> node 1 free: 2881 MB
>> node distances:
>> node   0   1
>>   0:  10  20
>>   1:  20  10
>>
>> I created a guest using the following XML:
>>
>> ...
>> 1048576
>> 1048576
>> 2
>> ...
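
[Editorial note: the XML markup above was eaten by the list archive's tag
filter, leaving only the element values. Based on the description that
follows (1G of guest RAM, vcpu0 pinned to host CPU 2, vcpu1 pinned to host
CPU 3, and two 512M guest nodes), the relevant part of the libvirt domain
XML was presumably along these lines; treat the exact elements and
attributes as a reconstruction, not a quote:]

  <memory unit='KiB'>1048576</memory>              <!-- 1G of guest RAM -->
  <currentMemory unit='KiB'>1048576</currentMemory>
  <vcpu placement='static'>2</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>      <!-- vcpu0 -> host CPU 2 (host node 0) -->
    <vcpupin vcpu='1' cpuset='3'/>      <!-- vcpu1 -> host CPU 3 (host node 1) -->
  </cputune>
  <cpu>
    <numa>
      <cell cpus='0' memory='524288'/>  <!-- guest node 0: 512M -->
      <cell cpus='1' memory='524288'/>  <!-- guest node 1: 512M -->
    </numa>
  </cpu>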
>>
>> As you can see, I assigned 1G of memory to this guest, pinned vcpu0 to
>> host CPU 2 (which is in host node 0), and pinned vcpu1 to host CPU 3
>> (which is in host node 1). The guest also has two nodes, each containing
>> 512M of memory.
>>
>> Now I started the guest, then printed the host NUMA state:
>>
>> # numactl -H
>> available: 2 nodes (0-1)
>> node 0 cpus: 0 2
>> node 0 size: 4010 MB
>> node 0 free: 2647 MB   <=== free memory of node 0
>> node 1 cpus: 1 3
>> node 1 size: 4030 MB
>> node 1 free: 2746 MB
>> node distances:
>> node   0   1
>>   0:  10  20
>>   1:  20  10
>>
>> Then I tried to allocate memory from guest node 0 using the following code:
>>
>>> #include <numa.h>     /* numa_alloc_onnode, numa_free */
>>> #include <string.h>   /* memset */
>>> #include <unistd.h>   /* sleep */
>>>
>>> #define MEM (1024*1024*300)
>>>
>>> int main(void)
>>> {
>>>     char *p = numa_alloc_onnode(MEM, 0);
>>>     memset(p, 0, MEM);
>>>     sleep(1000);
>>>     numa_free(p, MEM);
>>>     return 0;
>>> }
>>
>> Then I printed the host NUMA state again; it shows that this 300M of
>> memory was allocated from host node 0:
>>
>> # numactl -H
>> available: 2 nodes (0-1)
>> node 0 cpus: 0 2
>> node 0 size: 4010 MB
>> node 0 free: 2345 MB   <===== reduced by ~300M
>> node 1 cpus: 1 3
>> node 1 size: 4030 MB
>> node 1 free: 2767 MB
>> node distances:
>> node   0   1
>>   0:  10  20
>>   1:  20  10
>>
>> Then I used the same method to allocate 300M of memory from guest node 1,
>> and printed the host NUMA state:
>>
>> # numactl -H
>> available: 2 nodes (0-1)
>> node 0 cpus: 0 2
>> node 0 size: 4010 MB
>> node 0 free: 2059 MB   <=== reduced by ~300M
>> node 1 cpus: 1 3
>> node 1 size: 4030 MB
>> node 1 free: 2767 MB   <=== no change
>> node distances:
>> node   0   1
>>   0:  10  20
>>   1:  20  10
>>
>> So this 300M of memory was again allocated from host node 0, not from
>> host node 1 as I expected.
>>
>> We think QEMU does not handle this NUMA memory allocation well, and that
>> it will cause cross-node memory access performance regressions.
>>
>> Any thoughts? Or am I missing something?
>>
>> Thanks,
>> Wanlong Gao
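
[Editorial note: a way to pin down where an allocation actually landed is
to ask the kernel directly, rather than diffing "numactl -H" output, which
also picks up unrelated allocations. The sketch below extends the thread's
libnuma test program and is not part of the original thread. Passing a NULL
"nodes" argument to move_pages(2) moves nothing; it only reports, in
"status", the node each listed page currently resides on. Run inside the
guest, it reports the guest-visible node. Build with: gcc test.c -lnuma]

#include <numa.h>      /* numa_alloc_onnode, numa_free, numa_available */
#include <numaif.h>    /* move_pages */
#include <stdio.h>
#include <string.h>

#define MEM (1024UL * 1024 * 300)

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "libnuma: NUMA not available\n");
        return 1;
    }

    char *p = numa_alloc_onnode(MEM, 0);   /* request node 0 */
    if (!p)
        return 1;
    memset(p, 0, MEM);                     /* fault the pages in */

    /* nodes == NULL: move nothing, just report in "status" the
     * node that each listed page currently resides on. */
    void *pages[1] = { p };
    int status[1] = { -1 };
    if (move_pages(0 /* current process */, 1, pages, NULL, status, 0) == 0)
        printf("first page is on node %d\n", status[0]);

    numa_free(p, MEM);
    return 0;
}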