* [Qemu-devel] QEMU NUMA and memory allocation problem
From: Wanlong Gao @ 2013-05-16 9:50 UTC
To: qemu-devel; +Cc: Paolo Bonzini, ehabkost
Hi,
We have run into a problem with QEMU memory allocation.
Here is the description:
My host has two NUMA nodes:
# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 2
node 0 size: 4010 MB
node 0 free: 3021 MB
node 1 cpus: 1 3
node 1 size: 4030 MB
node 1 free: 2881 MB
node distances:
node 0 1
0: 10 20
1: 20 10
I created a guest using the following XML:
...
<memory unit='KiB'>1048576</memory>
<currentMemory unit='KiB'>1048576</currentMemory>
<vcpu placement='static'>2</vcpu>
<cputune>
<vcpupin vcpu='0' cpuset='2'/>
<vcpupin vcpu='1' cpuset='3'/>
</cputune>
<numatune>
<memory mode='strict' nodeset='0-1'/>
</numatune>
<cpu>
<topology sockets='2' cores='1' threads='1'/>
<numa>
<cell cpus='0' memory='524288'/>
<cell cpus='1' memory='524288'/>
</numa>
</cpu>
...
As you can see, I assigned 1G of memory to this guest, pinned vcpu0 to host CPU 2
(which is in host node0), and pinned vcpu1 to host CPU 3 (which is in host node1).
The guest also has two nodes, each with 512M of memory.
I then started the guest and printed the host NUMA state:
# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 2
node 0 size: 4010 MB
node 0 free: 2647 MB <=== freecell of node0
node 1 cpus: 1 3
node 1 size: 4030 MB
node 1 free: 2746 MB
node distances:
node 0 1
0: 10 20
1: 20 10
Then I tried to allocate memory from guest node0 using the following code:
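> #include <stdio.h>
> #include <string.h>   /* memset */
> #include <unistd.h>   /* sleep */
> #include <numa.h>     /* link with -lnuma */
>
> #define MEM (1024*1024*300)
>
> int main(void)
> {
>     /* Ask for 300M on guest node 0. */
>     char *p = numa_alloc_onnode(MEM, 0);
>     if (p == NULL) {
>         fprintf(stderr, "numa_alloc_onnode failed\n");
>         return 1;
>     }
>     memset(p, 0, MEM);   /* touch every page so it is really allocated */
>     sleep(1000);         /* hold the memory while checking the host */
>     numa_free(p, MEM);
>     return 0;
> }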
I printed the host NUMA state again; it shows that this 300M of memory was allocated from host node0:
# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 2
node 0 size: 4010 MB
node 0 free: 2345 MB <===== reduced ~300M
node 1 cpus: 1 3
node 1 size: 4030 MB
node 1 free: 2767 MB
node distances:
node 0 1
0: 10 20
1: 20 10
Then I used the same method to allocate 300M of memory from guest node1, and printed the host
NUMA state:
# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 2
node 0 size: 4010 MB
node 0 free: 2059 MB <=== reduced ~300M
node 1 cpus: 1 3
node 1 size: 4030 MB
node 1 free: 2767 MB <=== no change
node distances:
node 0 1
0: 10 20
1: 20 10
This shows that the 300M was again allocated from host node0, not from host node1 as
I expected.
We think that QEMU does not handle this NUMA memory allocation well, and that the resulting
cross-node memory accesses will hurt performance.
Any thoughts? Or, am I missing something?
Thanks,
Wanlong Gao
* Re: [Qemu-devel] QEMU NUMA and memory allocation problem
From: Paolo Bonzini @ 2013-05-16 12:02 UTC
To: gaowanlong; +Cc: qemu-devel, ehabkost
On 16/05/2013 11:50, Wanlong Gao wrote:
> To see that this 300M memory is allocated from host node0 again, but not host node1 as
> I expected.
>
> We think that QEMU can't handled this numa memory allocation well, and it will cause the
> cross node memory access performance regression.
>
> Any thoughts? Or, am I missing something?
Pinning memory to host NUMA nodes is not implemented. Something like
AutoNUMA would be able to balance the memory the right way.
Paolo
* Re: [Qemu-devel] QEMU NUMA and memory allocation problem
From: Wanlong Gao @ 2013-05-17 7:47 UTC
To: Paolo Bonzini; +Cc: qemu-devel, Wanlong Gao, ehabkost
On 05/16/2013 08:02 PM, Paolo Bonzini wrote:
> On 16/05/2013 11:50, Wanlong Gao wrote:
>> To see that this 300M memory is allocated from host node0 again, but not host node1 as
>> I expected.
>>
>> We think that QEMU can't handled this numa memory allocation well, and it will cause the
>> cross node memory access performance regression.
>>
>> Any thoughts? Or, am I missing something?
>
> Pinning memory to host NUMA nodes is not implemented. Something like
> AutoNUMA would be able to balance the memory the right way.
Is there any plan to implement this? Or is it unnecessary for QEMU to handle this?
I just enabled AutoNUMA on the host kernel, but the memory allocation
numbers do not seem to change at all. Am I missing something?
Thanks,
Wanlong Gao
>
> Paolo
* Re: [Qemu-devel] QEMU NUMA and memory allocation problem
From: Eduardo Habkost @ 2013-05-17 13:36 UTC
To: Wanlong Gao; +Cc: Paolo Bonzini, qemu-devel
On Fri, May 17, 2013 at 03:47:52PM +0800, Wanlong Gao wrote:
> On 05/16/2013 08:02 PM, Paolo Bonzini wrote:
> > On 16/05/2013 11:50, Wanlong Gao wrote:
> >> To see that this 300M memory is allocated from host node0 again, but not host node1 as
> >> I expected.
> >>
> >> We think that QEMU can't handled this numa memory allocation well, and it will cause the
> >> cross node memory access performance regression.
> >>
> >> Any thoughts? Or, am I missing something?
> >
> > Pinning memory to host NUMA nodes is not implemented. Something like
> > AutoNUMA would be able to balance the memory the right way.
>
> Any plan to implement this? Or, handle this by QEMU is not necessary?
> I just enabled AutoNUMA on the host kernel, but the memory allocation
> numbers seems no change at all. Am I missing something?
I had plans to implement a mechanism to allow external tools to
implement manual pinning, but it is not one of my top priorities. It's
the kind of mechanism that may be obsolete since birth, if we have
AutoNUMA working and doing the right thing.
--
Eduardo
* Re: [Qemu-devel] QEMU NUMA and memory allocation problem
From: Wanlong Gao @ 2013-05-20 2:03 UTC
Cc: aarcange, a.p.zijlstra, qemu-devel, linux-mm, mgorman, Paolo Bonzini, mingo, Wanlong Gao, ehabkost
Adding the AutoNUMA folks to CC.
Paolo said:
> Pinning memory to host NUMA nodes is not implemented. Something like
> AutoNUMA would be able to balance the memory the right way.
>
> Paolo
And Eduardo said:
> I had plans to implement a mechanism to allow external tools to
> implement manual pinning, but it is not one of my top priorities. It's
> the kind of mechanism that may be obsolete since birth, if we have
> AutoNUMA working and doing the right thing.
>
> -- Eduardo
But I didn't see any change when I enabled AutoNUMA on my host.
Can the AutoNUMA folks explain why?
Or are there any plans to handle this problem in AutoNUMA?
Thanks,
Wanlong Gao
* Re: [Qemu-devel] QEMU NUMA and memory allocation problem
From: Wanpeng Li @ 2013-05-20 3:03 UTC
To: Wanlong Gao
Cc: aarcange, a.p.zijlstra, qemu-devel, linux-mm, mgorman, Paolo Bonzini, mingo, ehabkost
On Mon, May 20, 2013 at 10:03:53AM +0800, Wanlong Gao wrote:
>Adding CC AutoNUMA folks:
>
>Paolo said that:
>
>> Pinning memory to host NUMA nodes is not implemented. Something like
>> AutoNUMA would be able to balance the memory the right way.
>>
>> Paolo
>
>And Eduardo said that:
>> I had plans to implement a mechanism to allow external tools to
>> implement manual pinning, but it is not one of my top priorities. It's
>> the kind of mechanism that may be obsolete since birth, if we have
>> AutoNUMA working and doing the right thing.
>>
>> -- Eduardo
>
Hi Wanlong,
>But I didn't see any change when I enabled the AutoNUMA on my host.
>Can AutoNUMA folks teach me why?
>Or any plans to handle this problem in AutoNUMA?
>
AutoNUMA is not merged currently. The foundation (automatic NUMA balancing),
implemented by Mel, on which the policy of either schednuma or AutoNUMA can be
rebased, has already been merged.
Regards,
Wanpeng Li