From: Wanpeng Li <liwanp@linux.vnet.ibm.com>
To: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Cc: aarcange@redhat.com, a.p.zijlstra@chello.nl,
qemu-devel <qemu-devel@nongnu.org>, linux-mm <linux-mm@kvack.org>,
mgorman@suse.de, Paolo Bonzini <pbonzini@redhat.com>,
mingo@kernel.org, ehabkost@redhat.com
Subject: Re: [Qemu-devel] QEMU NUMA and memory allocation problem
Date: Mon, 20 May 2013 11:03:43 +0800 [thread overview]
Message-ID: <20130520030343.GA22424@hacker.(null)> (raw)
In-Reply-To: <51998489.804@cn.fujitsu.com>
On Mon, May 20, 2013 at 10:03:53AM +0800, Wanlong Gao wrote:
>Adding CC AutoNUMA folks:
>
>Paolo said that:
>
>> Pinning memory to host NUMA nodes is not implemented. Something like
>> AutoNUMA would be able to balance the memory the right way.
>>
>> Paolo
>
>And Eduardo said that:
>> I had plans to implement a mechanism to allow external tools to
>> implement manual pinning, but it is not one of my top priorities. It's
>> the kind of mechanism that may be obsolete since birth, if we have
>> AutoNUMA working and doing the right thing.
>>
>> -- Eduardo
>
Hi Wanlong,
>But I didn't see any change when I enabled the AutoNUMA on my host.
>Can AutoNUMA folks teach me why?
>Or any plans to handle this problem in AutoNUMA?
>
AutoNUMA itself has not been merged yet. However, the foundation
(automatic NUMA balancing) implemented by Mel, on which either the
schednuma or the autonuma policy can be rebased, has already been merged.
Regards,
Wanpeng Li
>
>Thanks,
>Wanlong Gao
>
>
>
>> Hi,
>>
>> We just met a problem of QEMU memory allocation.
>> Here is the description:
>>
>> On my host, I have two nodes,
>> # numactl -H
>> available: 2 nodes (0-1)
>> node 0 cpus: 0 2
>> node 0 size: 4010 MB
>> node 0 free: 3021 MB
>> node 1 cpus: 1 3
>> node 1 size: 4030 MB
>> node 1 free: 2881 MB
>> node distances:
>> node 0 1
>> 0: 10 20
>> 1: 20 10
>>
>>
>>
>> I created a guest using the following XML:
>>
>> ...
>> <memory unit='KiB'>1048576</memory>
>> <currentMemory unit='KiB'>1048576</currentMemory>
>> <vcpu placement='static'>2</vcpu>
>> <cputune>
>> <vcpupin vcpu='0' cpuset='2'/>
>> <vcpupin vcpu='1' cpuset='3'/>
>> </cputune>
>> <numatune>
>> <memory mode='strict' nodeset='0-1'/>
>> </numatune>
>> <cpu>
>> <topology sockets='2' cores='1' threads='1'/>
>> <numa>
>> <cell cpus='0' memory='524288'/>
>> <cell cpus='1' memory='524288'/>
>> </numa>
>> </cpu>
>> ...
>>
>> As you can see, I assigned 1G of memory to this guest, pinned vcpu0 to host CPU 2
>> (which is in host node0), and pinned vcpu1 to host CPU 3 (which is in host node1).
>> The guest also has two nodes, each containing 512M of memory.
>>
>> Now, I started the guest, then printed the host numa state :
>> # numactl -H
>> available: 2 nodes (0-1)
>> node 0 cpus: 0 2
>> node 0 size: 4010 MB
>> node 0 free: 2647 MB <=== freecell of node0
>> node 1 cpus: 1 3
>> node 1 size: 4030 MB
>> node 1 free: 2746 MB
>> node distances:
>> node 0 1
>> 0: 10 20
>> 1: 20 10
>>
>> Then I tried to allocate memory from guest node0 using the following code:
>>> #include <memory.h>
>>> #include <numa.h>
>>> #include <unistd.h>
>>>
>>> #define MEM (1024*1024*300)
>>>
>>> /* link with -lnuma */
>>> int main(void)
>>> {
>>>     char *p;
>>>
>>>     if (numa_available() < 0)
>>>         return 1;
>>>     p = numa_alloc_onnode(MEM, 0);
>>>     if (!p)
>>>         return 1;
>>>     memset(p, 0, MEM);
>>>     sleep(1000);
>>>     numa_free(p, MEM);
>>>     return 0;
>>> }
>>
>> Then I printed the host NUMA state; it shows that the 300M of memory was allocated from host node0:
>>
>> # numactl -H
>> available: 2 nodes (0-1)
>> node 0 cpus: 0 2
>> node 0 size: 4010 MB
>> node 0 free: 2345 MB <===== reduced ~300M
>> node 1 cpus: 1 3
>> node 1 size: 4030 MB
>> node 1 free: 2767 MB
>> node distances:
>> node 0 1
>> 0: 10 20
>> 1: 20 10
>>
>>
>> Then, I tried the same method to allocate 300M memory from guest node1, and printed the host
>> numa state:
>>
>> # numactl -H
>> available: 2 nodes (0-1)
>> node 0 cpus: 0 2
>> node 0 size: 4010 MB
>> node 0 free: 2059 MB <=== reduced ~300M
>> node 1 cpus: 1 3
>> node 1 size: 4030 MB
>> node 1 free: 2767 MB <=== no change
>> node distances:
>> node 0 1
>> 0: 10 20
>> 1: 20 10
>>
>>
>> This shows that the 300M of memory was again allocated from host node0, not from host
>> node1 as I expected.
>>
>> We think that QEMU does not handle this NUMA memory allocation well, and it will cause
>> cross-node memory access performance regressions.
>>
>> Any thoughts? Or, am I missing something?
>>
>>
>> Thanks,
>> Wanlong Gao
>>
>>
>
>--
>To unsubscribe, send a message with 'unsubscribe linux-mm' in
>the body to majordomo@kvack.org. For more info on Linux MM,
>see: http://www.linux-mm.org/ .
>Don't email: email@kvack.org
Thread overview: 6+ messages
2013-05-16 9:50 [Qemu-devel] QEMU NUMA and memory allocation problem Wanlong Gao
2013-05-16 12:02 ` Paolo Bonzini
2013-05-17 7:47 ` Wanlong Gao
2013-05-17 13:36 ` Eduardo Habkost
2013-05-20 2:03 ` Wanlong Gao
2013-05-20 3:03 ` Wanpeng Li [this message]