Message-ID: <52A19388.4060201@redhat.com>
Date: Fri, 06 Dec 2013 10:06:16 +0100
From: Paolo Bonzini
References: <1386143939-19142-1-git-send-email-gaowanlong@cn.fujitsu.com>
In-Reply-To: <1386143939-19142-1-git-send-email-gaowanlong@cn.fujitsu.com>
Subject: Re: [Qemu-devel] [PATCH V17 00/11] Add support for binding guest numa nodes to host numa nodes
To: Wanlong Gao
Cc: drjones@redhat.com, ehabkost@redhat.com, lersek@redhat.com, mtosatti@redhat.com, qemu-devel@nongnu.org, lcapitulino@redhat.com, bsd@redhat.com, anthony@codemonkey.ws, hutao@cn.fujitsu.com, y-goto@jp.fujitsu.com, peter.huangpeng@huawei.com, afaerber@suse.de

On 04/12/2013 08:58, Wanlong Gao wrote:
> As you know, QEMU can't direct its memory allocation now; this may cause
> cross-node access performance regressions in the guest.
> Worse, if PCI passthrough is used, the direct-attached device uses DMA
> transfers between the device and the qemu process, so all pages of the
> guest will be pinned by get_user_pages():
>
>   KVM_ASSIGN_PCI_DEVICE ioctl
>     kvm_vm_ioctl_assign_device()
>       => kvm_assign_device()
>         => kvm_iommu_map_memslots()
>           => kvm_iommu_map_pages()
>             => kvm_pin_pages()
>
> So, with a direct-attached device, every guest page's refcount is raised
> by one and page migration will not work; neither will AutoNUMA.
>
> Therefore, we should set the memory allocation policy of the guest nodes
> before the pages are actually mapped.
>
> With this patch set, we are able to set the guest nodes' memory policy
> like this:
>
>   -numa node,nodeid=0,cpus=0 \
>   -numa mem,size=1024M,policy=membind,host-nodes=0-1 \
>   -numa node,nodeid=1,cpus=1 \
>   -numa mem,size=1024M,policy=interleave,host-nodes=1
>
> The option supports a
> "policy={default|membind|interleave|preferred},relative=true,host-nodes=N-N"
> style format.
>
> The series also adds a QMP command "query-numa" to show NUMA info through
> this API, and converts the "info numa" monitor command to use the new
> "query-numa" QMP command.
>
> This version removes the "set-mem-policy" QMP and HMP commands temporarily,
> as Marcelo and Paolo suggested.
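For reference, the host-side mechanism behind "policy=membind" and the other
policies is the mbind(2) system call, the same call whose "maxnode" argument
handling V11 of the series fixed. Below is a minimal, self-contained sketch,
not code from this series, of applying such a policy to an anonymous mapping
before its pages are faulted in or pinned; the node number, size, and file
name are only illustrative.

/* Sketch only: bind a 1024M anonymous mapping to host NUMA node 0 before
 * any page is touched, roughly what
 * "-numa mem,size=1024M,policy=membind,host-nodes=0" asks QEMU to do for
 * one guest node's RAM.  Build with: gcc -o bind bind.c -lnuma */
#include <numaif.h>      /* mbind(), MPOL_BIND (libnuma development headers) */
#include <sys/mman.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    size_t len = 1024UL << 20;                  /* 1024M, as in the example */
    void *ram = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ram == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    unsigned long nodemask = 1UL << 0;          /* host node 0 only */
    /* maxnode is the number of bits of the mask the kernel should examine;
     * it must cover the highest node bit that is set. */
    if (mbind(ram, len, MPOL_BIND, &nodemask, 8 * sizeof(nodemask), 0) < 0) {
        perror("mbind");
        return 1;
    }

    /* Pages faulted in from now on (or pinned later by get_user_pages())
     * are allocated on node 0. */
    memset(ram, 0, len);
    return 0;
}

The series itself attaches the policy to the MemoryRegion backing each guest
node (the V16->V17 change below), but the ordering constraint is the same:
the policy has to be in place before device assignment pins the pages.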
>
> A simple test looks like the following:
> =====================================================
> Before:
> # numactl -H && /qemu/x86_64-softmmu/qemu-system-x86_64 -m 4096 -smp 2 -numa node,nodeid=0,cpus=0,mem=2048 -numa node,nodeid=1,cpus=1,mem=2048 -hda 6u4ga2.qcow2 -enable-kvm -device pci-assign,host=07:00.1,id=hostdev0,bus=pci.0,addr=0x7 & sleep 40 && numactl -H
> [1] 13320
> available: 2 nodes (0-1)
> node 0 cpus: 0 2
> node 0 size: 5111 MB
> node 0 free: 4653 MB
> node 1 cpus: 1 3
> node 1 size: 5120 MB
> node 1 free: 4764 MB
> node distances:
> node   0   1
>   0:  10  20
>   1:  20  10
> available: 2 nodes (0-1)
> node 0 cpus: 0 2
> node 0 size: 5111 MB
> node 0 free: 4317 MB
> node 1 cpus: 1 3
> node 1 size: 5120 MB
> node 1 free: 876 MB
> node distances:
> node   0   1
>   0:  10  20
>   1:  20  10
>
> After:
> # numactl -H && /qemu/x86_64-softmmu/qemu-system-x86_64 -m 4096 -smp 4 -numa node,nodeid=0,cpus=0,cpus=2 -numa mem,size=2048M,policy=membind,host-nodes=0 -numa node,nodeid=0,cpus=1,cpus=3 -numa mem,size=2048M,policy=membind,host-nodes=1 -hda 6u4ga2.qcow2 -enable-kvm -device pci-assign,host=07:00.1,id=hostdev0,bus=pci.0,addr=0x7 & sleep 40 && numactl -H
> [1] 10862
> available: 2 nodes (0-1)
> node 0 cpus: 0 2
> node 0 size: 5111 MB
> node 0 free: 4718 MB
> node 1 cpus: 1 3
> node 1 size: 5120 MB
> node 1 free: 4799 MB
> node distances:
> node   0   1
>   0:  10  20
>   1:  20  10
> available: 2 nodes (0-1)
> node 0 cpus: 0 2
> node 0 size: 5111 MB
> node 0 free: 2544 MB
> node 1 cpus: 1 3
> node 1 size: 5120 MB
> node 1 free: 2725 MB
> node distances:
> node   0   1
>   0:  10  20
>   1:  20  10
> ===================================================
>
> V1->V2:
>   change to use QemuOpts in numa options (Paolo)
>   handle Error in mpol parser (Paolo)
>   change qmp command format to mem-policy=membind,mem-hostnode=0-1 like (Paolo)
> V2->V3:
>   also handle Error in cpus parser (5/10)
>   split out common parser from cpus and hostnode parser (Bandan 6/10)
> V3->V4:
>   rebase to request for comments
> V4->V5:
>   use OptsVisitor and split -numa option (Paolo)
>     - s/set-mpol/set-mem-policy (Andreas)
>     - s/mem-policy/policy
>     - s/mem-hostnode/host-nodes
>   fix hmp command process after error (Luiz)
>   add qmp command query-numa and convert info numa to it (Luiz)
> V5->V6:
>   remove tabs in json file (Laszlo, Paolo)
>   add back "-numa node,mem=xxx" as legacy (Paolo)
>   change cpus and host-nodes to array (Laszlo, Eric)
>   change "nodeid" to "uint16"
>   add NumaMemPolicy enum type (Eric)
>   rebased on Laszlo's "OptsVisitor: support / flatten integer ranges for
>   repeating options" patch set, thanks for Laszlo's help
> V6->V7:
>   change UInt16 to uint16 (Laszlo)
>   fix a typo in adding qmp command set-mem-policy
> V7->V8:
>   rebase to current master with Laszlo's V2 of OptsVisitor patch set
>   fix an added whitespace line error
> V8->V9:
>   rebase to current master
>   check if total numa memory size is equal to ram_size (Paolo)
>   add comments to the OptsVisitor stuff in qapi-schema.json (Eric, Laszlo)
>   replace the use of numa_num_configured_nodes() (Andrew)
>   avoid abusing the fact i==nodeid (Andrew)
> V9->V10:
>   rebase to current master
>   remove libnuma (Andrew)
>   MAX_NODES=64 -> MAX_NODES=128 since libnuma selected 128 (Andrew)
>   use MAX_NODES instead of MAX_CPUMASK_BITS for host_mem bitmap (Andrew)
>   remove a useless clear_bit() operation (Andrew)
> V10->V11:
>   rebase to current master
>   fix "maxnode" argument of mbind(2)
> V11->V12:
>   rebase to current master
>   split patch 02/11 of V11 (Eduardo)
>   add some max value check (Eduardo)
>   split MAX_NODES change patch (Eduardo)
> V12->V13:
>   rebase to current master
>   thanks for Luiz's review (Luiz)
>   doc hmp command set-mem-policy (Luiz)
>   rename: NUMAInfo -> NUMANode (Luiz)
> V13->V14:
>   remove "set-mem-policy" qmp and hmp commands (Marcelo, Paolo)
> V14->V15:
>   rebase to the current master
> V15->V16:
>   rebase to current master
>   add more test log
> V16->V17:
>   use MemoryRegion to set policy instead of using "pc.ram" (Paolo)
>
> Wanlong Gao (11):
>   NUMA: move numa related code to new file numa.c
>   NUMA: check if the total numa memory size is equal to ram_size
>   NUMA: Add numa_info structure to contain numa nodes info
>   NUMA: convert -numa option to use OptsVisitor
>   NUMA: introduce NumaMemOptions
>   NUMA: add "-numa mem," options
>   NUMA: expand MAX_NODES from 64 to 128
>   NUMA: parse guest numa nodes memory policy
>   NUMA: set guest numa nodes memory policy
>   NUMA: add qmp command query-numa
>   NUMA: convert hmp command info_numa to use qmp command query_numa
>
>  Makefile.target         |   2 +-
>  cpus.c                  |  14 --
>  hmp.c                   |  57 +++++++
>  hmp.h                   |   1 +
>  hw/i386/pc.c            |  21 ++-
>  include/exec/memory.h   |  15 ++
>  include/sysemu/cpus.h   |   1 -
>  include/sysemu/sysemu.h |  18 ++-
>  monitor.c               |  21 +--
>  numa.c                  | 408 ++++++++++++++++++++++++++++++++++++++++++++++++
>  qapi-schema.json        | 112 +++++++++++++
>  qemu-options.hx         |   6 +-
>  qmp-commands.hx         |  49 ++++++
>  vl.c                    | 160 +++----------------
>  14 files changed, 698 insertions(+), 187 deletions(-)
>  create mode 100644 numa.c
>

I think patches 1-4 and 7 are fine. For the rest, I'd rather wait for
Igor's patches and try to integrate with Igor's memory hotplug patches.

Paolo