qemu-devel.nongnu.org archive mirror
From: Paolo Bonzini <pbonzini@redhat.com>
To: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Cc: drjones@redhat.com, ehabkost@redhat.com, lersek@redhat.com,
	mtosatti@redhat.com, qemu-devel@nongnu.org,
	lcapitulino@redhat.com, bsd@redhat.com, anthony@codemonkey.ws,
	hutao@cn.fujitsu.com, y-goto@jp.fujitsu.com,
	peter.huangpeng@huawei.com, afaerber@suse.de
Subject: Re: [Qemu-devel] [PATCH V17 00/11] Add support for binding guest numa nodes to host numa nodes
Date: Fri, 06 Dec 2013 10:06:16 +0100	[thread overview]
Message-ID: <52A19388.4060201@redhat.com> (raw)
In-Reply-To: <1386143939-19142-1-git-send-email-gaowanlong@cn.fujitsu.com>

On 04/12/2013 08:58, Wanlong Gao wrote:
> As you know, QEMU can't direct its memory allocation now; this may cause
> a performance regression from cross-node memory access in the guest.
> Worse, if PCI passthrough is used, the direct-attached device uses
> DMA transfers between the device and the QEMU process, so
> all pages of the guest will be pinned by get_user_pages().
> 
> KVM_ASSIGN_PCI_DEVICE ioctl
>   kvm_vm_ioctl_assign_device()
>     =>kvm_assign_device()
>       => kvm_iommu_map_memslots()
>         => kvm_iommu_map_pages()
>            => kvm_pin_pages()
> 
> So, with a direct-attached device, the page count of every guest page is
> incremented by 1 and page migration will not work. AutoNUMA won't work either.
> 
> So we should set the memory allocation policy of the guest nodes before
> the pages are actually mapped.
> 
> With this patch set, we are able to set the memory policy of guest nodes
> as follows:
> 
>  -numa node,nodeid=0,cpus=0 \
>  -numa mem,size=1024M,policy=membind,host-nodes=0-1 \
>  -numa node,nodeid=1,cpus=1 \
>  -numa mem,size=1024M,policy=interleave,host-nodes=1
> 
> The supported format is "policy={default|membind|interleave|preferred},relative=true,host-nodes=N-N".
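
To make the option format above concrete, here is a minimal Python sketch of how such a "-numa mem" argument could be split into fields, with the host-nodes range expanded into a list. It is only an illustration: parse_numa_mem is a hypothetical name, this is not the OptsVisitor-based parser from the patches, and it assumes that host-nodes is a single N or N-M range and that size carries an M or G suffix.

```python
def parse_numa_mem(arg):
    """Split a '-numa mem,...' argument string into a dict (illustrative only)."""
    policies = {"default", "membind", "interleave", "preferred"}
    opts = {}
    for field in arg.split(","):
        if not field:
            continue
        key, sep, value = field.partition("=")
        if not sep:
            # Bare keyword such as the leading "mem".
            opts["type"] = key
            continue
        opts[key] = value

    if "policy" in opts and opts["policy"] not in policies:
        raise ValueError("unknown policy: " + opts["policy"])

    if "size" in opts:
        # Assumption: only M and G suffixes are handled in this sketch.
        units = {"M": 1, "G": 1024}
        number, unit = opts["size"][:-1], opts["size"][-1]
        if unit not in units:
            raise ValueError("unsupported size suffix: " + unit)
        opts["size_mb"] = int(number) * units[unit]

    if "host-nodes" in opts:
        # "1" means a single node; "0-1" means an inclusive range.
        first, _, last = opts["host-nodes"].partition("-")
        last = last or first
        if int(first) > int(last):
            raise ValueError("bad host-nodes range: " + opts["host-nodes"])
        opts["host_nodes"] = list(range(int(first), int(last) + 1))
    return opts


example = parse_numa_mem("mem,size=1024M,policy=membind,host-nodes=0-1")
print(example["policy"], example["size_mb"], example["host_nodes"])
```

Keeping the raw strings alongside the derived size_mb and host_nodes keys is just a convenience for the sketch; a real parser would report errors through its own mechanism.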
> 
> The series also adds a QMP command, "query-numa", to report NUMA
> information, and converts the "info numa" monitor command to use
> it.
> 
> This version temporarily removes the "set-mem-policy" QMP and HMP commands,
> as Marcelo and Paolo suggested.
> 
> 
> A simple test looks like this:
> =====================================================
> Before:
> # numactl -H && /qemu/x86_64-softmmu/qemu-system-x86_64 -m 4096  -smp 2 -numa node,nodeid=0,cpus=0,mem=2048 -numa node,nodeid=1,cpus=1,mem=2048 -hda 6u4ga2.qcow2 -enable-kvm -device pci-assign,host=07:00.1,id=hostdev0,bus=pci.0,addr=0x7 & sleep 40 && numactl -H
> [1] 13320
> available: 2 nodes (0-1)
> node 0 cpus: 0 2
> node 0 size: 5111 MB
> node 0 free: 4653 MB
> node 1 cpus: 1 3
> node 1 size: 5120 MB
> node 1 free: 4764 MB
> node distances:
> node   0   1 
>   0:  10  20 
>   1:  20  10 
> available: 2 nodes (0-1)
> node 0 cpus: 0 2
> node 0 size: 5111 MB
> node 0 free: 4317 MB
> node 1 cpus: 1 3
> node 1 size: 5120 MB
> node 1 free: 876 MB
> node distances:
> node   0   1 
>   0:  10  20 
>   1:  20  10 
> 
> 
> 
> After:
> # numactl -H && /qemu/x86_64-softmmu/qemu-system-x86_64 -m 4096 -smp 4 -numa node,nodeid=0,cpus=0,cpus=2 -numa mem,size=2048M,policy=membind,host-nodes=0 -numa node,nodeid=1,cpus=1,cpus=3 -numa mem,size=2048M,policy=membind,host-nodes=1 -hda 6u4ga2.qcow2 -enable-kvm -device pci-assign,host=07:00.1,id=hostdev0,bus=pci.0,addr=0x7 & sleep 40 && numactl -H
> [1] 10862
> available: 2 nodes (0-1)
> node 0 cpus: 0 2
> node 0 size: 5111 MB
> node 0 free: 4718 MB
> node 1 cpus: 1 3
> node 1 size: 5120 MB
> node 1 free: 4799 MB
> node distances:
> node   0   1 
>   0:  10  20 
>   1:  20  10 
> available: 2 nodes (0-1)
> node 0 cpus: 0 2
> node 0 size: 5111 MB
> node 0 free: 2544 MB
> node 1 cpus: 1 3
> node 1 size: 5120 MB
> node 1 free: 2725 MB
> node distances:
> node   0   1 
>   0:  10  20 
>   1:  20  10 
> ===================================================
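
The effect of the binding is easier to see as per-node deltas in free memory. The following Python sketch parses the "node N free" lines from the two pairs of numactl -H outputs quoted above and subtracts them. The helper names free_mb and consumed are made up for this illustration; the numbers are copied from the log.

```python
import re


def free_mb(numactl_output):
    """Return {node: free MB} parsed from 'node N free: X MB' lines."""
    pairs = re.findall(r"node (\d+) free: (\d+) MB", numactl_output)
    return {int(node): int(mb) for node, mb in pairs}


def consumed(before, after):
    """Return {node: MB consumed} between two numactl -H snapshots."""
    b, a = free_mb(before), free_mb(after)
    return {node: b[node] - a[node] for node in b}


# "Before" run: no memory policy set.
unbound = consumed("node 0 free: 4653 MB\nnode 1 free: 4764 MB",
                   "node 0 free: 4317 MB\nnode 1 free: 876 MB")

# "After" run: policy=membind, one host node per guest node.
bound = consumed("node 0 free: 4718 MB\nnode 1 free: 4799 MB",
                 "node 0 free: 2544 MB\nnode 1 free: 2725 MB")

print(unbound)  # {0: 336, 1: 3888}
print(bound)    # {0: 2174, 1: 2074}
```

Without a policy, nearly all of the guest's 4096 MB is taken from host node 1; with membind, each host node gives up roughly 2 GB, in line with the two 2048M guest nodes. The deltas are approximate, since other activity on the host also changes free memory during the 40 seconds.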
> 
> 
> V1->V2:
>     change to use QemuOpts in numa options (Paolo)
>     handle Error in mpol parser (Paolo)
>     change qmp command format to the form mem-policy=membind,mem-hostnode=0-1 (Paolo)
> V2->V3:
>     also handle Error in cpus parser (5/10)
>     split out common parser from cpus and hostnode parser (Bandan 6/10)
> V3->V4:
>     rebase to request for comments
> V4->V5:
>     use OptVisitor and split -numa option (Paolo)
>      - s/set-mpol/set-mem-policy (Andreas)
>      - s/mem-policy/policy
>      - s/mem-hostnode/host-nodes
>     fix hmp command process after error (Luiz)
>     add qmp command query-numa and convert info numa to it (Luiz)
> V5->V6:
>     remove tabs in json file (Laszlo, Paolo)
>     add back "-numa node,mem=xxx" as legacy (Paolo)
>     change cpus and host-nodes to array (Laszlo, Eric)
>     change "nodeid" to "uint16"
>     add NumaMemPolicy enum type (Eric)
>     rebased on Laszlo's "OptsVisitor: support / flatten integer ranges for repeating options" patch set, thanks for Laszlo's help
> V6->V7:
>     change UInt16 to uint16 (Laszlo)
>     fix a typo in adding qmp command set-mem-policy
> V7->V8:
>     rebase to current master with Laszlo's V2 of OptsVisitor patch set
>     fix an error that added a whitespace-only line
> V8->V9:
>     rebase to current master
>     check if total numa memory size is equal to ram_size (Paolo)
>     add comments to the OptsVisitor stuff in qapi-schema.json (Eric, Laszlo)
>     replace the use of numa_num_configured_nodes() (Andrew)
>     avoid abusing the fact i==nodeid (Andrew)
> V9->V10:
>     rebase to current master
>     remove libnuma (Andrew)
>     MAX_NODES=64 -> MAX_NODES=128 since libnuma selected 128 (Andrew)
>     use MAX_NODES instead of MAX_CPUMASK_BITS for host_mem bitmap (Andrew)
>     remove a useless clear_bit() operation (Andrew)
> V10->V11:
>     rebase to current master
>     fix "maxnode" argument of mbind(2)
> V11->V12:
>     rebase to current master
>     split patch 02/11 of V11 (Eduardo)
>     add some max value check (Eduardo)
>     split MAX_NODES change patch (Eduardo)
> V12->V13:
>     rebase to current master
>     thanks for Luiz's review (Luiz)
>     doc hmp command set-mem-policy (Luiz)
>     rename: NUMAInfo -> NUMANode (Luiz)
> V13->V14:
>     remove "set-mem-policy" qmp and hmp commands (Marcelo, Paolo)
> V14->V15:
>     rebase to the current master
> V15->V16:
>     rebase to current master
>     add more test log
> V16->V17:
>     use MemoryRegion to set policy instead of using "pc.ram" (Paolo)
> 
> Wanlong Gao (11):
>   NUMA: move numa related code to new file numa.c
>   NUMA: check if the total numa memory size is equal to ram_size
>   NUMA: Add numa_info structure to contain numa nodes info
>   NUMA: convert -numa option to use OptsVisitor
>   NUMA: introduce NumaMemOptions
>   NUMA: add "-numa mem," options
>   NUMA: expand MAX_NODES from 64 to 128
>   NUMA: parse guest numa nodes memory policy
>   NUMA: set guest numa nodes memory policy
>   NUMA: add qmp command query-numa
>   NUMA: convert hmp command info_numa to use qmp command query_numa
> 
>  Makefile.target         |   2 +-
>  cpus.c                  |  14 --
>  hmp.c                   |  57 +++++++
>  hmp.h                   |   1 +
>  hw/i386/pc.c            |  21 ++-
>  include/exec/memory.h   |  15 ++
>  include/sysemu/cpus.h   |   1 -
>  include/sysemu/sysemu.h |  18 ++-
>  monitor.c               |  21 +--
>  numa.c                  | 408 ++++++++++++++++++++++++++++++++++++++++++++++++
>  qapi-schema.json        | 112 +++++++++++++
>  qemu-options.hx         |   6 +-
>  qmp-commands.hx         |  49 ++++++
>  vl.c                    | 160 +++----------------
>  14 files changed, 698 insertions(+), 187 deletions(-)
>  create mode 100644 numa.c
> 

I think patches 1-4 and 7 are fine.  For the rest, I'd rather wait for
Igor's patches and try to integrate with Igor's memory hotplug patches.

Paolo
