qemu-devel.nongnu.org archive mirror
From: Wanlong Gao <gaowanlong@cn.fujitsu.com>
To: qemu-devel@nongnu.org
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Eduardo Habkost <ehabkost@redhat.com>,
	Wanlong Gao <gaowanlong@cn.fujitsu.com>
Subject: Re: [Qemu-devel] [PATCH 5/5] memory: able to pin guest node memory to host node manually
Date: Tue, 28 May 2013 10:27:06 +0800
Message-ID: <51A415FA.7050805@cn.fujitsu.com>
In-Reply-To: <1369298842-6295-5-git-send-email-gaowanlong@cn.fujitsu.com>

Any comments?


> Use mbind to pin guest NUMA node memory to host nodes manually.
> 
> If guest memory cannot be pinned to a host node, cross-node memory
> accesses can cause a performance regression.
> 
> With this patch, a guest node can be pinned to a host node manually like this:
> -m 1024 -numa node,cpus=0,nodeid=0,mem=512,pin=0 -numa node,nodeid=1,cpus=1,mem=512,pin=1
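
For readers unfamiliar with mbind(2), here is a minimal sketch of what binding
one region to one host node looks like. It is not the qemu_mbind() wrapper from
patch 4/5: bind_region_to_node() and SKETCH_MPOL_BIND are hypothetical names,
the sketch assumes a Linux host, and it calls the raw syscall so that it builds
without libnuma.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Mirrors MPOL_BIND from <linux/mempolicy.h>; defined locally so the
 * sketch does not depend on libnuma's <numaif.h>. */
#define SKETCH_MPOL_BIND 2

/*
 * Hypothetical helper (not part of the patch): restrict the pages in
 * [addr, addr + len) to a single host NUMA node. addr must be page
 * aligned. Returns 0 on success or a negative errno value.
 */
static int bind_region_to_node(void *addr, size_t len, int host_node)
{
    unsigned long mask = 0;
    const int mask_bits = (int)(8 * sizeof(mask));

    /* A single unsigned long can only name nodes 0..mask_bits-1. */
    if (host_node < 0 || host_node >= mask_bits) {
        return -EINVAL;
    }
    mask |= 1UL << host_node;

    /* Linux decrements maxnode before reading the mask, so pass the
     * mask width plus one. */
    if (syscall(SYS_mbind, (unsigned long)addr, (unsigned long)len,
                (unsigned long)SKETCH_MPOL_BIND, (unsigned long)&mask,
                (unsigned long)mask_bits + 1, 0UL) != 0) {
        return -errno;
    }
    return 0;
}
```

A caller would pass a page-aligned region, for example one obtained from
mmap(), and treat a negative return value as "the kernel refused the policy".
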
> 
> Also, if PCI passthrough is used, the directly attached device performs DMA
> transfers between the device and the qemu process, so all guest pages are
> pinned by get_user_pages():
> 
> KVM_ASSIGN_PCI_DEVICE ioctl
>   kvm_vm_ioctl_assign_device()
>     =>kvm_assign_device()
>       => kvm_iommu_map_memslots()
>         => kvm_iommu_map_pages()
>            => kvm_pin_pages()
> 
> So, with a directly attached device, the page count of every guest page is
> raised by one and page migration does not work. AutoNUMA does not work either,
> and the placement requested by libvirt is *ignored*.
> 
> For these reasons, we need to pin memory to host nodes manually to avoid
> this cross-node memory access performance regression.
> 
> Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> ---
>  exec.c                  | 21 +++++++++++++++++++++
>  include/sysemu/sysemu.h |  1 +
>  vl.c                    | 13 +++++++++++++
>  3 files changed, 35 insertions(+)
> 
> diff --git a/exec.c b/exec.c
> index aec65c5..fe929ef 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -36,6 +36,8 @@
>  #include "qemu/config-file.h"
>  #include "exec/memory.h"
>  #include "sysemu/dma.h"
> +#include "sysemu/sysemu.h"
> +#include "qemu/bitops.h"
>  #include "exec/address-spaces.h"
>  #if defined(CONFIG_USER_ONLY)
>  #include <qemu.h>
> @@ -1081,6 +1083,25 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
>              memory_try_enable_merging(new_block->host, size);
>          }
>      }
> +
> +    if (nb_numa_nodes > 0 && !strcmp(mr->name, "pc.ram")) {
> +        int i;
> +        uint64_t nodes_mem = 0;
> +        unsigned long mask;
> +        for (i = 0; i < nb_numa_nodes; i++) {
> +            mask = 0;
> +            if (node_pin[i] != -1) {
> +                set_bit(node_pin[i], &mask);
> +                if (qemu_mbind(new_block->host + nodes_mem, node_mem[i],
> +                               QEMU_MPOL_BIND, &mask, MAX_NODES, 0)) {
> +                    perror("qemu_mbind");
> +                    exit(1);
> +                }
> +            }
> +            nodes_mem += node_mem[i];
> +        }
> +    }
> +
>      new_block->length = size;
>  
>      /* Keep the list sorted from biggest to smallest block.  */
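
The loop in the exec.c hunk relies on guest nodes being laid out back to back
inside the "pc.ram" block, so each node's start is the sum of the sizes of the
nodes before it. A small sketch of that bookkeeping, separated from the mbind
call, may make it easier to review. struct pin_range and plan_pin_ranges() are
hypothetical names, and sizes are assumed to be in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type for this sketch: one host-node binding request. */
struct pin_range {
    uint64_t offset;   /* byte offset into the contiguous guest RAM block */
    uint64_t length;   /* size of the guest node in bytes */
    int host_node;     /* host node the range should be bound to */
};

/*
 * Walk the guest nodes in order, accumulating each node's size to find
 * where its memory starts, and record a range for every node whose pin
 * entry is not -1. Returns the number of ranges written to out, which
 * must have room for nb_nodes entries.
 */
static size_t plan_pin_ranges(const uint64_t *node_mem, const int *node_pin,
                              size_t nb_nodes, struct pin_range *out)
{
    uint64_t offset = 0;
    size_t n = 0;

    assert(node_mem && node_pin && out);
    for (size_t i = 0; i < nb_nodes; i++) {
        if (node_pin[i] != -1) {
            out[n].offset = offset;
            out[n].length = node_mem[i];
            out[n].host_node = node_pin[i];
            n++;
        }
        /* Unpinned nodes still occupy space, so always advance. */
        offset += node_mem[i];
    }
    return n;
}
```

With three guest nodes of 512, 256 and 256 MiB where only the first and last
are pinned, the second range starts at 768 MiB, not 512 MiB, because the
unpinned middle node still takes up its share of the block.
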
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index 2fb71af..ebf6580 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -131,6 +131,7 @@ extern QEMUClock *rtc_clock;
>  #define MAX_CPUMASK_BITS 255
>  extern int nb_numa_nodes;
>  extern uint64_t node_mem[MAX_NODES];
> +extern int node_pin[MAX_NODES];
>  extern unsigned long *node_cpumask[MAX_NODES];
>  
>  #define MAX_OPTION_ROMS 16
> diff --git a/vl.c b/vl.c
> index 5555b1d..3768002 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -253,6 +253,7 @@ static QTAILQ_HEAD(, FWBootEntry) fw_boot_order =
>  
>  int nb_numa_nodes;
>  uint64_t node_mem[MAX_NODES];
> +int node_pin[MAX_NODES];
>  unsigned long *node_cpumask[MAX_NODES];
>  
>  uint8_t qemu_uuid[16];
> @@ -1390,6 +1391,17 @@ static void numa_add(const char *optarg)
>              }
>              node_mem[nodenr] = sval;
>          }
> +
> +        if (get_param_value(option, 128, "pin", optarg) != 0) {
> +            unsigned long long pin_node;
> +            if (parse_uint_full(option, &pin_node, 10) < 0) {
> +                fprintf(stderr, "qemu: Invalid pinning nodeid: %s\n", optarg);
> +                exit(1);
> +            } else {
> +                node_pin[nodenr] = pin_node;
> +            }
> +        }
> +
>          if (get_param_value(option, 128, "cpus", optarg) != 0) {
>              numa_node_parse_cpus(nodenr, option);
>          }
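
The hunk above stores the parsed pin value without comparing it against
MAX_NODES, and exec.c later uses that value as a bit index into the node mask.
A range check at parse time might look roughly like the sketch below. It uses
plain strtoull() instead of QEMU's parse_uint_full() so that it stands alone,
and parse_pin_node() is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a decimal host node id from text and reject
 * anything that is not a plain number in [0, max_nodes).
 * Returns 0 and stores the id in *out on success, -1 otherwise.
 */
static int parse_pin_node(const char *text, int max_nodes, int *out)
{
    char *end = NULL;
    unsigned long long value;

    assert(out != NULL);
    /* strtoull() accepts signs and leading blanks; require a digit first. */
    if (text == NULL || !isdigit((unsigned char)text[0])) {
        return -1;
    }
    errno = 0;
    value = strtoull(text, &end, 10);
    if (errno != 0 || *end != '\0') {
        return -1;            /* overflow or trailing garbage */
    }
    if (max_nodes <= 0 || value >= (unsigned long long)max_nodes) {
        return -1;            /* would index past the node mask */
    }
    *out = (int)value;
    return 0;
}
```

On failure the output is left untouched, so a caller can keep the -1
"not pinned" default and report the error.
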
> @@ -2921,6 +2933,7 @@ int main(int argc, char **argv, char **envp)
>  
>      for (i = 0; i < MAX_NODES; i++) {
>          node_mem[i] = 0;
> +        node_pin[i] = -1;
>          node_cpumask[i] = bitmap_new(MAX_CPUMASK_BITS);
>      }
>  
> 
