Re: [Qemu-devel] [RFC PATCH 14/14] numa: add -numa node, memdev= option

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Igor Mammedov <imammedo@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: gaowanlong@cn.fujitsu.com, qemu-devel@nongnu.org, stefanha@redhat.com
Subject: Re: [Qemu-devel] [RFC PATCH 14/14] numa: add -numa node, memdev= option
Date: Wed, 11 Dec 2013 16:36:26 +0100	[thread overview]
Message-ID: <20131211163626.3540983d@thinkpad> (raw)
In-Reply-To: <1386764361-15260-15-git-send-email-pbonzini@redhat.com>

On Wed, 11 Dec 2013 13:19:21 +0100
Paolo Bonzini <pbonzini@redhat.com> wrote:

> This option provides the infrastructure for binding guest NUMA nodes
> to host NUMA nodes.  For example:
> 
>  -object memory-ram,size=1024M,policy=membind,host-nodes=0,id=ram-node0 \
>  -numa node,nodeid=0,cpus=0,memdev=ram-node0 \
>  -object memory-ram,size=1024M,policy=interleave,host-nodes=1-3,id=ram-node1 \
>  -numa node,nodeid=1,cpus=1,memdev=ram-node1
I was thinking about a bit more radical change:
-object memory-ram,size=1024M,policy=membind,host-nodes=0,id=ram-node0
-device dimm,memdev=ram-node0,node=0
-object memory-ram,size=1024M,policy=membind,host-nodes=1,id=ram-node1
-device dimm,memdev=ram-node1,node=1

that would allow to avoid synthetic -numa option but would require conversion
of initial RAM to dimms. That would be more flexible for example allowing
bind several backends to one node (like: 1Gb_hugepage + 2Mb_hugepage ones)

> 
> The option replaces "-numa mem".
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  include/sysemu/sysemu.h |  2 ++
>  numa.c                  | 64 +++++++++++++++++++++++++++++++++++++++++++++++--
>  qapi-schema.json        |  6 ++++-
>  3 files changed, 69 insertions(+), 3 deletions(-)
> 
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index e9da760..acfc0c7 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -12,6 +12,7 @@
>  #include "qemu/bitmap.h"
>  #include "qom/object.h"
>  #include "hw/boards.h"
> +#include "sysemu/hostmem.h"
>  
>  /* vl.c */
>  
> @@ -140,6 +141,7 @@ extern int nb_numa_nodes;
>  typedef struct node_info {
>      uint64_t node_mem;
>      DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS);
> +    HostMemoryBackend *node_memdev;
>  } NodeInfo;
>  extern NodeInfo numa_info[MAX_NODES];
>  void set_numa_nodes(void);
> diff --git a/numa.c b/numa.c
> index f903b9e..686dbfa 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -27,6 +27,8 @@
>  #include "qapi-visit.h"
>  #include "qapi/opts-visitor.h"
>  #include "qapi/dealloc-visitor.h"
> +#include "qapi/qmp/qerror.h"
> +
>  QemuOptsList qemu_numa_opts = {
>      .name = "numa",
>      .implied_opt_name = "type",
> @@ -34,10 +36,13 @@ QemuOptsList qemu_numa_opts = {
>      .desc = { { 0 } } /* validated with OptsVisitor */
>  };
>  
> +static int have_memdevs = -1;
> +
>  static int numa_node_parse(NumaNodeOptions *opts)
>  {
>      uint16_t nodenr;
>      uint16List *cpus = NULL;
> +    Error *local_err = NULL;
>  
>      if (opts->has_nodeid) {
>          nodenr = opts->nodeid;
> @@ -60,6 +65,19 @@ static int numa_node_parse(NumaNodeOptions *opts)
>          bitmap_set(numa_info[nodenr].node_cpu, cpus->value, 1);
>      }
>  
> +    if (opts->has_mem && opts->has_memdev) {
> +        fprintf(stderr, "qemu: cannot specify both mem= and memdev=\n");
> +        return -1;
> +    }
> +
> +    if (have_memdevs == -1) {
> +        have_memdevs = opts->has_memdev;
> +    }
> +    if (opts->has_memdev != have_memdevs) {
> +        fprintf(stderr, "qemu: memdev option must be specified for either "
> +                "all or no nodes\n");
> +    }
> +
>      if (opts->has_mem) {
>          int64_t mem_size;
>          char *endptr;
> @@ -70,7 +88,19 @@ static int numa_node_parse(NumaNodeOptions *opts)
>          }
>          numa_info[nodenr].node_mem = mem_size;
>      }
> +    if (opts->has_memdev) {
> +        Object *o;
> +        o = object_resolve_path_type(opts->memdev, TYPE_MEMORY_BACKEND, NULL);
> +        if (!o) {
> +            error_setg(&local_err, "memdev=%s is ambiguous", opts->memdev);
> +            qerror_report_err(local_err);
> +            return -1;
> +        }
>  
> +        object_ref(o);
> +        numa_info[nodenr].node_mem = object_property_get_int(o, "size", NULL);
> +        numa_info[nodenr].node_memdev = MEMORY_BACKEND(o);
> +    }
there wouldn't be any need for keeping numa mappings in separate structure,
if we use node property of dimm device.
acpi_build then could build SRAT directly enumerating present dimm devices. 

>      return 0;
>  }
>  
> @@ -188,12 +218,42 @@ void set_numa_modes(void)
>      }
>  }
>  
> +static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner,
> +                                           const char *name,
> +                                           QEMUMachineInitArgs *args)
> +{
> +    uint64_t ram_size = args->ram_size;
> +
> +    memory_region_init_ram(mr, owner, name, ram_size);
> +    vmstate_register_ram_global(mr);
> +}
> +
>  void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
>                                            const char *name,
>                                            QEMUMachineInitArgs *args)
>  {
>      uint64_t ram_size = args->ram_size;
> +    uint64_t addr = 0;
> +    int i;
>  
> -    memory_region_init_ram(mr, owner, name, ram_size);
> -    vmstate_register_ram_global(mr);
> +    if (nb_numa_nodes == 0 || !have_memdevs) {
> +        allocate_system_memory_nonnuma(mr, owner, name, args);
> +        return;
> +    }
> +
> +    memory_region_init(mr, owner, name, ram_size);
container might clip subregions due to presence of hole,
it's size should be ram_size + hole_size. 

> +    for (i = 0; i < nb_numa_nodes; i++) {
> +        Error *local_err = NULL;
> +        uint64_t size = numa_info[i].node_mem;
> +        HostMemoryBackend *backend = numa_info[i].node_memdev;
> +        MemoryRegion *seg = host_memory_backend_get_memory(backend, &local_err);
> +        if (local_err) {
> +            qerror_report_err(local_err);
> +            exit(1);
> +        }
> +
> +        memory_region_add_subregion(mr, addr, seg);
> +        vmstate_register_ram_global(seg);
> +        addr += size;
> +    }
>  }
> diff --git a/qapi-schema.json b/qapi-schema.json
> index d99e39d..e449316 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -4256,7 +4256,10 @@
>  #
>  # @cpus: #optional VCPUs belong to this node
>  #
> -# @mem: #optional memory size of this node
> +# @memdev: #optional memory backend object.  If specified for one node,
> +#          it must be specified for all nodes.
> +#
> +# @mem: #optional memory size of this node; mutually exclusive with @memdev.
>  #
>  # Since: 2.0
>  ##
> @@ -4264,4 +4267,5 @@
>    'data': {
>     '*nodeid': 'uint16',
>     '*cpus':   ['uint16'],
> +   '*memdev': 'str',
>     '*mem':    'str' }}
> -- 
> 1.8.4.2
> 


-- 
Regards,
  Igor

next prev parent reply	other threads:[~2013-12-11 15:36 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-11 12:19 [Qemu-devel] [RFC PATCH 00/14] Common base for memory hotplug and NUMA policy work Paolo Bonzini
2013-12-11 12:19 ` [Qemu-devel] [RFC PATCH 01/14] NUMA: move numa related code to new file numa.c Paolo Bonzini
2013-12-11 12:19 ` [Qemu-devel] [RFC PATCH 02/14] NUMA: check if the total numa memory size is equal to ram_size Paolo Bonzini
2013-12-11 18:48   ` Eric Blake
2013-12-11 12:19 ` [Qemu-devel] [RFC PATCH 03/14] NUMA: Add numa_info structure to contain numa nodes info Paolo Bonzini
2013-12-11 12:19 ` [Qemu-devel] [RFC PATCH 04/14] NUMA: convert -numa option to use OptsVisitor Paolo Bonzini
2013-12-11 18:51   ` Eric Blake
2013-12-11 21:35     ` Paolo Bonzini
2013-12-11 12:19 ` [Qemu-devel] [RFC PATCH 05/14] NUMA: expand MAX_NODES from 64 to 128 Paolo Bonzini
2013-12-11 12:19 ` [Qemu-devel] [RFC PATCH 06/14] qapi: add SIZE type parser to string_input_visitor Paolo Bonzini
2013-12-11 12:19 ` [Qemu-devel] [RFC PATCH 07/14] QemuOpts: introduce qemu_find_opts_singleton Paolo Bonzini
2013-12-11 12:19 ` [Qemu-devel] [RFC PATCH 08/14] vl: convert -m to QemuOpts Paolo Bonzini
2013-12-11 12:19 ` [Qemu-devel] [RFC PATCH 09/14] qom: fix leak for objects created with -object Paolo Bonzini
2013-12-12  8:18   ` Stefan Hajnoczi
2013-12-11 12:19 ` [Qemu-devel] [RFC PATCH 10/14] qom: catch errors in object_property_add_child Paolo Bonzini
2013-12-11 12:44   ` Igor Mammedov
2013-12-12  8:18   ` Stefan Hajnoczi
2013-12-11 12:19 ` [Qemu-devel] [RFC PATCH 11/14] pc: pass QEMUMachineInitArgs to pc_memory_init Paolo Bonzini
2013-12-11 12:19 ` [Qemu-devel] [RFC PATCH 12/14] numa: introduce memory_region_allocate_system_memory Paolo Bonzini
2013-12-11 12:19 ` [Qemu-devel] [RFC PATCH 13/14] add memdev backend infrastructure Paolo Bonzini
2013-12-11 12:19 ` [Qemu-devel] [RFC PATCH 14/14] numa: add -numa node, memdev= option Paolo Bonzini
2013-12-11 15:36   ` Igor Mammedov [this message]
2013-12-11 15:50     ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131211163626.3540983d@thinkpad \
    --to=imammedo@redhat.com \
    --cc=gaowanlong@cn.fujitsu.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).