Re: [Qemu-devel] [RFC PATCH 14/14] numa: add -numa node, memdev= option


From: Igor Mammedov <imammedo@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: gaowanlong@cn.fujitsu.com, qemu-devel@nongnu.org, stefanha@redhat.com
Subject: Re: [Qemu-devel] [RFC PATCH 14/14] numa: add -numa node, memdev= option
Date: Wed, 11 Dec 2013 16:36:26 +0100
Message-ID: <20131211163626.3540983d@thinkpad>
In-Reply-To: <1386764361-15260-15-git-send-email-pbonzini@redhat.com>

On Wed, 11 Dec 2013 13:19:21 +0100
Paolo Bonzini <pbonzini@redhat.com> wrote:

> This option provides the infrastructure for binding guest NUMA nodes
> to host NUMA nodes.  For example:
> 
>  -object memory-ram,size=1024M,policy=membind,host-nodes=0,id=ram-node0 \
>  -numa node,nodeid=0,cpus=0,memdev=ram-node0 \
>  -object memory-ram,size=1024M,policy=interleave,host-nodes=1-3,id=ram-node1 \
>  -numa node,nodeid=1,cpus=1,memdev=ram-node1

I was thinking about a somewhat more radical change:
 -object memory-ram,size=1024M,policy=membind,host-nodes=0,id=ram-node0 \
 -device dimm,memdev=ram-node0,node=0 \
 -object memory-ram,size=1024M,policy=membind,host-nodes=1,id=ram-node1 \
 -device dimm,memdev=ram-node1,node=1

That would let us avoid the synthetic -numa option, but it would require
converting the initial RAM to DIMMs. It would also be more flexible, for
example allowing several backends to be bound to one node (such as one
backed by 1GB huge pages plus one backed by 2MB huge pages).

> 
> The option replaces "-numa mem".
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  include/sysemu/sysemu.h |  2 ++
>  numa.c                  | 64 +++++++++++++++++++++++++++++++++++++++++++++++--
>  qapi-schema.json        |  6 ++++-
>  3 files changed, 69 insertions(+), 3 deletions(-)
> 
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index e9da760..acfc0c7 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -12,6 +12,7 @@
>  #include "qemu/bitmap.h"
>  #include "qom/object.h"
>  #include "hw/boards.h"
> +#include "sysemu/hostmem.h"
>  
>  /* vl.c */
>  
> @@ -140,6 +141,7 @@ extern int nb_numa_nodes;
>  typedef struct node_info {
>      uint64_t node_mem;
>      DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS);
> +    HostMemoryBackend *node_memdev;
>  } NodeInfo;
>  extern NodeInfo numa_info[MAX_NODES];
>  void set_numa_nodes(void);
> diff --git a/numa.c b/numa.c
> index f903b9e..686dbfa 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -27,6 +27,8 @@
>  #include "qapi-visit.h"
>  #include "qapi/opts-visitor.h"
>  #include "qapi/dealloc-visitor.h"
> +#include "qapi/qmp/qerror.h"
> +
>  QemuOptsList qemu_numa_opts = {
>      .name = "numa",
>      .implied_opt_name = "type",
> @@ -34,10 +36,13 @@ QemuOptsList qemu_numa_opts = {
>      .desc = { { 0 } } /* validated with OptsVisitor */
>  };
>  
> +static int have_memdevs = -1;
> +
>  static int numa_node_parse(NumaNodeOptions *opts)
>  {
>      uint16_t nodenr;
>      uint16List *cpus = NULL;
> +    Error *local_err = NULL;
>  
>      if (opts->has_nodeid) {
>          nodenr = opts->nodeid;
> @@ -60,6 +65,19 @@ static int numa_node_parse(NumaNodeOptions *opts)
>          bitmap_set(numa_info[nodenr].node_cpu, cpus->value, 1);
>      }
>  
> +    if (opts->has_mem && opts->has_memdev) {
> +        fprintf(stderr, "qemu: cannot specify both mem= and memdev=\n");
> +        return -1;
> +    }
> +
> +    if (have_memdevs == -1) {
> +        have_memdevs = opts->has_memdev;
> +    }
> +    if (opts->has_memdev != have_memdevs) {
> +        fprintf(stderr, "qemu: memdev option must be specified for either "
> +                "all or no nodes\n");
> +    }
> +
>      if (opts->has_mem) {
>          int64_t mem_size;
>          char *endptr;
> @@ -70,7 +88,19 @@ static int numa_node_parse(NumaNodeOptions *opts)
>          }
>          numa_info[nodenr].node_mem = mem_size;
>      }
> +    if (opts->has_memdev) {
> +        Object *o;
> +        o = object_resolve_path_type(opts->memdev, TYPE_MEMORY_BACKEND, NULL);
> +        if (!o) {
> +            error_setg(&local_err, "memdev=%s is ambiguous", opts->memdev);
> +            qerror_report_err(local_err);
> +            return -1;
> +        }
>  
> +        object_ref(o);
> +        numa_info[nodenr].node_mem = object_property_get_int(o, "size", NULL);
> +        numa_info[nodenr].node_memdev = MEMORY_BACKEND(o);
> +    }
If we used the node property of the dimm device, there would be no need to
keep the NUMA mappings in a separate structure; acpi_build could then build
the SRAT directly by enumerating the present dimm devices.
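A rough sketch of what "build the SRAT by enumerating DIMMs" could look like, assuming each present DIMM knows its guest-physical base, size and node. The flag bits follow the ACPI SRAT Memory Affinity structure (bit 0 Enabled, bit 1 Hot Pluggable); everything else (`struct dimm_model`, `struct srat_mem_entry`, `build_srat_mem_entries()`) is hypothetical and deliberately unpacked, whereas a real SRAT entry is a 40-byte packed structure with address and length split into 32-bit halves.

```c
#include <stddef.h>
#include <stdint.h>

/* ACPI SRAT Memory Affinity flags: bit 0 Enabled, bit 1 Hot Pluggable. */
#define SRAT_MEM_ENABLED      (1u << 0)
#define SRAT_MEM_HOTPLUGGABLE (1u << 1)

/* Hypothetical view of a present DIMM device. */
struct dimm_model {
    int node;         /* value of the dimm's "node" property */
    uint64_t addr;    /* guest-physical base address */
    uint64_t size;    /* size in bytes */
    int hotpluggable; /* nonzero if plugged at runtime */
};

/* Simplified, unpacked memory affinity entry. */
struct srat_mem_entry {
    uint32_t proximity;
    uint64_t base;
    uint64_t length;
    uint32_t flags;
};

/* Emit one memory affinity entry per present DIMM, taking the proximity
 * domain straight from the DIMM's node property, so no separate
 * node -> memory table is consulted.  Returns the number of entries. */
static size_t build_srat_mem_entries(const struct dimm_model *dimms, size_t n,
                                     struct srat_mem_entry *out)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (dimms[i].size == 0) {
            continue; /* nothing to describe */
        }
        out[count].proximity = (uint32_t)dimms[i].node;
        out[count].base = dimms[i].addr;
        out[count].length = dimms[i].size;
        out[count].flags = SRAT_MEM_ENABLED |
                           (dimms[i].hotpluggable ? SRAT_MEM_HOTPLUGGABLE : 0u);
        count++;
    }
    return count;
}
```

The caller must size `out` for at least `n` entries; the loop itself is the only source of node information, which is the simplification being argued for.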

>      return 0;
>  }
>  
> @@ -188,12 +218,42 @@ void set_numa_modes(void)
>      }
>  }
>  
> +static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner,
> +                                           const char *name,
> +                                           QEMUMachineInitArgs *args)
> +{
> +    uint64_t ram_size = args->ram_size;
> +
> +    memory_region_init_ram(mr, owner, name, ram_size);
> +    vmstate_register_ram_global(mr);
> +}
> +
>  void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
>                                            const char *name,
>                                            QEMUMachineInitArgs *args)
>  {
>      uint64_t ram_size = args->ram_size;
> +    uint64_t addr = 0;
> +    int i;
>  
> -    memory_region_init_ram(mr, owner, name, ram_size);
> -    vmstate_register_ram_global(mr);
> +    if (nb_numa_nodes == 0 || !have_memdevs) {
> +        allocate_system_memory_nonnuma(mr, owner, name, args);
> +        return;
> +    }
> +
> +    memory_region_init(mr, owner, name, ram_size);
The container might clip subregions because of the hole; its size should
be ram_size + hole_size.
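The arithmetic behind the ram_size + hole_size remark, as a small sketch. It assumes a single hole `[hole_start, hole_start + hole_size)` in guest-physical space with the RAM above `hole_start` relocated past it, as PC machines do with RAM that does not fit below 4G; `hole_start`, `hole_size` and all helper names are hypothetical, not QEMU APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Map an offset into guest RAM to a guest-physical address, assuming RAM
 * below hole_start is mapped 1:1 and the remainder sits above the hole. */
static uint64_t ram_offset_to_gpa(uint64_t off, uint64_t hole_start,
                                  uint64_t hole_size)
{
    return off < hole_start ? off : off + hole_size;
}

/* Size a container needs so that the last byte of RAM still lies inside
 * it: ram_size + hole_size once RAM extends past hole_start, otherwise
 * just ram_size. */
static uint64_t container_size(uint64_t ram_size, uint64_t hole_start,
                               uint64_t hole_size)
{
    if (ram_size == 0) {
        return 0;
    }
    return ram_offset_to_gpa(ram_size - 1, hole_start, hole_size) + 1;
}

/* A node segment at RAM offset base straddles the hole if it starts below
 * hole_start and ends above it; mapped at guest-physical addresses, such
 * a segment would have to be split in two. */
static bool segment_crosses_hole(uint64_t base, uint64_t size,
                                 uint64_t hole_start)
{
    return base < hole_start && base + size > hole_start;
}
```

With 4 GiB of RAM and a 1 GiB hole starting at 3 GiB, `container_size()` gives 5 GiB, so a container created with only `ram_size` (4 GiB) would cut off the top 1 GiB.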

> +    for (i = 0; i < nb_numa_nodes; i++) {
> +        Error *local_err = NULL;
> +        uint64_t size = numa_info[i].node_mem;
> +        HostMemoryBackend *backend = numa_info[i].node_memdev;
> +        MemoryRegion *seg = host_memory_backend_get_memory(backend, &local_err);
> +        if (local_err) {
> +            qerror_report_err(local_err);
> +            exit(1);
> +        }
> +
> +        memory_region_add_subregion(mr, addr, seg);
> +        vmstate_register_ram_global(seg);
> +        addr += size;
> +    }
>  }
> diff --git a/qapi-schema.json b/qapi-schema.json
> index d99e39d..e449316 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -4256,7 +4256,10 @@
>  #
>  # @cpus: #optional VCPUs belong to this node
>  #
> -# @mem: #optional memory size of this node
> +# @memdev: #optional memory backend object.  If specified for one node,
> +#          it must be specified for all nodes.
> +#
> +# @mem: #optional memory size of this node; mutually exclusive with @memdev.
>  #
>  # Since: 2.0
>  ##
> @@ -4264,4 +4267,5 @@
>    'data': {
>     '*nodeid': 'uint16',
>     '*cpus':   ['uint16'],
> +   '*memdev': 'str',
>     '*mem':    'str' }}
> -- 
> 1.8.4.2
> 


-- 
Regards,
  Igor

Thread overview: 23+ messages
2013-12-11 12:19 [Qemu-devel] [RFC PATCH 00/14] Common base for memory hotplug and NUMA policy work Paolo Bonzini
2013-12-11 12:19 ` [Qemu-devel] [RFC PATCH 01/14] NUMA: move numa related code to new file numa.c Paolo Bonzini
2013-12-11 12:19 ` [Qemu-devel] [RFC PATCH 02/14] NUMA: check if the total numa memory size is equal to ram_size Paolo Bonzini
2013-12-11 18:48   ` Eric Blake
2013-12-11 12:19 ` [Qemu-devel] [RFC PATCH 03/14] NUMA: Add numa_info structure to contain numa nodes info Paolo Bonzini
2013-12-11 12:19 ` [Qemu-devel] [RFC PATCH 04/14] NUMA: convert -numa option to use OptsVisitor Paolo Bonzini
2013-12-11 18:51   ` Eric Blake
2013-12-11 21:35     ` Paolo Bonzini
2013-12-11 12:19 ` [Qemu-devel] [RFC PATCH 05/14] NUMA: expand MAX_NODES from 64 to 128 Paolo Bonzini
2013-12-11 12:19 ` [Qemu-devel] [RFC PATCH 06/14] qapi: add SIZE type parser to string_input_visitor Paolo Bonzini
2013-12-11 12:19 ` [Qemu-devel] [RFC PATCH 07/14] QemuOpts: introduce qemu_find_opts_singleton Paolo Bonzini
2013-12-11 12:19 ` [Qemu-devel] [RFC PATCH 08/14] vl: convert -m to QemuOpts Paolo Bonzini
2013-12-11 12:19 ` [Qemu-devel] [RFC PATCH 09/14] qom: fix leak for objects created with -object Paolo Bonzini
2013-12-12  8:18   ` Stefan Hajnoczi
2013-12-11 12:19 ` [Qemu-devel] [RFC PATCH 10/14] qom: catch errors in object_property_add_child Paolo Bonzini
2013-12-11 12:44   ` Igor Mammedov
2013-12-12  8:18   ` Stefan Hajnoczi
2013-12-11 12:19 ` [Qemu-devel] [RFC PATCH 11/14] pc: pass QEMUMachineInitArgs to pc_memory_init Paolo Bonzini
2013-12-11 12:19 ` [Qemu-devel] [RFC PATCH 12/14] numa: introduce memory_region_allocate_system_memory Paolo Bonzini
2013-12-11 12:19 ` [Qemu-devel] [RFC PATCH 13/14] add memdev backend infrastructure Paolo Bonzini
2013-12-11 12:19 ` [Qemu-devel] [RFC PATCH 14/14] numa: add -numa node, memdev= option Paolo Bonzini
2013-12-11 15:36   ` Igor Mammedov [this message]
2013-12-11 15:50     ` Paolo Bonzini
