qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Igor Mammedov <imammedo@redhat.com>
Cc: qemu-devel@nongnu.org, Eduardo Habkost <ehabkost@redhat.com>,
	Peter Maydell <peter.maydell@linaro.org>,
	Andrew Jones <drjones@redhat.com>, Eric Blake <eblake@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Shannon Zhao <zhaoshenglong@huawei.com>,
	qemu-arm@nongnu.org, qemu-ppc@nongnu.org
Subject: Re: [Qemu-devel] [PATCH for-2.10 05/23] numa: move source of default CPUs to NUMA node mapping into boards
Date: Tue, 28 Mar 2017 15:19:20 +1100	[thread overview]
Message-ID: <20170328041920.GC21068@umbus.fritz.box> (raw)
In-Reply-To: <1490189568-167621-6-git-send-email-imammedo@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 13096 bytes --]

On Wed, Mar 22, 2017 at 02:32:30PM +0100, Igor Mammedov wrote:
> Originally CPU threads were by default assigned in
> round-robin fashion. However it was causing issues in
> guest since CPU threads from the same socket/core could
> be placed on different NUMA nodes.
> Commit fb43b73b (pc: fix default VCPU to NUMA node mapping)
> fixed it by grouping threads within a socket on the same node
> introducing cpu_index_to_socket_id() callback and commit
> 20bb648d (spapr: Fix default NUMA node allocation for threads)
> reused callback to fix similar issues for SPAPR machine
> even though socket doesn't make much sense there.
> 
> As result QEMU ended up having 3 default distribution rules
> used by 3 targets /virt-arm, spapr, pc/.
> 
> In effort of moving NUMA mapping for CPUs into possible_cpus,
> generalize default mapping in numa.c by making boards decide
> on default mapping and let them explicitly tell generic
> numa code to which node a CPU thread belongs to by replacing
> cpu_index_to_socket_id() with @cpu_index_to_instance_props()
> which provides default node_id assigned by board to specified
> cpu_index.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> Patch only moves source of default mapping to possible_cpus[]
> and leaves the rest of NUMA handling to numa_info[node_id].node_cpu
> bitmaps. It's up to follow up patches to replace bitmaps
> with possible_cpus[] internally.
> ---
>  include/hw/boards.h   |  8 ++++++--
>  include/sysemu/numa.h |  2 +-
>  hw/arm/virt.c         | 19 +++++++++++++++++--
>  hw/i386/pc.c          | 22 ++++++++++++++++------
>  hw/ppc/spapr.c        | 27 ++++++++++++++++++++-------
>  numa.c                | 15 +++++++++------
>  vl.c                  |  2 +-
>  7 files changed, 70 insertions(+), 25 deletions(-)
> 
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index 269d0ba..1dd0fde 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -74,7 +74,10 @@ typedef struct {
>   *    of HotplugHandler object, which handles hotplug operation
>   *    for a given @dev. It may return NULL if @dev doesn't require
>   *    any actions to be performed by hotplug handler.
> - * @cpu_index_to_socket_id:
> + * @cpu_index_to_instance_props:
> + *    used to provide @cpu_index to socket/core/thread number mapping, allowing
> + *    legacy code to perform maping from cpu_index to topology properties
> + *    Returns: tuple of socket/core/thread ids given cpu_index belongs to.
>   *    used to provide @cpu_index to socket number mapping, allowing
>   *    a machine to group CPU threads belonging to the same socket/package
>   *    Returns: socket number given cpu_index belongs to.
> @@ -138,7 +141,8 @@ struct MachineClass {
>  
>      HotplugHandler *(*get_hotplug_handler)(MachineState *machine,
>                                             DeviceState *dev);
> -    unsigned (*cpu_index_to_socket_id)(unsigned cpu_index);
> +    CpuInstanceProperties (*cpu_index_to_instance_props)(MachineState *machine,
> +                                                         unsigned cpu_index);
>      const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine);
>  };
>  
> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
> index 8f09dcf..46ea6c7 100644
> --- a/include/sysemu/numa.h
> +++ b/include/sysemu/numa.h
> @@ -24,7 +24,7 @@ typedef struct node_info {
>  } NodeInfo;
>  
>  extern NodeInfo numa_info[MAX_NODES];
> -void parse_numa_opts(MachineClass *mc);
> +void parse_numa_opts(MachineState *ms);
>  void numa_post_machine_init(void);
>  void query_numa_node_mem(uint64_t node_mem[]);
>  extern QemuOptsList qemu_numa_opts;
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 0cbcbc1..8748d25 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -1554,6 +1554,16 @@ static void virt_set_gic_version(Object *obj, const char *value, Error **errp)
>      }
>  }
>  
> +static CpuInstanceProperties
> +virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> +{
> +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> +
> +    assert(cpu_index < possible_cpus->len);
> +    return possible_cpus->cpus[cpu_index].props;;
> +}
> +

It seems a bit weird to have a machine specific hook to pull the
property information when one way or another it's coming from the
possible_cpus table, which is already constructed by a machine
specific hook.  Could we add a range or list of cpu_index values to
each possible_cpus entry instead, and have a generic lookup of the
right entry based on that?


>  static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
>  {
>      int n;
> @@ -1573,8 +1583,12 @@ static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
>          ms->possible_cpus->cpus[n].props.has_thread_id = true;
>          ms->possible_cpus->cpus[n].props.thread_id = n;
>  
> -        /* TODO: add 'has_node/node' here to describe
> -           to which node core belongs */
> +        /* default distribution of CPUs over NUMA nodes */
> +        if (nb_numa_nodes) {
> +            /* preset values but do not enable them i.e. 'has_node_id = false',
> +             * board will enable them if manual mapping wasn't present on CLI */

I'm a little confused by this comment, since I don't see any board
code altering has_node_id.

> +            ms->possible_cpus->cpus[n].props.node_id = n % nb_numa_nodes;;
> +        }
>      }
>      return ms->possible_cpus;
>  }
> @@ -1596,6 +1610,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
>      /* We know we will never create a pre-ARMv7 CPU which needs 1K pages */
>      mc->minimum_page_bits = 12;
>      mc->possible_cpu_arch_ids = virt_possible_cpu_arch_ids;
> +    mc->cpu_index_to_instance_props = virt_cpu_index_to_props;
>  }
>  
>  static const TypeInfo virt_machine_info = {
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index d24388e..7031100 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -2245,12 +2245,14 @@ static void pc_machine_reset(void)
>      }
>  }
>  
> -static unsigned pc_cpu_index_to_socket_id(unsigned cpu_index)
> +static CpuInstanceProperties
> +pc_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
>  {
> -    X86CPUTopoInfo topo;
> -    x86_topo_ids_from_idx(smp_cores, smp_threads, cpu_index,
> -                          &topo);
> -    return topo.pkg_id;
> +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> +
> +    assert(cpu_index < possible_cpus->len);
> +    return possible_cpus->cpus[cpu_index].props;;

Since the pc and arm version of this are basically identical, I wonder
if that should actually be the default implementation.  If we need it
at all.

>  }
>  
>  static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms)
> @@ -2282,6 +2284,14 @@ static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms)
>          ms->possible_cpus->cpus[i].props.core_id = topo.core_id;
>          ms->possible_cpus->cpus[i].props.has_thread_id = true;
>          ms->possible_cpus->cpus[i].props.thread_id = topo.smt_id;
> +
> +        /* default distribution of CPUs over NUMA nodes */
> +        if (nb_numa_nodes) {
> +            /* preset values but do not enable them i.e. 'has_node_id = false',
> +             * board will enable them if manual mapping wasn't present on CLI */
> +            ms->possible_cpus->cpus[i].props.node_id =
> +                topo.pkg_id % nb_numa_nodes;
> +        }
>      }
>      return ms->possible_cpus;
>  }
> @@ -2324,7 +2334,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
>      pcmc->acpi_data_size = 0x20000 + 0x8000;
>      pcmc->save_tsc_khz = true;
>      mc->get_hotplug_handler = pc_get_hotpug_handler;
> -    mc->cpu_index_to_socket_id = pc_cpu_index_to_socket_id;
> +    mc->cpu_index_to_instance_props = pc_cpu_index_to_props;
>      mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids;
>      mc->has_hotpluggable_cpus = true;
>      mc->default_boot_order = "cad";
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 6ee566d..9dcbbcc 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2921,11 +2921,18 @@ static HotplugHandler *spapr_get_hotplug_handler(MachineState *machine,
>      return NULL;
>  }
>  
> -static unsigned spapr_cpu_index_to_socket_id(unsigned cpu_index)
> +static CpuInstanceProperties
> +spapr_cpu_index_to_props(MachineState *machine, unsigned cpu_index)
>  {
> -    /* Allocate to NUMA nodes on a "socket" basis (not that concept of
> -     * socket means much for the paravirtualized PAPR platform) */
> -    return cpu_index / smp_threads / smp_cores;
> +    CPUArchId *core_slot;
> +    MachineClass *mc = MACHINE_GET_CLASS(machine);
> +    int core_id = cpu_index / smp_threads * smp_threads;

I don't think you need this.  AIUI the purpose of
spapr_find_cpu_slot() is that it already finds the right CPU slot from
a cpu_index, so you can just pass the cpu_index directly.

> +
> +    /* make sure possible_cpu are intialized */
> +    mc->possible_cpu_arch_ids(machine);
> +    core_slot = spapr_find_cpu_slot(machine, core_id, NULL);
> +    assert(core_slot);
> +    return core_slot->props;
>  }
>  
>  static const CPUArchIdList *spapr_possible_cpu_arch_ids(MachineState *machine)
> @@ -2952,8 +2959,14 @@ static const CPUArchIdList *spapr_possible_cpu_arch_ids(MachineState *machine)
>          machine->possible_cpus->cpus[i].arch_id = core_id;
>          machine->possible_cpus->cpus[i].props.has_core_id = true;
>          machine->possible_cpus->cpus[i].props.core_id = core_id;
> -        /* TODO: add 'has_node/node' here to describe
> -           to which node core belongs */
> +
> +        /* default distribution of CPUs over NUMA nodes */
> +        if (nb_numa_nodes) {
> +            /* preset values but do not enable them i.e. 'has_node_id = false',
> +             * board will enable them if manual mapping wasn't present on CLI */
> +            machine->possible_cpus->cpus[i].props.node_id =
> +                core_id / smp_threads / smp_cores % nb_numa_nodes;
> +        }
>      }
>      return machine->possible_cpus;
>  }
> @@ -3076,7 +3089,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>      hc->pre_plug = spapr_machine_device_pre_plug;
>      hc->plug = spapr_machine_device_plug;
>      hc->unplug = spapr_machine_device_unplug;
> -    mc->cpu_index_to_socket_id = spapr_cpu_index_to_socket_id;
> +    mc->cpu_index_to_instance_props = spapr_cpu_index_to_props;
>      mc->possible_cpu_arch_ids = spapr_possible_cpu_arch_ids;
>      hc->unplug_request = spapr_machine_device_unplug_request;
>  
> diff --git a/numa.c b/numa.c
> index e01cb54..b6e71bc 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -294,9 +294,10 @@ static void validate_numa_cpus(void)
>      g_free(seen_cpus);
>  }
>  
> -void parse_numa_opts(MachineClass *mc)
> +void parse_numa_opts(MachineState *ms)
>  {
>      int i;
> +    MachineClass *mc = MACHINE_GET_CLASS(ms);
>  
>      for (i = 0; i < MAX_NODES; i++) {
>          numa_info[i].node_cpu = bitmap_new(max_cpus);
> @@ -378,14 +379,16 @@ void parse_numa_opts(MachineClass *mc)
>           * rule grouping VCPUs by socket so that VCPUs from the same socket
>           * would be on the same node.
>           */
> +        if (!mc->cpu_index_to_instance_props) {
> +            error_report("default CPUs to NUMA node mapping isn't supported");
> +            exit(1);
> +        }
>          if (i == nb_numa_nodes) {
>              for (i = 0; i < max_cpus; i++) {
> -                unsigned node_id = i % nb_numa_nodes;
> -                if (mc->cpu_index_to_socket_id) {
> -                    node_id = mc->cpu_index_to_socket_id(i) % nb_numa_nodes;
> -                }
> +                CpuInstanceProperties props;
> +                props = mc->cpu_index_to_instance_props(ms, i);
>  
> -                set_bit(i, numa_info[node_id].node_cpu);
> +                set_bit(i, numa_info[props.node_id].node_cpu);
>              }
>          }
>  
> diff --git a/vl.c b/vl.c
> index 0b4ed52..5ffb9c3 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -4498,7 +4498,7 @@ int main(int argc, char **argv, char **envp)
>      default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS);
>      default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS);
>  
> -    parse_numa_opts(machine_class);
> +    parse_numa_opts(current_machine);
>  
>      if (qemu_opts_foreach(qemu_find_opts("mon"),
>                            mon_init_func, NULL, NULL)) {

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

  parent reply	other threads:[~2017-03-28  4:24 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-22 13:32 [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 01/23] tests: add CPUs to numa node mapping test Igor Mammedov
2017-03-27  0:31   ` David Gibson
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 02/23] hw/arm/virt: extract mp-affinity calculation in separate function Igor Mammedov
2017-04-25 14:09   ` Andrew Jones
2017-04-25 14:39     ` Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 03/23] hw/arm/virt: use machine->possible_cpus for storing possible topology info Igor Mammedov
2017-04-25 14:28   ` Andrew Jones
2017-04-25 14:36     ` Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 04/23] hw/arm/virt: explicitly allocate cpu_index for cpus Igor Mammedov
2017-04-25 14:33   ` Andrew Jones
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 05/23] numa: move source of default CPUs to NUMA node mapping into boards Igor Mammedov
2017-03-23  6:10   ` Bharata B Rao
2017-03-23  8:48     ` Igor Mammedov
2017-03-28  4:19   ` David Gibson [this message]
2017-03-28 10:53     ` Igor Mammedov
2017-03-29  2:24       ` David Gibson
2017-03-29 11:48         ` Igor Mammedov
2017-04-20 14:29     ` Igor Mammedov
2017-04-25 14:48   ` Andrew Jones
2017-04-25 15:07     ` Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 06/23] spapr: add node-id property to sPAPR core Igor Mammedov
2017-03-28  4:23   ` David Gibson
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 07/23] pc: add node-id property to CPU Igor Mammedov
2017-04-12 21:02   ` Eduardo Habkost
2017-04-19 11:14     ` Igor Mammedov
2017-04-26 12:21       ` Eduardo Habkost
2017-04-27 13:14         ` Igor Mammedov
2017-04-27 16:32           ` Eduardo Habkost
2017-04-27 17:25             ` Igor Mammedov
2017-04-27 17:32               ` Eduardo Habkost
2017-05-02  4:27             ` David Gibson
2017-05-02  8:28               ` Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 08/23] virt-arm: " Igor Mammedov
2017-04-25 17:16   ` Andrew Jones
2017-04-26 10:47     ` Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 09/23] numa: add check that board supports cpu_index to node mapping Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 10/23] numa: mirror cpu to node mapping in MachineState::possible_cpus Igor Mammedov
2017-03-28  4:44   ` David Gibson
2017-04-12 21:15   ` Eduardo Habkost
2017-04-19  9:52     ` Igor Mammedov
2017-04-26 11:04       ` Eduardo Habkost
2017-04-13 13:58   ` Eduardo Habkost
2017-04-19  9:31     ` Igor Mammedov
2017-04-26 11:02       ` Eduardo Habkost
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 11/23] numa: do default mapping based on possible_cpus instead of node_cpu bitmaps Igor Mammedov
2017-03-28  4:46   ` David Gibson
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 12/23] pc: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu() Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 13/23] spapr: " Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 14/23] virt-arm: " Igor Mammedov
2017-04-25 17:06   ` Andrew Jones
2017-04-26 10:54     ` Igor Mammedov
2017-04-26 11:27       ` Andrew Jones
2017-04-27 13:24         ` Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 15/23] QMP: include CpuInstanceProperties into query_cpus output output Igor Mammedov
2017-03-23 13:19   ` Eric Blake
2017-03-24 12:20     ` Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 16/23] tests: numa: add case for QMP command query-cpus Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 17/23] numa: remove no longer used numa_get_node_for_cpu() Igor Mammedov
2017-03-28  4:54   ` David Gibson
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 18/23] numa: remove no longer need numa_post_machine_init() Igor Mammedov
2017-03-28  4:55   ` David Gibson
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 19/23] machine: call machine init from wrapper Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 20/23] numa: use possible_cpus for not mapped CPUs check Igor Mammedov
2017-03-28  5:13   ` David Gibson
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 21/23] numa: remove node_cpu bitmaps as they are no longer used Igor Mammedov
2017-03-28  5:13   ` David Gibson
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 22/23] numa: add '-numa cpu, ...' option for property based node mapping Igor Mammedov
2017-03-23 13:23   ` Eric Blake
2017-03-24 13:29     ` Igor Mammedov
2017-03-28  5:16   ` David Gibson
2017-03-28 11:09     ` Igor Mammedov
2017-03-29  2:27       ` David Gibson
2017-03-29 12:08         ` Igor Mammedov
2017-04-03  4:40           ` David Gibson
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 23/23] tests: check -numa node, cpu=props_list usecase Igor Mammedov
2017-04-12 20:18 ` [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option Eduardo Habkost

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170328041920.GC21068@umbus.fritz.box \
    --to=david@gibson.dropbear.id.au \
    --cc=drjones@redhat.com \
    --cc=eblake@redhat.com \
    --cc=ehabkost@redhat.com \
    --cc=imammedo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-arm@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    --cc=zhaoshenglong@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).