qemu-devel.nongnu.org archive mirror
From: Igor Mammedov <imammedo@redhat.com>
To: Haozhong Zhang <haozhong.zhang@intel.com>
Cc: qemu-devel@nongnu.org, mst@redhat.com,
	Xiao Guangrong <xiaoguangrong.eric@gmail.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Richard Henderson <rth@twiddle.net>,
	Eduardo Habkost <ehabkost@redhat.com>,
	Marcel Apfelbaum <marcel@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Dan Williams <dan.j.williams@intel.com>
Subject: Re: [Qemu-devel] [PATCH] hw/acpi-build: build SRAT memory affinity structures for NVDIMM
Date: Tue, 20 Feb 2018 15:10:04 +0100	[thread overview]
Message-ID: <20180220151004.6fb932fb@redhat.com> (raw)
In-Reply-To: <20180217063135.21550-1-haozhong.zhang@intel.com>

On Sat, 17 Feb 2018 14:31:35 +0800
Haozhong Zhang <haozhong.zhang@intel.com> wrote:

> ACPI 6.2A Table 5-129 "SPA Range Structure" requires that the proximity
> domain of an NVDIMM SPA range match the corresponding entry in the
> SRAT table.
> 
> The address ranges of vNVDIMM in QEMU are allocated from the
> hot-pluggable address space, which is entirely covered by one SRAT
> memory affinity structure. However, users can set the vNVDIMM
> proximity domain in NFIT SPA range structure by the 'node' property of
> '-device nvdimm' to a value different than the one in the above SRAT
> memory affinity structure.
> 
> To resolve such proximity-domain mismatches, this patch builds one
> SRAT memory affinity structure for each NVDIMM device, using the
> proximity domain from the NFIT. The remaining hot-pluggable address
> space is covered by one or more SRAT memory affinity structures
> with the proximity domain of the last node, as before.
> 
> Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
If we consider a hotpluggable system, a correctly implemented OS should
be able to pull the proximity domain from Device::_PXM and override any value from SRAT.
Do we really have a problem here (does anything actually break if we use _PXM)?
Maybe we should add a _PXM object to the nvdimm device nodes instead of massaging SRAT?
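
For illustration, the _PXM override I have in mind would look roughly like
this in ASL (a sketch only -- device names and the domain value are
illustrative, not what QEMU currently emits):

```asl
Device (NVDR)                   // NVDIMM root device, _HID ACPI0012
{
    Name (_HID, "ACPI0012")
    Device (NV00)               // one NVDIMM child device
    {
        Name (_ADR, One)        // NFIT device handle
        Name (_PXM, One)        // proximity domain; OSPM should prefer
                                // this over the SRAT entry covering
                                // the same address range
    }
}
```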

Aside from the above policy decision, the patch looks good.
If we decide this is the way to go (it shouldn't hurt;
the patch just adds more code to maintain), I'd like to see
tests added to tests/numa-test.c along with it, to ensure
that it works as expected.
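
For reference, the range-carving behavior the patch implements (splitting
the hot-pluggable range around each NVDIMM and tagging the gaps with the
default node) can be modeled standalone, outside QEMU. The struct and
function names below are mine, chosen for illustration; only the loop
structure mirrors build_srat_hotpluggable_memory():

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for one emitted SRAT memory affinity entry. */
typedef struct {
    uint64_t base, len;
    int node;
    int non_volatile;   /* analogue of MEM_AFFINITY_NON_VOLATILE */
} Range;

/* NVDIMMs sorted by base address, as nvdimm_get_device_list() returns them. */
typedef struct { uint64_t addr, size; int node; } Dimm;

/* Mirrors the loop in build_srat_hotpluggable_memory(); returns the
 * number of ranges written to out[]. */
static int carve(uint64_t base, uint64_t len, int default_node,
                 const Dimm *dimms, int ndimms, Range *out)
{
    uint64_t end = base + len;
    int i = 0, n = 0;

    while (base < end) {
        if (i == ndimms) {                 /* trailing gap, if any */
            out[n++] = (Range){ base, end - base, default_node, 0 };
            break;
        }
        if (base < dimms[i].addr) {        /* gap before the next NVDIMM */
            out[n++] = (Range){ base, dimms[i].addr - base, default_node, 0 };
        }
        out[n++] = (Range){ dimms[i].addr, dimms[i].size, dimms[i].node, 1 };
        base = dimms[i].addr + dimms[i].size;
        i++;
    }
    return n;
}
```

With one NVDIMM in the middle of the range, this yields three entries:
a default-node gap, the non-volatile NVDIMM range with its own node, and
a default-node trailing gap.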

> ---
>  hw/acpi/nvdimm.c        | 15 +++++++++++++--
>  hw/i386/acpi-build.c    | 47 +++++++++++++++++++++++++++++++++++++++++++----
>  include/hw/mem/nvdimm.h | 11 +++++++++++
>  3 files changed, 67 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
> index 59d6e4254c..dff0818e77 100644
> --- a/hw/acpi/nvdimm.c
> +++ b/hw/acpi/nvdimm.c
> @@ -33,12 +33,23 @@
>  #include "hw/nvram/fw_cfg.h"
>  #include "hw/mem/nvdimm.h"
>  
> +static gint nvdimm_addr_sort(gconstpointer a, gconstpointer b)
> +{
> +    uint64_t addr0 = object_property_get_uint(OBJECT(NVDIMM(a)),
> +                                              PC_DIMM_ADDR_PROP, NULL);
> +    uint64_t addr1 = object_property_get_uint(OBJECT(NVDIMM(b)),
> +                                              PC_DIMM_ADDR_PROP, NULL);
> +
> +    return addr0 < addr1 ? -1 :
> +           addr0 > addr1 ?  1 : 0;
> +}
> +
>  static int nvdimm_device_list(Object *obj, void *opaque)
>  {
>      GSList **list = opaque;
>  
>      if (object_dynamic_cast(obj, TYPE_NVDIMM)) {
> -        *list = g_slist_append(*list, DEVICE(obj));
> +        *list = g_slist_insert_sorted(*list, DEVICE(obj), nvdimm_addr_sort);
>      }
>  
>      object_child_foreach(obj, nvdimm_device_list, opaque);
> @@ -52,7 +63,7 @@ static int nvdimm_device_list(Object *obj, void *opaque)
>   * Note: it is the caller's responsibility to free the list to avoid
>   * memory leak.
>   */
> -static GSList *nvdimm_get_device_list(void)
> +GSList *nvdimm_get_device_list(void)
>  {
>      GSList *list = NULL;
>  
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index deb440f286..637ac3a8f0 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -2323,6 +2323,46 @@ build_tpm2(GArray *table_data, BIOSLinker *linker, GArray *tcpalog)
>  #define HOLE_640K_START  (640 * 1024)
>  #define HOLE_640K_END   (1024 * 1024)
>  
> +static void build_srat_hotpluggable_memory(GArray *table_data, uint64_t base,
> +                                           uint64_t len, int default_node)
> +{
> +    GSList *nvdimms = nvdimm_get_device_list();
> +    GSList *ent = nvdimms;
> +    NVDIMMDevice *dev;
> +    uint64_t end = base + len, addr, size;
> +    int node;
> +    AcpiSratMemoryAffinity *numamem;
> +
> +    while (base < end) {
> +        numamem = acpi_data_push(table_data, sizeof *numamem);
> +
> +        if (!ent) {
> +            build_srat_memory(numamem, base, end - base, default_node,
> +                              MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
> +            break;
> +        }
> +
> +        dev = NVDIMM(ent->data);
> +        addr = object_property_get_uint(OBJECT(dev), PC_DIMM_ADDR_PROP, NULL);
> +        size = object_property_get_uint(OBJECT(dev), PC_DIMM_SIZE_PROP, NULL);
> +        node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP, NULL);
> +
> +        if (base < addr) {
> +            build_srat_memory(numamem, base, addr - base, default_node,
> +                              MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
> +            numamem = acpi_data_push(table_data, sizeof *numamem);
> +        }
> +        build_srat_memory(numamem, addr, size, node,
> +                          MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED |
> +                          MEM_AFFINITY_NON_VOLATILE);
> +
> +        base = addr + size;
> +        ent = ent->next;
> +    }
> +
> +    g_slist_free(nvdimms);
> +}
> +
>  static void
>  build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
>  {
> @@ -2434,10 +2474,9 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
>       * providing _PXM method if necessary.
>       */
>      if (hotplugabble_address_space_size) {
> -        numamem = acpi_data_push(table_data, sizeof *numamem);
> -        build_srat_memory(numamem, pcms->hotplug_memory.base,
> -                          hotplugabble_address_space_size, pcms->numa_nodes - 1,
> -                          MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
> +        build_srat_hotpluggable_memory(table_data, pcms->hotplug_memory.base,
> +                                       hotplugabble_address_space_size,
> +                                       pcms->numa_nodes - 1);
>      }
>  
>      build_header(linker, table_data,
> diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
> index 7fd87c4e1c..ca9d6aa714 100644
> --- a/include/hw/mem/nvdimm.h
> +++ b/include/hw/mem/nvdimm.h
> @@ -144,4 +144,15 @@ void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
>                         uint32_t ram_slots);
>  void nvdimm_plug(AcpiNVDIMMState *state);
>  void nvdimm_acpi_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev);
> +
> +/*
> + * Inquire NVDIMM devices and link them into the list which is
> + * returned to the caller and sorted in the ascending order of the
> + * base address of NVDIMM devices.
> + *
> + * Note: it is the caller's responsibility to free the list to avoid
> + * memory leak.
> + */
> +GSList *nvdimm_get_device_list(void);
> +
>  #endif
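
One small detail worth calling out in nvdimm_addr_sort() above: the
explicit three-way compare is not just style. With unsigned 64-bit
addresses, the common `return a - b;` comparator idiom truncates the
difference to int and can report the wrong order. A standalone
illustration (function names are mine, not QEMU's):

```c
#include <assert.h>
#include <stdint.h>

/* Correct: explicit three-way compare, as in nvdimm_addr_sort(). */
static int cmp_u64(uint64_t a, uint64_t b)
{
    return a < b ? -1 :
           a > b ?  1 : 0;
}

/* Broken idiom: the uint64_t difference truncated to int can flip sign
 * (e.g. (1ULL << 32) - 1 truncates to -1 on common ABIs). */
static int cmp_u64_subtract(uint64_t a, uint64_t b)
{
    return (int)(a - b);
}
```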

Thread overview: 11+ messages
2018-02-17  6:31 [Qemu-devel] [PATCH] hw/acpi-build: build SRAT memory affinity structures for NVDIMM Haozhong Zhang
2018-02-20 14:10 ` Igor Mammedov [this message]
2018-02-21  1:17   ` Dan Williams
2018-02-21 13:55     ` Igor Mammedov
2018-02-21 14:51       ` Dan Williams
2018-02-26 14:08         ` Igor Mammedov
2018-02-22  1:40       ` Haozhong Zhang
2018-02-26 13:59         ` Igor Mammedov
2018-02-26 14:58           ` Haozhong Zhang
2018-02-24  1:17 ` no-reply
2018-02-24  1:42   ` Haozhong Zhang
