All of lore.kernel.org
 help / color / mirror / Atom feed
From: Markus Armbruster <armbru@redhat.com>
To: <ankita@nvidia.com>
Cc: <jgg@nvidia.com>,  <alex.williamson@redhat.com>,
	 <clg@redhat.com>, <shannon.zhaosl@gmail.com>,
	 <peter.maydell@linaro.org>, <ani@anisinha.ca>,
	 <berrange@redhat.com>,  <eduardo@habkost.net>,
	<imammedo@redhat.com>,  <mst@redhat.com>,  <eblake@redhat.com>,
	<armbru@redhat.com>,  <david@redhat.com>,  <gshan@redhat.com>,
	<Jonathan.Cameron@huawei.com>,  <aniketa@nvidia.com>,
	 <cjia@nvidia.com>, <kwankhede@nvidia.com>,
	 <targupta@nvidia.com>,  <vsethi@nvidia.com>,
	<acurrid@nvidia.com>,  <dnigam@nvidia.com>,  <udhoke@nvidia.com>,
	<qemu-arm@nongnu.org>,  <qemu-devel@nongnu.org>
Subject: Re: [PATCH v4 1/2] qom: new object to associate device to numa node
Date: Thu, 30 Nov 2023 15:12:04 +0100	[thread overview]
Message-ID: <87r0k772kr.fsf@pond.sub.org> (raw)
In-Reply-To: <20231119130111.761-2-ankita@nvidia.com> (ankita@nvidia.com's message of "Sun, 19 Nov 2023 18:31:10 +0530")

<ankita@nvidia.com> writes:

> From: Ankit Agrawal <ankita@nvidia.com>
>
> NVIDIA GPU's support MIG (Mult-Instance GPUs) feature [1], which allows
> partitioning of the GPU device resources (including device memory) into
> several (upto 8) isolated instances. Each of the partitioned memory needs
> a dedicated NUMA node to operate. The partitions are not fixed and they
> can be created/deleted at runtime.
>
> Unfortunately Linux OS does not provide a means to dynamically create/destroy
> NUMA nodes and such feature implementation is not expected to be trivial. The
> nodes that OS discovers at the boot time while parsing SRAT remains fixed. So
> we utilize the Generic Initiator Affinity structures that allows association
> between nodes and devices. Multiple GI structures per BDF is possible,
> allowing creation of multiple nodes by exposing unique PXM in each of these
> structures.
>
> Introduce a new acpi-generic-initiator object to allow host admin provide the
> device and the corresponding NUMA nodes. Qemu maintain this association and
> use this object to build the requisite GI Affinity Structure.
>
> An admin can provide the range of nodes through a uint16 array host-nodes
> and link it to a device by providing its id. Currently, only PCI device is
> supported and an error is returned for acpi device. The following sample
> creates 8 nodes and link them to the PCI device dev0:
>
> -numa node,nodeid=2 \
> -numa node,nodeid=3 \
> -numa node,nodeid=4 \
> -numa node,nodeid=5 \
> -numa node,nodeid=6 \
> -numa node,nodeid=7 \
> -numa node,nodeid=8 \
> -numa node,nodeid=9 \
> -device vfio-pci-nohotplug,host=0009:01:00.0,bus=pcie.0,addr=04.0,rombar=0,id=dev0 \
> -object acpi-generic-initiator,id=gi0,pci-dev=dev0,host-nodes=2-9 \
>
> [1] https://www.nvidia.com/en-in/technologies/multi-instance-gpu
>
> Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
> ---
>  hw/acpi/acpi-generic-initiator.c         | 84 ++++++++++++++++++++++++
>  hw/acpi/meson.build                      |  1 +
>  include/hw/acpi/acpi-generic-initiator.h | 30 +++++++++
>  qapi/qom.json                            | 18 +++++
>  4 files changed, 133 insertions(+)
>  create mode 100644 hw/acpi/acpi-generic-initiator.c
>  create mode 100644 include/hw/acpi/acpi-generic-initiator.h
>
> diff --git a/hw/acpi/acpi-generic-initiator.c b/hw/acpi/acpi-generic-initiator.c
> new file mode 100644
> index 0000000000..5ea51cb81e
> --- /dev/null
> +++ b/hw/acpi/acpi-generic-initiator.c
> @@ -0,0 +1,84 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/qdev-properties.h"
> +#include "qapi/error.h"
> +#include "qapi/qapi-builtin-visit.h"
> +#include "qapi/visitor.h"
> +#include "qom/object_interfaces.h"
> +#include "qom/object.h"
> +#include "hw/qdev-core.h"
> +#include "hw/vfio/vfio-common.h"
> +#include "hw/vfio/pci.h"
> +#include "hw/pci/pci_device.h"
> +#include "sysemu/numa.h"
> +#include "hw/acpi/acpi-generic-initiator.h"

Several #include are superfluous.  This compiles for me:

  #include "qemu/osdep.h"
  #include "hw/acpi/acpi-generic-initiator.h"
  #include "hw/pci/pci_device.h"
  #include "qapi/error.h"
  #include "qapi/qapi-builtin-visit.h"
  #include "qapi/visitor.h"
  #include "qemu/error-report.h"
  #include "sysemu/numa.h"

Yes, the alphabetical order is intentional.

> +
> +OBJECT_DEFINE_TYPE_WITH_INTERFACES(AcpiGenericInitiator, acpi_generic_initiator,
> +                   ACPI_GENERIC_INITIATOR, OBJECT,
> +                   { TYPE_USER_CREATABLE },
> +                   { NULL })
> +
> +OBJECT_DECLARE_SIMPLE_TYPE(AcpiGenericInitiator, ACPI_GENERIC_INITIATOR)
> +
> +static void acpi_generic_initiator_init(Object *obj)
> +{
> +    AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
> +    gi->device = NULL;
> +    gi->nodelist = NULL;
> +}
> +
> +static void acpi_generic_initiator_finalize(Object *obj)
> +{
> +    AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
> +
> +    g_free(gi->device);
> +    qapi_free_uint16List(gi->nodelist);
> +}
> +
> +static void acpi_generic_initiator_set_pci_device(Object *obj, const char *val,
> +                                                  Error **errp)
> +{
> +    AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
> +
> +    gi->device = g_strdup(val);

The property is named "pci-dev", but the C member is called @device.
Making developers remember this mapping is not nice.  Suggest to rename
to @pci_dev.

> +}
> +
> +static void acpi_generic_initiator_set_acpi_device(Object *obj, const char *val,
> +                                                   Error **errp)
> +{
> +    error_setg(errp, "Generic Initiator ACPI device not supported");
> +}

Let's add the property when it actually works.  More below at [*].

> +
> +static void
> +acpi_generic_initiator_set_host_nodes(Object *obj, Visitor *v, const char *name,
> +                                      void *opaque, Error **errp)
> +{
> +    AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
> +    uint16List *l;
> +
> +    visit_type_uint16List(v, name, &(gi->nodelist), errp);
> +
> +    for (l = gi->nodelist; l; l = l->next) {
> +        if (l->value >= MAX_NODES) {
> +            error_setg(errp, "Invalid host-nodes value: %d", l->value);
> +            qapi_free_uint16List(gi->nodelist);
> +            return;
> +        }
> +    }

Why not store the nodes in a bitset, like
host_memory_backend_set_host_nodes() does?

> +}
> +
> +static void acpi_generic_initiator_class_init(ObjectClass *oc, void *data)
> +{
> +    object_class_property_add_str(oc, ACPI_GENERIC_INITIATOR_PCI_DEVICE_PROP,
> +                                  NULL, acpi_generic_initiator_set_pci_device);
> +    object_class_property_add_str(oc, ACPI_GENERIC_INITIATOR_ACPI_DEVICE_PROP,
> +                                  NULL, acpi_generic_initiator_set_acpi_device);

[*] Drop this one.

> +    object_class_property_add(oc, ACPI_GENERIC_INITIATOR_HOSTNODE_PROP, "int",
> +        NULL,
> +        acpi_generic_initiator_set_host_nodes,
> +        NULL, NULL);
> +}
> diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
> index fc1b952379..2268589519 100644
> --- a/hw/acpi/meson.build
> +++ b/hw/acpi/meson.build
> @@ -1,5 +1,6 @@
>  acpi_ss = ss.source_set()
>  acpi_ss.add(files(
> +  'acpi-generic-initiator.c',
>    'acpi_interface.c',
>    'aml-build.c',
>    'bios-linker-loader.c',
> diff --git a/include/hw/acpi/acpi-generic-initiator.h b/include/hw/acpi/acpi-generic-initiator.h
> new file mode 100644
> index 0000000000..db3ed02c80
> --- /dev/null
> +++ b/include/hw/acpi/acpi-generic-initiator.h
> @@ -0,0 +1,30 @@
> +#ifndef ACPI_GENERIC_INITIATOR_H
> +#define ACPI_GENERIC_INITIATOR_H
> +
> +#include "hw/mem/pc-dimm.h"
> +#include "hw/acpi/bios-linker-loader.h"
> +#include "qemu/uuid.h"
> +#include "hw/acpi/aml-build.h"
> +#include "qom/object.h"
> +#include "qom/object_interfaces.h"
> +
> +#define TYPE_ACPI_GENERIC_INITIATOR "acpi-generic-initiator"
> +
> +#define ACPI_GENERIC_INITIATOR_PCI_DEVICE_PROP "pci-dev"
> +#define ACPI_GENERIC_INITIATOR_ACPI_DEVICE_PROP "acpi-dev"
> +#define ACPI_GENERIC_INITIATOR_HOSTNODE_PROP "host-nodes"

These three macros have exactly one use each.  Get rid of them, please.

> +
> +typedef struct AcpiGenericInitiator {
> +    /* private */
> +    Object parent;
> +
> +    /* public */
> +    char *device;
> +    uint16List *nodelist;
> +} AcpiGenericInitiator;
> +
> +typedef struct AcpiGenericInitiatorClass {
> +        ObjectClass parent_class;
> +} AcpiGenericInitiatorClass;
> +
> +#endif
> diff --git a/qapi/qom.json b/qapi/qom.json
> index c53ef978ff..f726f5ea41 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -794,6 +794,22 @@
>  { 'struct': 'VfioUserServerProperties',
>    'data': { 'socket': 'SocketAddress', 'device': 'str' } }
>  
> +##
> +# @AcpiGenericInitiatorProperties:
> +#
> +# Properties for acpi-generic-initiator objects.
> +#
> +# @pci-dev: PCI device ID to be associated with the node
> +#
> +# @acpi-dev: ACPI device ID to be associated with the node

[*] Drop this one.

> +#
> +# @host-nodes: numa node list
> +#
> +# Since: 8.2

9.0

> +##
> +{ 'struct': 'AcpiGenericInitiatorProperties',
> +  'data': { '*pci-dev': 'str', '*acpi-dev': 'str', 'host-nodes': ['uint16'] } }

Long line.  Better:

     'data': { '*pci-dev': 'str',
               '*acpi-dev': 'str',
               'host-nodes': ['uint16'] } }

> +
>  ##
>  # @RngProperties:
>  #
> @@ -911,6 +927,7 @@
>  ##
>  { 'enum': 'ObjectType',
>    'data': [
> +    'acpi-generic-initiator',
>      'authz-list',
>      'authz-listfile',
>      'authz-pam',
> @@ -981,6 +998,7 @@
>              'id': 'str' },
>    'discriminator': 'qom-type',
>    'data': {
> +      'acpi-generic-initiator':     'AcpiGenericInitiatorProperties',
>        'authz-list':                 'AuthZListProperties',
>        'authz-listfile':             'AuthZListFileProperties',
>        'authz-pam':                  'AuthZPAMProperties',


  parent reply	other threads:[~2023-11-30 14:12 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-19 13:01 [PATCH v4 0/2] acpi: report numa nodes for device memory using GI ankita
2023-11-19 13:01 ` [PATCH v4 1/2] qom: new object to associate device to numa node ankita
2023-11-27 22:57   ` Alex Williamson
2023-11-30 14:12   ` Markus Armbruster [this message]
2023-11-19 13:01 ` [PATCH v4 2/2] hw/acpi: Implement the SRAT GI affinity structure ankita
2023-11-27 22:57   ` Alex Williamson
2023-11-30 14:12   ` Markus Armbruster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87r0k772kr.fsf@pond.sub.org \
    --to=armbru@redhat.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=acurrid@nvidia.com \
    --cc=alex.williamson@redhat.com \
    --cc=ani@anisinha.ca \
    --cc=aniketa@nvidia.com \
    --cc=ankita@nvidia.com \
    --cc=berrange@redhat.com \
    --cc=cjia@nvidia.com \
    --cc=clg@redhat.com \
    --cc=david@redhat.com \
    --cc=dnigam@nvidia.com \
    --cc=eblake@redhat.com \
    --cc=eduardo@habkost.net \
    --cc=gshan@redhat.com \
    --cc=imammedo@redhat.com \
    --cc=jgg@nvidia.com \
    --cc=kwankhede@nvidia.com \
    --cc=mst@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-arm@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=shannon.zhaosl@gmail.com \
    --cc=targupta@nvidia.com \
    --cc=udhoke@nvidia.com \
    --cc=vsethi@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.