qemu-devel.nongnu.org archive mirror
* [PATCH v3 0/2] vfio/nvgpu: Add vfio pci variant module for grace hopper
@ 2023-11-07 19:00 ankita
  2023-11-07 19:00 ` [PATCH v3 1/2] qom: new object to associate device to numa node ankita
  2023-11-07 19:00 ` [PATCH v3 2/2] hw/acpi: Implement the SRAT GI affinity structure ankita
  0 siblings, 2 replies; 9+ messages in thread
From: ankita @ 2023-11-07 19:00 UTC (permalink / raw)
  To: ankita, jgg, alex.williamson, clg, shannon.zhaosl, peter.maydell,
	ani, berrange, eduardo, imammedo, mst, eblake, armbru, david,
	gshan, Jonathan.Cameron
  Cc: aniketa, cjia, kwankhede, targupta, vsethi, acurrid, dnigam,
	udhoke, qemu-arm, qemu-devel

From: Ankit Agrawal <ankita@nvidia.com>

Some upcoming devices allow the CPU to access their memory in a
cache-coherent manner. It is sensible to expose such memory to the OS as
NUMA nodes separate from the sysmem node. The ACPI spec provides an SRAT
structure called the Generic Initiator Affinity Structure [1], which
associates a Proximity Domain (PXM) with a Generic Initiator (GI) (e.g.
heterogeneous processors and accelerators, GPUs, and I/O devices with
integrated compute or DMA engines).

While a single node per device may cover several use cases, it is
insufficient to fully utilize the NVIDIA GPU MIG (Multi-Instance GPU) [2]
feature. The feature allows partitioning of the GPU device resources
(including device memory) into several (up to 8) isolated instances. Each
memory partition requires a dedicated NUMA node to operate. The partitions
are not fixed; they can be created and deleted at runtime.

Linux does not provide a means to dynamically create or destroy NUMA nodes,
and implementing such a feature is expected to be non-trivial. The nodes
that the OS discovers at boot time while parsing SRAT remain fixed. So we
use the GI Affinity Structures, which allow an association between nodes
and devices. Multiple GI structures per device/BDF are possible, allowing the
creation of multiple nodes in the VM by exposing a unique PXM in each of
these structures.

Implement the mechanism to build the GI Affinity Structures, as QEMU currently
does not. Introduce a new acpi-generic-initiator object that allows an
association of a set of nodes with a device. During SRAT creation, all such
objects are identified and used to add the GI Affinity Structures.
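
The expansion described above (one GI Affinity Structure per node linked to a
device) can be sketched in standalone C. This is an illustration only, not the
code in this series: the sketch_gi_assoc and sketch_gi_entry types and the
sketch_expand() helper are hypothetical names that stand in for the
acpi-generic-initiator object and the SRAT builder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one acpi-generic-initiator object. */
struct sketch_gi_assoc {
    uint16_t bdf;            /* PCI bus/device/function of the device */
    const uint16_t *nodes;   /* NUMA node ids linked to the device */
    size_t n_nodes;
};

/* Hypothetical record for one SRAT GI Affinity Structure to emit. */
struct sketch_gi_entry {
    uint32_t pxm;            /* Proximity Domain, one per structure */
    uint16_t bdf;            /* same device handle for every structure */
};

/*
 * Expand each (device, node list) association into one entry per node.
 * Returns the number of entries written; stops early if out[] is full.
 */
static size_t sketch_expand(const struct sketch_gi_assoc *assocs,
                            size_t n_assocs,
                            struct sketch_gi_entry *out, size_t cap)
{
    size_t n = 0;

    assert(assocs != NULL && out != NULL);
    for (size_t i = 0; i < n_assocs; i++) {
        for (size_t j = 0; j < assocs[i].n_nodes; j++) {
            if (n == cap) {
                return n;
            }
            out[n].pxm = assocs[i].nodes[j];
            out[n].bdf = assocs[i].bdf;
            n++;
        }
    }
    return n;
}
```

With a single device linked to nodes 2 through 9, the sketch yields eight
entries that share one BDF and differ only in PXM, which is the shape the
guest needs in order to create eight nodes for that device.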

The admin will create a set of 8 nodes and associate them with the device
using the acpi-generic-initiator object. While a configuration of fewer than
8 nodes per device is allowed, it prevents the feature from being used to
the fullest. This setting is applicable to all Grace+Hopper systems. The
following is an example of the QEMU command line arguments to create 8 nodes
and link them to the device 'dev0':

-numa node,nodeid=2 \
-numa node,nodeid=3 \
-numa node,nodeid=4 \
-numa node,nodeid=5 \
-numa node,nodeid=6 \
-numa node,nodeid=7 \
-numa node,nodeid=8 \
-numa node,nodeid=9 \
-device vfio-pci-nohotplug,host=0009:01:00.0,bus=pcie.0,addr=04.0,rombar=0,id=dev0 \
-object acpi-generic-initiator,id=gi0,device=dev0,nodelist=2:3:4:5:6:7:8:9 \

The performance benefits can be realized by providing appropriate NUMA node
distances (through libvirt tags or QEMU parameters). The admin can get the
distances among nodes on the hardware using `numactl -H`.

This series goes along with the vfio-pci variant driver [3] under review.

Applied over v8.1.2.

[1] ACPI Spec 6.5, Section 5.2.16.6
[2] https://www.nvidia.com/en-in/technologies/multi-instance-gpu
[3] https://lore.kernel.org/all/20231015163047.20391-1-ankita@nvidia.com/

Link for v2:
https://lore.kernel.org/all/20231007201740.30335-1-ankita@nvidia.com/

v2 -> v3
- Changed the parameter to accept a ':' delimited list of NUMA nodes instead
of a range.
- Removed nvidia-acpi-generic-initiator object.
- Addressed miscellaneous comments in v2.

v1 -> v2
- Removed dependency on sysfs to communicate the feature with variant module.
- Use GI Affinity SRAT structure instead of Memory Affinity.
- No DSDT entries needed to communicate the PXM for the device. SRAT GI
structure is used instead.
- New objects introduced to establish link between device and nodes.

Ankit Agrawal (2):
  qom: new object to associate device to numa node
  hw/acpi: Implement the SRAT GI affinity structure

 hw/acpi/acpi-generic-initiator.c         | 159 +++++++++++++++++++++++
 hw/acpi/meson.build                      |   1 +
 hw/arm/virt-acpi-build.c                 |   3 +
 include/hw/acpi/acpi-generic-initiator.h |  50 +++++++
 qapi/qom.json                            |  16 +++
 5 files changed, 229 insertions(+)
 create mode 100644 hw/acpi/acpi-generic-initiator.c
 create mode 100644 include/hw/acpi/acpi-generic-initiator.h

-- 
2.17.1




* [PATCH v3 1/2] qom: new object to associate device to numa node
  2023-11-07 19:00 [PATCH v3 0/2] vfio/nvgpu: Add vfio pci variant module for grace hopper ankita
@ 2023-11-07 19:00 ` ankita
  2023-11-15 13:59   ` Markus Armbruster
  2023-11-07 19:00 ` [PATCH v3 2/2] hw/acpi: Implement the SRAT GI affinity structure ankita
  1 sibling, 1 reply; 9+ messages in thread
From: ankita @ 2023-11-07 19:00 UTC (permalink / raw)
  To: ankita, jgg, alex.williamson, clg, shannon.zhaosl, peter.maydell,
	ani, berrange, eduardo, imammedo, mst, eblake, armbru, david,
	gshan, Jonathan.Cameron
  Cc: aniketa, cjia, kwankhede, targupta, vsethi, acurrid, dnigam,
	udhoke, qemu-arm, qemu-devel

From: Ankit Agrawal <ankita@nvidia.com>

NVIDIA GPUs support the MIG (Multi-Instance GPU) feature [1], which allows
partitioning of the GPU device resources (including device memory) into
several (up to 8) isolated instances. Each memory partition needs a
dedicated NUMA node to operate. The partitions are not fixed; they can be
created and deleted at runtime.

Unfortunately, Linux does not provide a means to dynamically create or destroy
NUMA nodes, and implementing such a feature is not expected to be trivial. The
nodes that the OS discovers at boot time while parsing SRAT remain fixed. So
we use the Generic Initiator Affinity Structures, which allow an association
between nodes and devices. Multiple GI structures per BDF are possible,
allowing the creation of multiple nodes by exposing a unique PXM in each of
these structures.

Introduce a new acpi-generic-initiator object to let the host admin specify
the device and the corresponding NUMA nodes. QEMU maintains this association
and uses this object to build the requisite GI Affinity Structures.

An admin can provide the set of nodes using a ':' delimited nodelist and
link it to a device by providing its id. The node ids are extracted from
nodelist and stored as a uint16List. The following sample creates 8 nodes
and links them to the device dev0:

-numa node,nodeid=2 \
-numa node,nodeid=3 \
-numa node,nodeid=4 \
-numa node,nodeid=5 \
-numa node,nodeid=6 \
-numa node,nodeid=7 \
-numa node,nodeid=8 \
-numa node,nodeid=9 \
-device vfio-pci-nohotplug,host=0009:01:00.0,bus=pcie.0,addr=04.0,rombar=0,id=dev0 \
-object acpi-generic-initiator,id=gi0,device=dev0,nodelist=2:3:4:5:6:7:8:9 \
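
The extraction of node ids from a ':' delimited string can be illustrated
with a standalone sketch. It is not the setter in this patch: it uses
strtoul() instead of strtok()/sscanf(), writes into a plain array instead of a
uint16List, and the names sketch_parse_nodelist() and SKETCH_MAX_NODES are
hypothetical (the latter stands in for QEMU's MAX_NODES, with an assumed value
of 128).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_NODES 128  /* assumption: stand-in for QEMU's MAX_NODES */

/*
 * Parse a ':' delimited list such as "2:3:4" into out[].
 * Returns the number of node ids parsed, or -1 on an empty or malformed
 * list, an out-of-range node id, or more than cap entries.
 */
static int sketch_parse_nodelist(const char *str, uint16_t *out, size_t cap)
{
    size_t n = 0;
    const char *p = str;

    assert(out != NULL);
    if (!str || !*str) {
        return -1;
    }

    for (;;) {
        char *end;
        unsigned long v;

        /* strtoul() skips whitespace and accepts a sign; reject both. */
        if (*p < '0' || *p > '9') {
            return -1;
        }
        errno = 0;
        v = strtoul(p, &end, 10);
        if (errno || v >= SKETCH_MAX_NODES) {
            return -1;
        }
        if (n == cap) {
            return -1;
        }
        out[n++] = (uint16_t)v;

        if (*end == '\0') {
            return (int)n;      /* consumed the whole string */
        }
        if (*end != ':') {
            return -1;          /* trailing garbage after a number */
        }
        p = end + 1;            /* step over the delimiter */
    }
}
```

The sketch rejects empty fields ("2::3"), a trailing delimiter ("2:") and
ids at or above the node limit, and it does not modify or copy the input
string, so there is no temporary buffer to free on the error paths.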

[1] https://www.nvidia.com/en-in/technologies/multi-instance-gpu

Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
---
 hw/acpi/acpi-generic-initiator.c         | 80 ++++++++++++++++++++++++
 hw/acpi/meson.build                      |  1 +
 include/hw/acpi/acpi-generic-initiator.h | 29 +++++++++
 qapi/qom.json                            | 16 +++++
 4 files changed, 126 insertions(+)
 create mode 100644 hw/acpi/acpi-generic-initiator.c
 create mode 100644 include/hw/acpi/acpi-generic-initiator.h

diff --git a/hw/acpi/acpi-generic-initiator.c b/hw/acpi/acpi-generic-initiator.c
new file mode 100644
index 0000000000..0699c878e2
--- /dev/null
+++ b/hw/acpi/acpi-generic-initiator.c
@@ -0,0 +1,80 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
+ */
+
+#include "qemu/osdep.h"
+#include "hw/qdev-properties.h"
+#include "qapi/error.h"
+#include "qapi/visitor.h"
+#include "qom/object_interfaces.h"
+#include "qom/object.h"
+#include "hw/qdev-core.h"
+#include "hw/vfio/vfio-common.h"
+#include "hw/vfio/pci.h"
+#include "hw/pci/pci_device.h"
+#include "sysemu/numa.h"
+#include "hw/acpi/acpi-generic-initiator.h"
+
+OBJECT_DEFINE_TYPE_WITH_INTERFACES(AcpiGenericInitiator, acpi_generic_initiator,
+                   ACPI_GENERIC_INITIATOR, OBJECT,
+                   { TYPE_USER_CREATABLE },
+                   { NULL })
+
+OBJECT_DECLARE_SIMPLE_TYPE(AcpiGenericInitiator, ACPI_GENERIC_INITIATOR)
+
+static void acpi_generic_initiator_init(Object *obj)
+{
+    AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
+    gi->device = NULL;
+    gi->nodelist = NULL;
+}
+
+static void acpi_generic_initiator_finalize(Object *obj)
+{
+    AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
+
+    g_free(gi->device);
+    qapi_free_uint16List(gi->nodelist);
+}
+
+static void acpi_generic_initiator_set_device(Object *obj, const char *val,
+                                              Error **errp)
+{
+    AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
+
+    gi->device = g_strdup(val);
+}
+
+static void acpi_generic_initiator_set_nodelist(Object *obj, const char *val,
+                                            Error **errp)
+{
+    AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
+    char *value = g_strdup(val);
+    uint16_t node;
+    uint16List **tail = &(gi->nodelist);
+    char *nodestr = value ? strtok(value, ":") : NULL;
+
+    while (nodestr) {
+        if (sscanf(nodestr, "%hu", &node) != 1) {
+            error_setg(errp, "failed to read node-id");
+            return;
+        }
+
+        if (node >= MAX_NODES) {
+            error_setg(errp, "invalid node-id");
+            return;
+        }
+
+        QAPI_LIST_APPEND(tail, node);
+        nodestr = strtok(NULL, ":");
+    }
+}
+
+static void acpi_generic_initiator_class_init(ObjectClass *oc, void *data)
+{
+    object_class_property_add_str(oc, ACPI_GENERIC_INITIATOR_DEVICE_PROP, NULL,
+                                  acpi_generic_initiator_set_device);
+    object_class_property_add_str(oc, ACPI_GENERIC_INITIATOR_NODELIST_PROP,
+                                  NULL, acpi_generic_initiator_set_nodelist);
+}
diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
index fc1b952379..2268589519 100644
--- a/hw/acpi/meson.build
+++ b/hw/acpi/meson.build
@@ -1,5 +1,6 @@
 acpi_ss = ss.source_set()
 acpi_ss.add(files(
+  'acpi-generic-initiator.c',
   'acpi_interface.c',
   'aml-build.c',
   'bios-linker-loader.c',
diff --git a/include/hw/acpi/acpi-generic-initiator.h b/include/hw/acpi/acpi-generic-initiator.h
new file mode 100644
index 0000000000..bb127b2541
--- /dev/null
+++ b/include/hw/acpi/acpi-generic-initiator.h
@@ -0,0 +1,29 @@
+#ifndef ACPI_GENERIC_INITIATOR_H
+#define ACPI_GENERIC_INITIATOR_H
+
+#include "hw/mem/pc-dimm.h"
+#include "hw/acpi/bios-linker-loader.h"
+#include "qemu/uuid.h"
+#include "hw/acpi/aml-build.h"
+#include "qom/object.h"
+#include "qom/object_interfaces.h"
+
+#define TYPE_ACPI_GENERIC_INITIATOR "acpi-generic-initiator"
+
+#define ACPI_GENERIC_INITIATOR_DEVICE_PROP "device"
+#define ACPI_GENERIC_INITIATOR_NODELIST_PROP "nodelist"
+
+typedef struct AcpiGenericInitiator {
+    /* private */
+    Object parent;
+
+    /* public */
+    char *device;
+    uint16List *nodelist;
+} AcpiGenericInitiator;
+
+typedef struct AcpiGenericInitiatorClass {
+        ObjectClass parent_class;
+} AcpiGenericInitiatorClass;
+
+#endif
diff --git a/qapi/qom.json b/qapi/qom.json
index fa3e88c8e6..66d2bffdcc 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -779,6 +779,20 @@
 { 'struct': 'VfioUserServerProperties',
   'data': { 'socket': 'SocketAddress', 'device': 'str' } }
 
+##
+# @AcpiGenericInitiatorProperties:
+#
+# Properties for acpi-generic-initiator objects.
+#
+# @device: the ID of the device to be associated with the node
+#
+# @nodelist: delimited numa node list
+#
+# Since: 8.2
+##
+{ 'struct': 'AcpiGenericInitiatorProperties',
+  'data': { 'device': 'str', 'nodelist': 'str' } }
+
 ##
 # @RngProperties:
 #
@@ -896,6 +910,7 @@
 ##
 { 'enum': 'ObjectType',
   'data': [
+    'acpi-generic-initiator',
     'authz-list',
     'authz-listfile',
     'authz-pam',
@@ -966,6 +981,7 @@
             'id': 'str' },
   'discriminator': 'qom-type',
   'data': {
+      'acpi-generic-initiator':     'AcpiGenericInitiatorProperties',
       'authz-list':                 'AuthZListProperties',
       'authz-listfile':             'AuthZListFileProperties',
       'authz-pam':                  'AuthZPAMProperties',
-- 
2.17.1




* [PATCH v3 2/2] hw/acpi: Implement the SRAT GI affinity structure
  2023-11-07 19:00 [PATCH v3 0/2] vfio/nvgpu: Add vfio pci variant module for grace hopper ankita
  2023-11-07 19:00 ` [PATCH v3 1/2] qom: new object to associate device to numa node ankita
@ 2023-11-07 19:00 ` ankita
  2023-11-07 21:33   ` Alex Williamson
                     ` (2 more replies)
  1 sibling, 3 replies; 9+ messages in thread
From: ankita @ 2023-11-07 19:00 UTC (permalink / raw)
  To: ankita, jgg, alex.williamson, clg, shannon.zhaosl, peter.maydell,
	ani, berrange, eduardo, imammedo, mst, eblake, armbru, david,
	gshan, Jonathan.Cameron
  Cc: aniketa, cjia, kwankhede, targupta, vsethi, acurrid, dnigam,
	udhoke, qemu-arm, qemu-devel

From: Ankit Agrawal <ankita@nvidia.com>

The ACPI spec provides a scheme to associate "Generic Initiators" [1]
(e.g. heterogeneous processors and accelerators, GPUs, and I/O devices with
integrated compute or DMA engines) with Proximity Domains. This is
achieved using the Generic Initiator Affinity Structure in SRAT. During
bootup, the Linux kernel parses the ACPI SRAT to determine the PXM ids and
creates a NUMA node for each unique PXM ID encountered. QEMU currently does
not implement these structures while building SRAT.

Add GI structures while building the VM ACPI SRAT. The association between
devices and nodes is stored using the acpi-generic-initiator object. Look up
all such objects and use them to build these structures.

The structure needs a PCI device handle [2] that consists of the device BDF.
The vfio-pci device corresponding to the acpi-generic-initiator object is
located to determine the BDF.
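
As an illustration of the layout, the following standalone sketch serializes
one 32-byte structure with a PCI device handle into a byte buffer. It follows
the field order and the little-endian 16-bit BDF write used in this patch;
the byte order of the BDF field should be checked against ACPI 6.5,
Table 5-66. The helper names (sketch_put_le, sketch_bdf, sketch_build_gi) are
hypothetical and are not QEMU APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_GI_LEN 32

/* Store `size` bytes of `val` at buf[off], least significant byte first. */
static size_t sketch_put_le(uint8_t *buf, size_t off, uint64_t val,
                            size_t size)
{
    for (size_t i = 0; i < size; i++) {
        buf[off + i] = (uint8_t)(val >> (8 * i));
    }
    return off + size;
}

/* BDF value: bus in bits 15:8, device in bits 7:3, function in bits 2:0. */
static uint16_t sketch_bdf(uint8_t bus, uint8_t dev, uint8_t fn)
{
    return (uint16_t)((bus << 8) | ((dev & 0x1f) << 3) | (fn & 0x7));
}

/* Fill buf[] with one GI Affinity Structure; returns the bytes written. */
static size_t sketch_build_gi(uint8_t buf[SKETCH_GI_LEN], uint32_t pxm,
                              uint16_t segment, uint16_t bdf)
{
    size_t off = 0;

    off = sketch_put_le(buf, off, 5, 1);        /* Type */
    off = sketch_put_le(buf, off, 32, 1);       /* Length */
    off = sketch_put_le(buf, off, 0, 1);        /* Reserved */
    off = sketch_put_le(buf, off, 1, 1);        /* Device Handle Type: PCI */
    off = sketch_put_le(buf, off, pxm, 4);      /* Proximity Domain */
    off = sketch_put_le(buf, off, segment, 2);  /* Handle: PCI segment */
    off = sketch_put_le(buf, off, bdf, 2);      /* Handle: PCI BDF */
    off = sketch_put_le(buf, off, 0, 8);        /* Handle: 12 reserved bytes */
    off = sketch_put_le(buf, off, 0, 4);
    off = sketch_put_le(buf, off, 1, 4);        /* Flags: bit 0 = Enabled */
    off = sketch_put_le(buf, off, 0, 4);        /* Reserved */
    return off;
}
```

For a device at bus 0, slot 4, function 0, sketch_bdf() gives 0x0020, and
the 16-byte device handle occupies offsets 8 to 23 of the structure.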

[1] ACPI Spec 6.5, Section 5.2.16.6
[2] ACPI Spec 6.5, Table 5-66

Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
---
 hw/acpi/acpi-generic-initiator.c         | 79 ++++++++++++++++++++++++
 hw/arm/virt-acpi-build.c                 |  3 +
 include/hw/acpi/acpi-generic-initiator.h | 21 +++++++
 3 files changed, 103 insertions(+)

diff --git a/hw/acpi/acpi-generic-initiator.c b/hw/acpi/acpi-generic-initiator.c
index 0699c878e2..6d0a8fd818 100644
--- a/hw/acpi/acpi-generic-initiator.c
+++ b/hw/acpi/acpi-generic-initiator.c
@@ -78,3 +78,82 @@ static void acpi_generic_initiator_class_init(ObjectClass *oc, void *data)
     object_class_property_add_str(oc, ACPI_GENERIC_INITIATOR_NODELIST_PROP,
                                   NULL, acpi_generic_initiator_set_nodelist);
 }
+
+static int acpi_generic_initiator_list(Object *obj, void *opaque)
+{
+    GSList **list = opaque;
+
+    if (object_dynamic_cast(obj, TYPE_ACPI_GENERIC_INITIATOR)) {
+        *list = g_slist_append(*list, ACPI_GENERIC_INITIATOR(obj));
+    }
+
+    object_child_foreach(obj, acpi_generic_initiator_list, opaque);
+    return 0;
+}
+
+/*
+ * Identify Generic Initiator objects and link them into the list which is
+ * returned to the caller.
+ *
+ * Note: it is the caller's responsibility to free the list to avoid
+ * memory leak.
+ */
+static GSList *acpi_generic_initiator_get_list(void)
+{
+    GSList *list = NULL;
+
+    object_child_foreach(object_get_root(), acpi_generic_initiator_list, &list);
+    return list;
+}
+
+/*
+ * ACPI spec, Revision 6.5
+ * 5.2.16.6 Generic Initiator Affinity Structure
+ */
+static
+void build_srat_generic_pci_initiator_affinity(GArray *table_data, int node,
+                                               PCIDeviceHandle *handle)
+{
+    uint8_t index;
+
+    build_append_int_noprefix(table_data, 5, 1);     /* Type */
+    build_append_int_noprefix(table_data, 32, 1);    /* Length */
+    build_append_int_noprefix(table_data, 0, 1);     /* Reserved */
+    build_append_int_noprefix(table_data, 1, 1);     /* Device Handle Type */
+    build_append_int_noprefix(table_data, node, 4);  /* Proximity Domain */
+    build_append_int_noprefix(table_data, handle->segment, 2);
+    build_append_int_noprefix(table_data, handle->bdf, 2);
+
+    /* Reserved */
+    for (index = 0; index < 12; index++) {
+        build_append_int_noprefix(table_data, handle->res[index], 1);
+    }
+
+    build_append_int_noprefix(table_data, GEN_AFFINITY_ENABLED, 4); /* Flags */
+    build_append_int_noprefix(table_data, 0, 4);     /* Reserved */
+}
+
+void build_srat_generic_pci_initiator(GArray *table_data)
+{
+    GSList *gi_list, *list = acpi_generic_initiator_get_list();
+    for (gi_list = list; gi_list; gi_list = gi_list->next) {
+        AcpiGenericInitiator *gi = gi_list->data;
+        Object *o;
+        uint16List *l;
+
+        o = object_resolve_path_type(gi->device, TYPE_VFIO_PCI, NULL);
+        if (!o) {
+            continue;
+        }
+
+        for (l = gi->nodelist; l; l = l->next) {
+            PCIDeviceHandle dev_handle = {0};
+            PCIDevice *pci_dev = PCI_DEVICE(o);
+            dev_handle.bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)),
+                                                       pci_dev->devfn);
+            build_srat_generic_pci_initiator_affinity(table_data,
+                                                      l->value, &dev_handle);
+        }
+    }
+    g_slist_free(list);
+}
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 6b674231c2..bd53788cef 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -58,6 +58,7 @@
 #include "migration/vmstate.h"
 #include "hw/acpi/ghes.h"
 #include "hw/acpi/viot.h"
+#include "hw/acpi/acpi-generic-initiator.h"
 
 #define ARM_SPI_BASE 32
 
@@ -558,6 +559,8 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
         }
     }
 
+    build_srat_generic_pci_initiator(table_data);
+
     if (ms->nvdimms_state->is_enabled) {
         nvdimm_build_srat(table_data);
     }
diff --git a/include/hw/acpi/acpi-generic-initiator.h b/include/hw/acpi/acpi-generic-initiator.h
index bb127b2541..545f46ade5 100644
--- a/include/hw/acpi/acpi-generic-initiator.h
+++ b/include/hw/acpi/acpi-generic-initiator.h
@@ -26,4 +26,25 @@ typedef struct AcpiGenericInitiatorClass {
         ObjectClass parent_class;
 } AcpiGenericInitiatorClass;
 
+/*
+ * ACPI 6.5: Table 5-68 Flags - Generic Initiator
+ */
+typedef enum {
+    GEN_AFFINITY_NOFLAGS = 0,
+    GEN_AFFINITY_ENABLED = (1 << 0),
+    GEN_AFFINITY_ARCH_TRANS = (1 << 1),
+} GenericAffinityFlags;
+
+/*
+ * ACPI 6.5: Table 5-66 Device Handle - PCI
+ * Device Handle definition
+ */
+typedef struct PCIDeviceHandle {
+    uint16_t segment;
+    uint16_t bdf;
+    uint8_t res[12];
+} PCIDeviceHandle;
+
+void build_srat_generic_pci_initiator(GArray *table_data);
+
 #endif
-- 
2.17.1




* Re: [PATCH v3 2/2] hw/acpi: Implement the SRAT GI affinity structure
  2023-11-07 19:00 ` [PATCH v3 2/2] hw/acpi: Implement the SRAT GI affinity structure ankita
@ 2023-11-07 21:33   ` Alex Williamson
  2023-11-07 22:20   ` Michael S. Tsirkin
  2023-11-07 22:25   ` Michael S. Tsirkin
  2 siblings, 0 replies; 9+ messages in thread
From: Alex Williamson @ 2023-11-07 21:33 UTC (permalink / raw)
  To: ankita
  Cc: jgg, clg, shannon.zhaosl, peter.maydell, ani, berrange, eduardo,
	imammedo, mst, eblake, armbru, david, gshan, Jonathan.Cameron,
	aniketa, cjia, kwankhede, targupta, vsethi, acurrid, dnigam,
	udhoke, qemu-arm, qemu-devel

On Wed, 8 Nov 2023 00:30:39 +0530
<ankita@nvidia.com> wrote:

> From: Ankit Agrawal <ankita@nvidia.com>
> 
> The ACPI spec provides a scheme to associate "Generic Initiators" [1]
> (e.g. heterogeneous processors and accelerators, GPUs, and I/O devices with
> integrated compute or DMA engines) with Proximity Domains. This is
> achieved using the Generic Initiator Affinity Structure in SRAT. During
> bootup, the Linux kernel parses the ACPI SRAT to determine the PXM ids and
> creates a NUMA node for each unique PXM ID encountered. QEMU currently does
> not implement these structures while building SRAT.
> 
> Add GI structures while building the VM ACPI SRAT. The association between
> devices and nodes is stored using the acpi-generic-initiator object. Look up
> all such objects and use them to build these structures.
> 
> The structure needs a PCI device handle [2] that consists of the device BDF.
> The vfio-pci device corresponding to the acpi-generic-initiator object is
> located to determine the BDF.
> 
> [1] ACPI Spec 6.5, Section 5.2.16.6
> [2] ACPI Spec 6.5, Table 5-66
> 
> Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
> ---
>  hw/acpi/acpi-generic-initiator.c         | 79 ++++++++++++++++++++++++
>  hw/arm/virt-acpi-build.c                 |  3 +
>  include/hw/acpi/acpi-generic-initiator.h | 21 +++++++
>  3 files changed, 103 insertions(+)
> 
> diff --git a/hw/acpi/acpi-generic-initiator.c b/hw/acpi/acpi-generic-initiator.c
> index 0699c878e2..6d0a8fd818 100644
> --- a/hw/acpi/acpi-generic-initiator.c
> +++ b/hw/acpi/acpi-generic-initiator.c
> @@ -78,3 +78,82 @@ static void acpi_generic_initiator_class_init(ObjectClass *oc, void *data)
>      object_class_property_add_str(oc, ACPI_GENERIC_INITIATOR_NODELIST_PROP,
>                                    NULL, acpi_generic_initiator_set_nodelist);
>  }
> +
> +static int acpi_generic_initiator_list(Object *obj, void *opaque)
> +{
> +    GSList **list = opaque;
> +
> +    if (object_dynamic_cast(obj, TYPE_ACPI_GENERIC_INITIATOR)) {
> +        *list = g_slist_append(*list, ACPI_GENERIC_INITIATOR(obj));
> +    }
> +
> +    object_child_foreach(obj, acpi_generic_initiator_list, opaque);
> +    return 0;
> +}
> +
> +/*
> + * Identify Generic Initiator objects and link them into the list which is
> + * returned to the caller.
> + *
> + * Note: it is the caller's responsibility to free the list to avoid
> + * memory leak.
> + */
> +static GSList *acpi_generic_initiator_get_list(void)
> +{
> +    GSList *list = NULL;
> +
> +    object_child_foreach(object_get_root(), acpi_generic_initiator_list, &list);
> +    return list;
> +}
> +
> +/*
> + * ACPI spec, Revision 6.5
> + * 5.2.16.6 Generic Initiator Affinity Structure
> + */
> +static
> +void build_srat_generic_pci_initiator_affinity(GArray *table_data, int node,
> +                                               PCIDeviceHandle *handle)
> +{
> +    uint8_t index;
> +
> +    build_append_int_noprefix(table_data, 5, 1);     /* Type */
> +    build_append_int_noprefix(table_data, 32, 1);    /* Length */
> +    build_append_int_noprefix(table_data, 0, 1);     /* Reserved */
> +    build_append_int_noprefix(table_data, 1, 1);     /* Device Handle Type */

/* Device Handle Type: PCI */

> +    build_append_int_noprefix(table_data, node, 4);  /* Proximity Domain */
> +    build_append_int_noprefix(table_data, handle->segment, 2);
> +    build_append_int_noprefix(table_data, handle->bdf, 2);
> +
> +    /* Reserved */
> +    for (index = 0; index < 12; index++) {
> +        build_append_int_noprefix(table_data, handle->res[index], 1);
> +    }
> +
> +    build_append_int_noprefix(table_data, GEN_AFFINITY_ENABLED, 4); /* Flags */
> +    build_append_int_noprefix(table_data, 0, 4);     /* Reserved */
> +}
> +
> +void build_srat_generic_pci_initiator(GArray *table_data)
> +{
> +    GSList *gi_list, *list = acpi_generic_initiator_get_list();
> +    for (gi_list = list; gi_list; gi_list = gi_list->next) {
> +        AcpiGenericInitiator *gi = gi_list->data;
> +        Object *o;
> +        uint16List *l;
> +
> +        o = object_resolve_path_type(gi->device, TYPE_VFIO_PCI, NULL);

As per previous comments, this should not be tied to vfio.  This should
be able to describe an association between any PCI device and various
proximity domains, even those beyond this current use case.

It also looks like this support just silently fails if the device
string isn't the right type or isn't found.  That's not good.  Should
the previous patch validate the device where the Error return is more
readily available rather than only doing a strdup there?  Maybe then we
should store the object there rather than a char buffer.

Don't we also still need to enforce that the device is not hotpluggable
since we're tying it to this fixed ACPI object?  That was implicit when
previously testing for the non-hotpluggable vfio-pci device type, but
should rely on something like device_get_hotpluggable() now.

Also the ACPI Generic Initiator supports either a PCI or ACPI device
handle, where we're only adding PCI support here.  What do we want ACPI
device support to look like?  Is it sufficient that device= only
accepts a PCI device now, fails on anything else, and is later
updated to accept an ACPI device?  Or should the object have different
entry points, e.g. pci_dev= vs. acpi_dev=, so that it can later be
introspected whether ACPI device support exists?

> +        if (!o) {
> +            continue;
> +        }
> +
> +        for (l = gi->nodelist; l; l = l->next) {
> +            PCIDeviceHandle dev_handle = {0};
> +            PCIDevice *pci_dev = PCI_DEVICE(o);

I'd explicitly set the segment to zero just to make it more apparent
that it would need to be addressed when QEMU adds multi-segment
support.  Thanks,

Alex

> +            dev_handle.bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)),
> +                                                       pci_dev->devfn);
> +            build_srat_generic_pci_initiator_affinity(table_data,
> +                                                      l->value, &dev_handle);
> +        }
> +    }
> +    g_slist_free(list);
> +}
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index 6b674231c2..bd53788cef 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -58,6 +58,7 @@
>  #include "migration/vmstate.h"
>  #include "hw/acpi/ghes.h"
>  #include "hw/acpi/viot.h"
> +#include "hw/acpi/acpi-generic-initiator.h"
>  
>  #define ARM_SPI_BASE 32
>  
> @@ -558,6 +559,8 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
>          }
>      }
>  
> +    build_srat_generic_pci_initiator(table_data);
> +
>      if (ms->nvdimms_state->is_enabled) {
>          nvdimm_build_srat(table_data);
>      }
> diff --git a/include/hw/acpi/acpi-generic-initiator.h b/include/hw/acpi/acpi-generic-initiator.h
> index bb127b2541..545f46ade5 100644
> --- a/include/hw/acpi/acpi-generic-initiator.h
> +++ b/include/hw/acpi/acpi-generic-initiator.h
> @@ -26,4 +26,25 @@ typedef struct AcpiGenericInitiatorClass {
>          ObjectClass parent_class;
>  } AcpiGenericInitiatorClass;
>  
> +/*
> + * ACPI 6.5: Table 5-68 Flags - Generic Initiator
> + */
> +typedef enum {
> +    GEN_AFFINITY_NOFLAGS = 0,
> +    GEN_AFFINITY_ENABLED = (1 << 0),
> +    GEN_AFFINITY_ARCH_TRANS = (1 << 1),
> +} GenericAffinityFlags;
> +
> +/*
> + * ACPI 6.5: Table 5-66 Device Handle - PCI
> + * Device Handle definition
> + */
> +typedef struct PCIDeviceHandle {
> +    uint16_t segment;
> +    uint16_t bdf;
> +    uint8_t res[12];
> +} PCIDeviceHandle;
> +
> +void build_srat_generic_pci_initiator(GArray *table_data);
> +
>  #endif




* Re: [PATCH v3 2/2] hw/acpi: Implement the SRAT GI affinity structure
  2023-11-07 19:00 ` [PATCH v3 2/2] hw/acpi: Implement the SRAT GI affinity structure ankita
  2023-11-07 21:33   ` Alex Williamson
@ 2023-11-07 22:20   ` Michael S. Tsirkin
  2023-11-07 22:25   ` Michael S. Tsirkin
  2 siblings, 0 replies; 9+ messages in thread
From: Michael S. Tsirkin @ 2023-11-07 22:20 UTC (permalink / raw)
  To: ankita
  Cc: jgg, alex.williamson, clg, shannon.zhaosl, peter.maydell, ani,
	berrange, eduardo, imammedo, eblake, armbru, david, gshan,
	Jonathan.Cameron, aniketa, cjia, kwankhede, targupta, vsethi,
	acurrid, dnigam, udhoke, qemu-arm, qemu-devel

On Wed, Nov 08, 2023 at 12:30:39AM +0530, ankita@nvidia.com wrote:
> From: Ankit Agrawal <ankita@nvidia.com>
> 
> The ACPI spec provides a scheme to associate "Generic Initiators" [1]
> (e.g. heterogeneous processors and accelerators, GPUs, and I/O devices with
> integrated compute or DMA engines) with Proximity Domains. This is
> achieved using the Generic Initiator Affinity Structure in SRAT. During
> bootup, the Linux kernel parses the ACPI SRAT to determine the PXM ids and
> creates a NUMA node for each unique PXM ID encountered. QEMU currently does
> not implement these structures while building SRAT.
> 
> Add GI structures while building the VM ACPI SRAT. The association between
> devices and nodes is stored using the acpi-generic-initiator object. Look up
> all such objects and use them to build these structures.
> 
> The structure needs a PCI device handle [2] that consists of the device BDF.
> The vfio-pci device corresponding to the acpi-generic-initiator object is
> located to determine the BDF.
> 
> [1] ACPI Spec 6.5, Section 5.2.16.6
> [2] ACPI Spec 6.5, Table 5-66
> 
> Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
> ---
>  hw/acpi/acpi-generic-initiator.c         | 79 ++++++++++++++++++++++++
>  hw/arm/virt-acpi-build.c                 |  3 +
>  include/hw/acpi/acpi-generic-initiator.h | 21 +++++++
>  3 files changed, 103 insertions(+)
> 
> diff --git a/hw/acpi/acpi-generic-initiator.c b/hw/acpi/acpi-generic-initiator.c
> index 0699c878e2..6d0a8fd818 100644
> --- a/hw/acpi/acpi-generic-initiator.c
> +++ b/hw/acpi/acpi-generic-initiator.c
> @@ -78,3 +78,82 @@ static void acpi_generic_initiator_class_init(ObjectClass *oc, void *data)
>      object_class_property_add_str(oc, ACPI_GENERIC_INITIATOR_NODELIST_PROP,
>                                    NULL, acpi_generic_initiator_set_nodelist);
>  }
> +
> +static int acpi_generic_initiator_list(Object *obj, void *opaque)
> +{
> +    GSList **list = opaque;
> +
> +    if (object_dynamic_cast(obj, TYPE_ACPI_GENERIC_INITIATOR)) {
> +        *list = g_slist_append(*list, ACPI_GENERIC_INITIATOR(obj));
> +    }
> +
> +    object_child_foreach(obj, acpi_generic_initiator_list, opaque);
> +    return 0;
> +}
> +
> +/*
> + * Identify Generic Initiator objects and link them into the list which is
> + * returned to the caller.
> + *
> + * Note: it is the caller's responsibility to free the list to avoid
> + * memory leak.
> + */
> +static GSList *acpi_generic_initiator_get_list(void)
> +{
> +    GSList *list = NULL;
> +
> +    object_child_foreach(object_get_root(), acpi_generic_initiator_list, &list);
> +    return list;
> +}
> +
> +/*
> + * ACPI spec, Revision 6.5

We normally just say ACPI 6.5, even though a couple of places are more
verbose.

> + * 5.2.16.6 Generic Initiator Affinity Structure
> + */
> +static
> +void build_srat_generic_pci_initiator_affinity(GArray *table_data, int node,
> +                                               PCIDeviceHandle *handle)
> +{
> +    uint8_t index;
> +
> +    build_append_int_noprefix(table_data, 5, 1);     /* Type */
> +    build_append_int_noprefix(table_data, 32, 1);    /* Length */
> +    build_append_int_noprefix(table_data, 0, 1);     /* Reserved */
> +    build_append_int_noprefix(table_data, 1, 1);     /* Device Handle Type */
> +    build_append_int_noprefix(table_data, node, 4);  /* Proximity Domain */
> +    build_append_int_noprefix(table_data, handle->segment, 2);
> +    build_append_int_noprefix(table_data, handle->bdf, 2);
> +
> +    /* Reserved */
> +    for (index = 0; index < 12; index++) {
> +        build_append_int_noprefix(table_data, handle->res[index], 1);
> +    }
> +
> +    build_append_int_noprefix(table_data, GEN_AFFINITY_ENABLED, 4); /* Flags */
> +    build_append_int_noprefix(table_data, 0, 4);     /* Reserved */
> +}
> +
> +void build_srat_generic_pci_initiator(GArray *table_data)
> +{
> +    GSList *gi_list, *list = acpi_generic_initiator_get_list();
> +    for (gi_list = list; gi_list; gi_list = gi_list->next) {
> +        AcpiGenericInitiator *gi = gi_list->data;
> +        Object *o;
> +        uint16List *l;
> +
> +        o = object_resolve_path_type(gi->device, TYPE_VFIO_PCI, NULL);
> +        if (!o) {
> +            continue;
> +        }
> +
> +        for (l = gi->nodelist; l; l = l->next) {
> +            PCIDeviceHandle dev_handle = {0};
> +            PCIDevice *pci_dev = PCI_DEVICE(o);
> +            dev_handle.bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)),
> +                                                       pci_dev->devfn);
> +            build_srat_generic_pci_initiator_affinity(table_data,
> +                                                      l->value, &dev_handle);
> +        }
> +    }
> +    g_slist_free(list);
> +}
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index 6b674231c2..bd53788cef 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -58,6 +58,7 @@
>  #include "migration/vmstate.h"
>  #include "hw/acpi/ghes.h"
>  #include "hw/acpi/viot.h"
> +#include "hw/acpi/acpi-generic-initiator.h"
>  
>  #define ARM_SPI_BASE 32
>  
> @@ -558,6 +559,8 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
>          }
>      }
>  
> +    build_srat_generic_pci_initiator(table_data);
> +
>      if (ms->nvdimms_state->is_enabled) {
>          nvdimm_build_srat(table_data);
>      }
> diff --git a/include/hw/acpi/acpi-generic-initiator.h b/include/hw/acpi/acpi-generic-initiator.h
> index bb127b2541..545f46ade5 100644
> --- a/include/hw/acpi/acpi-generic-initiator.h
> +++ b/include/hw/acpi/acpi-generic-initiator.h
> @@ -26,4 +26,25 @@ typedef struct AcpiGenericInitiatorClass {
>          ObjectClass parent_class;
>  } AcpiGenericInitiatorClass;
>  
> +/*
> + * ACPI 6.5: Table 5-68 Flags - Generic Initiator
> + */
> +typedef enum {
> +    GEN_AFFINITY_NOFLAGS = 0,
> +    GEN_AFFINITY_ENABLED = (1 << 0),
> +    GEN_AFFINITY_ARCH_TRANS = (1 << 1),
> +} GenericAffinityFlags;

Don't add these one-time use flags. They are impossible to match to
spec without reading and memorizing all of it. The way we do it in ACPI
code is this:

(1 << 0) /* [text matching ACPI spec verbatim ] */

this also means you will not add a ton of dead code just because it is
in the spec.
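
To illustrate the convention, here is a minimal standalone sketch, not QEMU
code. append_le() is a hypothetical stand-in for build_append_int_noprefix()
so that the snippet compiles outside the tree, and append_gi_flags() is an
illustrative name as well. The comment next to the literal is assumed to be
the bit name from the spec's flags table ("Enabled"); copy the exact wording
from the spec when doing this for real.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for build_append_int_noprefix(): append the low
 * `size` bytes of `value` to `buf` in little-endian order and return the
 * new length.
 */
static size_t append_le(uint8_t *buf, size_t len, uint64_t value, int size)
{
    for (int i = 0; i < size; i++) {
        buf[len++] = (uint8_t)(value >> (8 * i));
    }
    return len;
}

/*
 * Append the trailing Flags and Reserved fields of the structure.
 * No enum: the literal sits at its single point of use, annotated with
 * the (assumed) spec name of the bit.
 */
static size_t append_gi_flags(uint8_t *buf, size_t len)
{
    len = append_le(buf, len, (1 << 0) /* Enabled */, 4);  /* Flags */
    len = append_le(buf, len, 0, 4);                       /* Reserved */
    return len;
}
```

Only the bit that is actually set shows up in the code, so nothing unused
gets carried along from the spec.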

> +
> +/*
> + * ACPI 6.5: Table 5-66 Device Handle - PCI

In ACPI we document *earliest* spec version that includes this, not just
a random one you looked at. I checked 6.3 and it's there.
Pls find earliest one.

Same applies everywhere


> + * Device Handle definition

Again match spec text exactly. one line, and "definition" is not there.

> + */
> +typedef struct PCIDeviceHandle {
> +    uint16_t segment;
> +    uint16_t bdf;
> +    uint8_t res[12];

what is this "res" and why do you need to pass it? It's always 0 isn't
it?

> +} PCIDeviceHandle;
> +
> +void build_srat_generic_pci_initiator(GArray *table_data);
> +
>  #endif
> -- 
> 2.17.1




* Re: [PATCH v3 2/2] hw/acpi: Implement the SRAT GI affinity structure
  2023-11-07 19:00 ` [PATCH v3 2/2] hw/acpi: Implement the SRAT GI affinity structure ankita
  2023-11-07 21:33   ` Alex Williamson
  2023-11-07 22:20   ` Michael S. Tsirkin
@ 2023-11-07 22:25   ` Michael S. Tsirkin
  2023-11-13 11:14     ` Ankit Agrawal
  2 siblings, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2023-11-07 22:25 UTC (permalink / raw)
  To: ankita
  Cc: jgg, alex.williamson, clg, shannon.zhaosl, peter.maydell, ani,
	berrange, eduardo, imammedo, eblake, armbru, david, gshan,
	Jonathan.Cameron, aniketa, cjia, kwankhede, targupta, vsethi,
	acurrid, dnigam, udhoke, qemu-arm, qemu-devel

On Wed, Nov 08, 2023 at 12:30:39AM +0530, ankita@nvidia.com wrote:
> +        for (l = gi->nodelist; l; l = l->next) {
> +            PCIDeviceHandle dev_handle = {0};
> +            PCIDevice *pci_dev = PCI_DEVICE(o);
> +            dev_handle.bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)),
> +                                                       pci_dev->devfn);
> +            build_srat_generic_pci_initiator_affinity(table_data,
> +                                                      l->value, &dev_handle);
> +        }
> +    }

if you never initialize segment then I don't see why have it.
It's just the bdf, just pass that as parameter no need for a struct.
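
Something along these lines, as a rough standalone sketch rather than final
code. append_le() is a hypothetical stand-in for build_append_int_noprefix(),
build_bdf() mirrors the arithmetic of PCI_BUILD_BDF(), and
append_pci_device_handle() is a made-up name. The segment and BDF are written
as two 2-byte fields exactly as the quoted patch does; whether that byte order
matches the spec's "Device Handle - PCI" layout is not verified here.

```c
#include <stddef.h>
#include <stdint.h>

/* Same arithmetic as PCI_BUILD_BDF(): bus in bits 15:8, devfn in bits 7:0. */
static uint16_t build_bdf(uint8_t bus, uint8_t devfn)
{
    return (uint16_t)((bus << 8) | devfn);
}

/*
 * Hypothetical stand-in for build_append_int_noprefix(): append the low
 * `size` bytes of `value` in little-endian order, return the new length.
 */
static size_t append_le(uint8_t *buf, size_t len, uint64_t value, int size)
{
    for (int i = 0; i < size; i++) {
        buf[len++] = (uint8_t)(value >> (8 * i));
    }
    return len;
}

/*
 * Append the 16-byte device handle. Segment and BDF arrive as plain
 * parameters; the 12 reserved bytes are always zero, so they are written
 * directly instead of being carried around in a struct.
 */
static size_t append_pci_device_handle(uint8_t *buf, size_t len,
                                       uint16_t segment, uint16_t bdf)
{
    len = append_le(buf, len, segment, 2);  /* PCI segment */
    len = append_le(buf, len, bdf, 2);      /* PCI BDF, as in the patch */
    for (int i = 0; i < 12; i++) {
        len = append_le(buf, len, 0, 1);    /* Reserved */
    }
    return len;
}
```

For a device at bus 0, slot 4, function 0, devfn is (4 << 3) | 0 = 0x20, so
build_bdf(0, 0x20) gives 0x0020 and the caller passes that value straight in.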

-- 
MST




* Re: [PATCH v3 2/2] hw/acpi: Implement the SRAT GI affinity structure
  2023-11-07 22:25   ` Michael S. Tsirkin
@ 2023-11-13 11:14     ` Ankit Agrawal
  2023-11-13 14:18       ` Michael S. Tsirkin
  0 siblings, 1 reply; 9+ messages in thread
From: Ankit Agrawal @ 2023-11-13 11:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Gunthorpe, alex.williamson@redhat.com, clg@redhat.com,
	shannon.zhaosl@gmail.com, peter.maydell@linaro.org,
	ani@anisinha.ca, berrange@redhat.com, eduardo@habkost.net,
	imammedo@redhat.com, eblake@redhat.com, armbru@redhat.com,
	david@redhat.com, gshan@redhat.com, Jonathan.Cameron@huawei.com,
	Aniket Agashe, Neo Jia, Kirti Wankhede, Tarun Gupta (SW-GPU),
	Vikram Sethi, Andy Currid, Dheeraj Nigam, Uday Dhoke,
	qemu-arm@nongnu.org, qemu-devel@nongnu.org

>> +        for (l = gi->nodelist; l; l = l->next) {
>> +            PCIDeviceHandle dev_handle = {0};
>> +            PCIDevice *pci_dev = PCI_DEVICE(o);
>> +            dev_handle.bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)),
>> +                                                       pci_dev->devfn);
>> +            build_srat_generic_pci_initiator_affinity(table_data,
>> +                                                      l->value, &dev_handle);
>> +        }
>> +    }
>
> if you never initialize segment then I don't see why have it.
> It's just the bdf, just pass that as parameter no need for a struct.
>
> I'd explicitly set the segment to zero just to make it more apparent
> that it would need to be addressed when QEMU adds multi-segment
> support.

Okay, so I'll keep the segment id, but set it to 0 explicitly.

>> + * ACPI spec, Revision 6.5
>
> we normally just say ACPI 6.5 even though a couple of places are more
> verbose.
>
> In ACPI we document *earliest* spec version that includes this, not just
> a random one you looked at. I checked 6.3 and it's there.
> Pls find earliest one.

Will make the change.

>> +typedef enum {
>> +    GEN_AFFINITY_NOFLAGS = 0,
>> +    GEN_AFFINITY_ENABLED = (1 << 0),
>> +    GEN_AFFINITY_ARCH_TRANS = (1 << 1),
>> +} GenericAffinityFlags;
>
> Don't add these one-time use flags. They are impossible to match to
> spec without reading and memorizing all of it. The way we do it in ACPI
> code is this:
>
> (1 << 0) /* [text matching ACPI spec verbatim ] */
>
> this also means you will not add a ton of dead code just because it is
> in the spec.

Ack.

>> +typedef struct PCIDeviceHandle {
>> +    uint16_t segment;
>> +    uint16_t bdf;
>> +    uint8_t res[12];
>
> what is this "res" and why do you need to pass it? It's always 0 isn't
> it?

It is the 12-byte reserved field in the "Device Handle - PCI" described in
ACPI 6.5, Table 5-66. I'll remove it.

>> +
>> +        o = object_resolve_path_type(gi->device, TYPE_VFIO_PCI, NULL);
>
> As per previous comments, this should not be tied to vfio.  This should
> be able to describe an association between any PCI device and various
> proximity domains, even those beyond this current use case.

Sure, will change it to use TYPE_PCI_DEVICE.

> It also looks like this support just silently fails if the device
> string isn't the right type or isn't found.  That's not good.  Should
> the previous patch validate the device where the Error return is more
> readily available rather than only doing a strdup there?  Maybe then we
> should store the object there rather than a char buffer.

AFAIU, in the normal flow a QEMU -object is (parsed and) created much
earlier than a -device. This complicates the situation: when the
acpi-generic-initiator object is being created, the device is not yet available
for an error check. Maybe I should treat this object specially and create it
much later?

> Don't we also still need to enforce that the device is not hotpluggable
> since we're tying it to this fixed ACPI object?  That was implicit when
> previously testing for the non-hotpluggable vfio-pci device type, but
> should rely on something like device_get_hotpluggable() now.

I think this will be similarly problematic as above due to the sequence of
object creation.

> Also the ACPI Generic Initiator supports either a PCI or ACPI device
> handle, where we're only adding PCI support here.  What do we want ACPI
> device support to look like?  Is it sufficient that device= only
> accepts a PCI device now and fails on anything else and would later be
> updated to accept an ACPI device or should the object have different
> entry points, ex. pci_dev = vs acpi_dev= where it might later be
> introspected whether ACPI device support exists?

I am fine with either way. If we prefer different entry points, I can make the
change.


* Re: [PATCH v3 2/2] hw/acpi: Implement the SRAT GI affinity structure
  2023-11-13 11:14     ` Ankit Agrawal
@ 2023-11-13 14:18       ` Michael S. Tsirkin
  0 siblings, 0 replies; 9+ messages in thread
From: Michael S. Tsirkin @ 2023-11-13 14:18 UTC (permalink / raw)
  To: Ankit Agrawal
  Cc: Jason Gunthorpe, alex.williamson@redhat.com, clg@redhat.com,
	shannon.zhaosl@gmail.com, peter.maydell@linaro.org,
	ani@anisinha.ca, berrange@redhat.com, eduardo@habkost.net,
	imammedo@redhat.com, eblake@redhat.com, armbru@redhat.com,
	david@redhat.com, gshan@redhat.com, Jonathan.Cameron@huawei.com,
	Aniket Agashe, Neo Jia, Kirti Wankhede, Tarun Gupta (SW-GPU),
	Vikram Sethi, Andy Currid, Dheeraj Nigam, Uday Dhoke,
	qemu-arm@nongnu.org, qemu-devel@nongnu.org

On Mon, Nov 13, 2023 at 11:14:00AM +0000, Ankit Agrawal wrote:
> > It also looks like this support just silently fails if the device
> > string isn't the right type or isn't found.  That's not good.  Should
> > the previous patch validate the device where the Error return is more
> > readily available rather than only doing a strdup there?  Maybe then we
> > should store the object there rather than a char buffer.
> 
> AFAIU in a normal flow currently, a qemu -object is (parsed and) created much
> > earlier than a -device. This complicates the situation as when the
> acpi-generic-initiator object is being created, the device is not available for
> error check. Maybe I should treat this object specially to create much later?
> 
> > Don't we also still need to enforce that the device is not hotpluggable
> > since we're tying it to this fixed ACPI object?  That was implicit when
> > previously testing for the non-hotpluggable vfio-pci device type, but
> > should rely on something like device_get_hotpluggable() now.
> 
> I think this will be similarly problematic as above due to the sequence of
> object creation.
> 
> > Also the ACPI Generic Initiator supports either a PCI or ACPI device
> > handle, where we're only adding PCI support here.  What do we want ACPI
> > device support to look like?  Is it sufficient that device= only
> > accepts a PCI device now and fails on anything else and would later be
> > updated to accept an ACPI device or should the object have different
> > entry points, ex. pci_dev = vs acpi_dev= where it might later be
> > introspected whether ACPI device support exists?
> 
> I am fine with either way. If we prefer different entry points, I can make the
> change.


Not the expert on QOM. Hope one of QOM maintainers can answer.




* Re: [PATCH v3 1/2] qom: new object to associate device to numa node
  2023-11-07 19:00 ` [PATCH v3 1/2] qom: new object to associate device to numa node ankita
@ 2023-11-15 13:59   ` Markus Armbruster
  0 siblings, 0 replies; 9+ messages in thread
From: Markus Armbruster @ 2023-11-15 13:59 UTC (permalink / raw)
  To: ankita
  Cc: jgg, alex.williamson, clg, shannon.zhaosl, peter.maydell, ani,
	berrange, eduardo, imammedo, mst, eblake, david, gshan,
	Jonathan.Cameron, aniketa, cjia, kwankhede, targupta, vsethi,
	acurrid, dnigam, udhoke, qemu-arm, qemu-devel

<ankita@nvidia.com> writes:

> From: Ankit Agrawal <ankita@nvidia.com>
>
> NVIDIA GPUs support the MIG (Multi-Instance GPU) feature [1], which allows
> partitioning of the GPU device resources (including device memory) into
> several (up to 8) isolated instances. Each memory partition needs
> a dedicated NUMA node to operate. The partitions are not fixed and they
> can be created/deleted at runtime.
>
> Unfortunately, Linux does not provide a means to dynamically create/destroy
> NUMA nodes, and implementing such a feature is not expected to be trivial. The
> nodes that the OS discovers at boot time while parsing SRAT remain fixed. So
> we utilize the Generic Initiator Affinity structures, which allow association
> between nodes and devices. Multiple GI structures per BDF are possible,
> allowing creation of multiple nodes by exposing a unique PXM in each of these
> structures.
>
> Introduce a new acpi-generic-initiator object to allow the host admin to
> provide the device and the corresponding NUMA nodes. QEMU maintains this
> association and uses this object to build the requisite GI Affinity Structure.
>
> An admin can provide the range of nodes using a ':' delimited numalist and

Please don't create special-purpose syntax, use existing general-purpose
syntax.  See also review of qom.json below.

> link it to a device by providing its id. The node ids are extracted from
> numalist and stored as a uint16List. The following sample creates 8 nodes
> and links them to the device dev0:
>
> -numa node,nodeid=2 \
> -numa node,nodeid=3 \
> -numa node,nodeid=4 \
> -numa node,nodeid=5 \
> -numa node,nodeid=6 \
> -numa node,nodeid=7 \
> -numa node,nodeid=8 \
> -numa node,nodeid=9 \
> -device vfio-pci-nohotplug,host=0009:01:00.0,bus=pcie.0,addr=04.0,rombar=0,id=dev0 \
> -object acpi-generic-initiator,id=gi0,device=dev0,numalist=2:3:4:5:6:7:8:9 \
>
> [1] https://www.nvidia.com/en-in/technologies/multi-instance-gpu
>
> Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
> ---
>  hw/acpi/acpi-generic-initiator.c         | 80 ++++++++++++++++++++++++
>  hw/acpi/meson.build                      |  1 +
>  include/hw/acpi/acpi-generic-initiator.h | 29 +++++++++
>  qapi/qom.json                            | 16 +++++
>  4 files changed, 126 insertions(+)
>  create mode 100644 hw/acpi/acpi-generic-initiator.c
>  create mode 100644 include/hw/acpi/acpi-generic-initiator.h
>
> diff --git a/hw/acpi/acpi-generic-initiator.c b/hw/acpi/acpi-generic-initiator.c
> new file mode 100644
> index 0000000000..0699c878e2
> --- /dev/null
> +++ b/hw/acpi/acpi-generic-initiator.c
> @@ -0,0 +1,80 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/qdev-properties.h"
> +#include "qapi/error.h"
> +#include "qapi/visitor.h"
> +#include "qom/object_interfaces.h"
> +#include "qom/object.h"
> +#include "hw/qdev-core.h"
> +#include "hw/vfio/vfio-common.h"
> +#include "hw/vfio/pci.h"
> +#include "hw/pci/pci_device.h"
> +#include "sysemu/numa.h"
> +#include "hw/acpi/acpi-generic-initiator.h"
> +
> +OBJECT_DEFINE_TYPE_WITH_INTERFACES(AcpiGenericInitiator, acpi_generic_initiator,
> +                   ACPI_GENERIC_INITIATOR, OBJECT,
> +                   { TYPE_USER_CREATABLE },
> +                   { NULL })
> +
> +OBJECT_DECLARE_SIMPLE_TYPE(AcpiGenericInitiator, ACPI_GENERIC_INITIATOR)
> +
> +static void acpi_generic_initiator_init(Object *obj)
> +{
> +    AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
> +    gi->device = NULL;
> +    gi->nodelist = NULL;
> +}
> +
> +static void acpi_generic_initiator_finalize(Object *obj)
> +{
> +    AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
> +
> +    g_free(gi->device);
> +    qapi_free_uint16List(gi->nodelist);
> +}
> +
> +static void acpi_generic_initiator_set_device(Object *obj, const char *val,
> +                                              Error **errp)
> +{
> +    AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
> +
> +    gi->device = g_strdup(val);
> +}
> +
> +static void acpi_generic_initiator_set_nodelist(Object *obj, const char *val,
> +                                            Error **errp)
> +{
> +    AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
> +    char *value = g_strdup(val);
> +    uint16_t node;
> +    uint16List **tail = &(gi->nodelist);
> +    char *nodestr = value ? strtok(value, ":") : NULL;
> +
> +    while (nodestr) {
> +        if (sscanf(nodestr, "%hu", &node) != 1) {
> +            error_setg(errp, "failed to read node-id");
> +            return;
> +        }
> +
> +        if (node >= MAX_NODES) {
> +            error_setg(errp, "invalid node-id");
> +            return;
> +        }
> +
> +        QAPI_LIST_APPEND(tail, node);
> +        nodestr = strtok(NULL, ":");
> +    }
> +}
> +
> +static void acpi_generic_initiator_class_init(ObjectClass *oc, void *data)
> +{
> +    object_class_property_add_str(oc, ACPI_GENERIC_INITIATOR_DEVICE_PROP, NULL,
> +                                  acpi_generic_initiator_set_device);
> +    object_class_property_add_str(oc, ACPI_GENERIC_INITIATOR_NODELIST_PROP,
> +                                  NULL, acpi_generic_initiator_set_nodelist);
> +}
> diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
> index fc1b952379..2268589519 100644
> --- a/hw/acpi/meson.build
> +++ b/hw/acpi/meson.build
> @@ -1,5 +1,6 @@
>  acpi_ss = ss.source_set()
>  acpi_ss.add(files(
> +  'acpi-generic-initiator.c',
>    'acpi_interface.c',
>    'aml-build.c',
>    'bios-linker-loader.c',
> diff --git a/include/hw/acpi/acpi-generic-initiator.h b/include/hw/acpi/acpi-generic-initiator.h
> new file mode 100644
> index 0000000000..bb127b2541
> --- /dev/null
> +++ b/include/hw/acpi/acpi-generic-initiator.h
> @@ -0,0 +1,29 @@
> +#ifndef ACPI_GENERIC_INITIATOR_H
> +#define ACPI_GENERIC_INITIATOR_H
> +
> +#include "hw/mem/pc-dimm.h"
> +#include "hw/acpi/bios-linker-loader.h"
> +#include "qemu/uuid.h"
> +#include "hw/acpi/aml-build.h"
> +#include "qom/object.h"
> +#include "qom/object_interfaces.h"
> +
> +#define TYPE_ACPI_GENERIC_INITIATOR "acpi-generic-initiator"
> +
> +#define ACPI_GENERIC_INITIATOR_DEVICE_PROP "device"
> +#define ACPI_GENERIC_INITIATOR_NODELIST_PROP "nodelist"
> +
> +typedef struct AcpiGenericInitiator {
> +    /* private */
> +    Object parent;
> +
> +    /* public */
> +    char *device;
> +    uint16List *nodelist;
> +} AcpiGenericInitiator;
> +
> +typedef struct AcpiGenericInitiatorClass {
> +        ObjectClass parent_class;
> +} AcpiGenericInitiatorClass;
> +
> +#endif
> diff --git a/qapi/qom.json b/qapi/qom.json
> index fa3e88c8e6..66d2bffdcc 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -779,6 +779,20 @@
>  { 'struct': 'VfioUserServerProperties',
>    'data': { 'socket': 'SocketAddress', 'device': 'str' } }
>  
> +##
> +# @AcpiGenericInitiatorProperties:
> +#
> +# Properties for acpi-generic-initiator objects.
> +#
> +# @device: the ID of the device to be associated with the node
> +#
> +# @nodelist: delimited numa node list
> +#
> +# Since: 8.2
> +##
> +{ 'struct': 'AcpiGenericInitiatorProperties',
> +  'data': { 'device': 'str', 'nodelist': 'str' } }

Do not encode structured data in strings.  Instead:

    'nodes': ['uint16']

This matches MemoryBackendProperties member @host-nodes.

Check out host_memory_backend_get_host_nodes() and
host_memory_backend_set_host_nodes() to see how to work with such a
member.
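
A rough illustration of why the typed list is easier to work with: the setter
receives already-parsed integers, so no strtok()/sscanf() is needed and the
only work left is range checking. The sketch below is standalone C, not QEMU
code. NodeList is a hypothetical mirror of the generated uint16List,
node_list_append() plays the role of QAPI_LIST_APPEND(), and MAX_NODES is
given an assumed value of 128.

```c
#include <stdint.h>
#include <stdlib.h>

#define MAX_NODES 128   /* assumed limit, for illustration only */

/* Hypothetical mirror of a generated uint16List element. */
typedef struct NodeList {
    struct NodeList *next;
    uint16_t value;
} NodeList;

/*
 * Append `value` at the tail. Returns the new tail pointer, or NULL if
 * the allocation failed (the list is left unchanged in that case).
 */
static NodeList **node_list_append(NodeList **tail, uint16_t value)
{
    NodeList *n = calloc(1, sizeof(*n));

    if (!n) {
        return NULL;
    }
    n->value = value;
    *tail = n;
    return &n->next;
}

/* Return 0 if every node id is below MAX_NODES, -1 otherwise. */
static int node_list_validate(const NodeList *l)
{
    for (; l; l = l->next) {
        if (l->value >= MAX_NODES) {
            return -1;
        }
    }
    return 0;
}

/* Free every element of the list. */
static void node_list_free(NodeList *l)
{
    while (l) {
        NodeList *next = l->next;

        free(l);
        l = next;
    }
}
```

With a list like this there is no duplicated string to free on the error
paths, and a malformed entry is rejected by the input visitor before the
setter ever runs.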

> +
>  ##
>  # @RngProperties:
>  #
> @@ -896,6 +910,7 @@
>  ##
>  { 'enum': 'ObjectType',
>    'data': [
> +    'acpi-generic-initiator',
>      'authz-list',
>      'authz-listfile',
>      'authz-pam',
> @@ -966,6 +981,7 @@
>              'id': 'str' },
>    'discriminator': 'qom-type',
>    'data': {
> +      'acpi-generic-initiator':     'AcpiGenericInitiatorProperties',
>        'authz-list':                 'AuthZListProperties',
>        'authz-listfile':             'AuthZListFileProperties',
>        'authz-pam':                  'AuthZPAMProperties',



