* [PATCH v4 0/2] acpi: report numa nodes for device memory using GI
@ 2023-11-19 13:01 ankita
2023-11-19 13:01 ` [PATCH v4 1/2] qom: new object to associate device to numa node ankita
2023-11-19 13:01 ` [PATCH v4 2/2] hw/acpi: Implement the SRAT GI affinity structure ankita
0 siblings, 2 replies; 7+ messages in thread
From: ankita @ 2023-11-19 13:01 UTC (permalink / raw)
To: ankita, jgg, alex.williamson, clg, shannon.zhaosl, peter.maydell,
ani, berrange, eduardo, imammedo, mst, eblake, armbru, david,
gshan, Jonathan.Cameron
Cc: aniketa, cjia, kwankhede, targupta, vsethi, acurrid, dnigam,
udhoke, qemu-arm, qemu-devel
From: Ankit Agrawal <ankita@nvidia.com>
There are upcoming devices that allow the CPU to access their memory in a
cache-coherent manner. It is sensible to expose such memory to the OS as NUMA
nodes separate from the sysmem node. The ACPI spec provides a scheme in SRAT
called the Generic Initiator Affinity Structure [1] to allow an association
between a Proximity Domain (PXM) and a Generic Initiator (GI) (e.g.
heterogeneous processors and accelerators, GPUs, and I/O devices with
integrated compute or DMA engines).
While a single node per device may cover several use cases, it is
insufficient for full utilization of the NVIDIA GPU MIG
(Multi-Instance GPU) [2] feature. The feature allows partitioning of the
GPU device resources (including device memory) into several (up to 8)
isolated instances. Each memory partition requires a dedicated NUMA
node to operate. The partitions are not fixed; they can be created and
deleted at runtime.
Linux does not provide a means to dynamically create/destroy NUMA nodes,
and implementing such a feature is expected to be non-trivial. The nodes
that the OS discovers at boot time while parsing SRAT remain fixed. So we
utilize the GI Affinity Structures, which allow an association between nodes
and devices. Multiple GI structures per device/BDF are possible, allowing
creation of multiple nodes in the VM by exposing a unique PXM in each of these
structures.
Implement the mechanism to build the GI Affinity Structures, as QEMU currently
does not. Introduce a new acpi-generic-initiator object that allows an
association of a set of nodes with a device. During SRAT creation, all such
objects are identified and used to add the GI Affinity Structures. Currently,
only PCI devices are supported and an error is returned for ACPI devices.
The admin will create a range of 8 nodes and associate it with the device
using the acpi-generic-initiator object. While a configuration of fewer than
8 nodes per device is allowed, such a configuration will prevent full
utilization of the feature. This setting is applicable to all the Grace+Hopper
systems. The following is an example of the QEMU command line arguments to
create 8 nodes and link them to the device 'dev0':
-numa node,nodeid=2 \
-numa node,nodeid=3 \
-numa node,nodeid=4 \
-numa node,nodeid=5 \
-numa node,nodeid=6 \
-numa node,nodeid=7 \
-numa node,nodeid=8 \
-numa node,nodeid=9 \
-device vfio-pci-nohotplug,host=0009:01:00.0,bus=pcie.0,addr=04.0,rombar=0,id=dev0 \
-object acpi-generic-initiator,id=gi0,pci-dev=dev0,host-nodes=2-9 \
The performance benefits can be realized by providing the NUMA node distances
appropriately (through libvirt tags or QEMU params). The admin can get the
distances among nodes in hardware using `numactl -H`.
This series goes along with the vfio-pci variant driver [3] under review.
Applied over v8.2.0.
[1] ACPI Spec 6.3, Section 5.2.16.6
[2] https://www.nvidia.com/en-in/technologies/multi-instance-gpu
[3] https://lore.kernel.org/all/20231114081611.30550-1-ankita@nvidia.com/
Link for v3:
https://lore.kernel.org/all/20231107190039.19434-1-ankita@nvidia.com/
v3 -> v4
- replaced the ':'-delimited list with a uint16 array to communicate the
nodes associated with the device.
- added asserts to handle invalid inputs.
- addressed other miscellaneous v3 comments.
v2 -> v3
- changed param to accept a ':' delimited list of numa nodes, instead
of a range.
- Removed nvidia-acpi-generic-initiator object.
- Addressed miscellaneous comments in v2.
v1 -> v2
- Removed dependency on sysfs to communicate the feature with variant module.
- Use GI Affinity SRAT structure instead of Memory Affinity.
- No DSDT entries needed to communicate the PXM for the device. SRAT GI
structure is used instead.
- New objects introduced to establish link between device and nodes.
Ankit Agrawal (2):
qom: new object to associate device to numa node
hw/acpi: Implement the SRAT GI affinity structure
hw/acpi/acpi-generic-initiator.c | 185 +++++++++++++++++++++++
hw/acpi/meson.build | 1 +
hw/arm/virt-acpi-build.c | 3 +
include/hw/acpi/acpi-generic-initiator.h | 56 +++++++
qapi/qom.json | 18 +++
5 files changed, 263 insertions(+)
create mode 100644 hw/acpi/acpi-generic-initiator.c
create mode 100644 include/hw/acpi/acpi-generic-initiator.h
--
2.34.1
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v4 1/2] qom: new object to associate device to numa node
2023-11-19 13:01 [PATCH v4 0/2] acpi: report numa nodes for device memory using GI ankita
@ 2023-11-19 13:01 ` ankita
2023-11-27 22:57 ` Alex Williamson
2023-11-30 14:12 ` Markus Armbruster
2023-11-19 13:01 ` [PATCH v4 2/2] hw/acpi: Implement the SRAT GI affinity structure ankita
1 sibling, 2 replies; 7+ messages in thread
From: ankita @ 2023-11-19 13:01 UTC (permalink / raw)
To: ankita, jgg, alex.williamson, clg, shannon.zhaosl, peter.maydell,
ani, berrange, eduardo, imammedo, mst, eblake, armbru, david,
gshan, Jonathan.Cameron
Cc: aniketa, cjia, kwankhede, targupta, vsethi, acurrid, dnigam,
udhoke, qemu-arm, qemu-devel
From: Ankit Agrawal <ankita@nvidia.com>
NVIDIA GPUs support the MIG (Multi-Instance GPU) feature [1], which allows
partitioning of the GPU device resources (including device memory) into
several (up to 8) isolated instances. Each memory partition needs
a dedicated NUMA node to operate. The partitions are not fixed; they
can be created and deleted at runtime.
Unfortunately, Linux does not provide a means to dynamically create/destroy
NUMA nodes, and implementing such a feature is not expected to be trivial. The
nodes that the OS discovers at boot time while parsing SRAT remain fixed. So
we utilize the Generic Initiator Affinity Structures, which allow an association
between nodes and devices. Multiple GI structures per BDF are possible,
allowing creation of multiple nodes by exposing a unique PXM in each of these
structures.
Introduce a new acpi-generic-initiator object to allow the host admin to provide
the device and the corresponding NUMA nodes. QEMU maintains this association and
uses this object to build the requisite GI Affinity Structures.
An admin can provide the range of nodes through a uint16 array host-nodes
and link it to a device by providing its id. Currently, only PCI devices are
supported and an error is returned for ACPI devices. The following sample
creates 8 nodes and links them to the PCI device dev0:
-numa node,nodeid=2 \
-numa node,nodeid=3 \
-numa node,nodeid=4 \
-numa node,nodeid=5 \
-numa node,nodeid=6 \
-numa node,nodeid=7 \
-numa node,nodeid=8 \
-numa node,nodeid=9 \
-device vfio-pci-nohotplug,host=0009:01:00.0,bus=pcie.0,addr=04.0,rombar=0,id=dev0 \
-object acpi-generic-initiator,id=gi0,pci-dev=dev0,host-nodes=2-9 \
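The host-nodes setter in this patch rejects any node ID that is not below
MAX_NODES. As a rough, standalone illustration of that range check (not the
QEMU code itself): the limit SKETCH_MAX_NODES (128 here) and the helper
first_invalid_node() are hypothetical names for this sketch, and a plain
array stands in for QEMU's uint16List.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed limit for this sketch only; QEMU defines its own MAX_NODES. */
#define SKETCH_MAX_NODES 128

/*
 * Hypothetical helper: return the index of the first node ID that is out
 * of range, or -1 if every entry is below SKETCH_MAX_NODES.
 */
int first_invalid_node(const uint16_t *nodes, size_t count)
{
    /* A NULL array is only acceptable when it is empty. */
    assert(nodes != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (nodes[i] >= SKETCH_MAX_NODES) {
            return (int)i;
        }
    }
    return -1;
}
```

With the sample above (host-nodes=2-9), every ID is in range, so a check of
this shape would accept the list.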
[1] https://www.nvidia.com/en-in/technologies/multi-instance-gpu
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
---
hw/acpi/acpi-generic-initiator.c | 84 ++++++++++++++++++++++++
hw/acpi/meson.build | 1 +
include/hw/acpi/acpi-generic-initiator.h | 30 +++++++++
qapi/qom.json | 18 +++++
4 files changed, 133 insertions(+)
create mode 100644 hw/acpi/acpi-generic-initiator.c
create mode 100644 include/hw/acpi/acpi-generic-initiator.h
diff --git a/hw/acpi/acpi-generic-initiator.c b/hw/acpi/acpi-generic-initiator.c
new file mode 100644
index 0000000000..5ea51cb81e
--- /dev/null
+++ b/hw/acpi/acpi-generic-initiator.c
@@ -0,0 +1,84 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
+ */
+
+#include "qemu/osdep.h"
+#include "hw/qdev-properties.h"
+#include "qapi/error.h"
+#include "qapi/qapi-builtin-visit.h"
+#include "qapi/visitor.h"
+#include "qom/object_interfaces.h"
+#include "qom/object.h"
+#include "hw/qdev-core.h"
+#include "hw/vfio/vfio-common.h"
+#include "hw/vfio/pci.h"
+#include "hw/pci/pci_device.h"
+#include "sysemu/numa.h"
+#include "hw/acpi/acpi-generic-initiator.h"
+
+OBJECT_DEFINE_TYPE_WITH_INTERFACES(AcpiGenericInitiator, acpi_generic_initiator,
+ ACPI_GENERIC_INITIATOR, OBJECT,
+ { TYPE_USER_CREATABLE },
+ { NULL })
+
+OBJECT_DECLARE_SIMPLE_TYPE(AcpiGenericInitiator, ACPI_GENERIC_INITIATOR)
+
+static void acpi_generic_initiator_init(Object *obj)
+{
+ AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
+ gi->device = NULL;
+ gi->nodelist = NULL;
+}
+
+static void acpi_generic_initiator_finalize(Object *obj)
+{
+ AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
+
+ g_free(gi->device);
+ qapi_free_uint16List(gi->nodelist);
+}
+
+static void acpi_generic_initiator_set_pci_device(Object *obj, const char *val,
+ Error **errp)
+{
+ AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
+
+ gi->device = g_strdup(val);
+}
+
+static void acpi_generic_initiator_set_acpi_device(Object *obj, const char *val,
+ Error **errp)
+{
+ error_setg(errp, "Generic Initiator ACPI device not supported");
+}
+
+static void
+acpi_generic_initiator_set_host_nodes(Object *obj, Visitor *v, const char *name,
+ void *opaque, Error **errp)
+{
+ AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
+ uint16List *l;
+
+ visit_type_uint16List(v, name, &(gi->nodelist), errp);
+
+ for (l = gi->nodelist; l; l = l->next) {
+ if (l->value >= MAX_NODES) {
+ error_setg(errp, "Invalid host-nodes value: %d", l->value);
+ qapi_free_uint16List(gi->nodelist);
+ return;
+ }
+ }
+}
+
+static void acpi_generic_initiator_class_init(ObjectClass *oc, void *data)
+{
+ object_class_property_add_str(oc, ACPI_GENERIC_INITIATOR_PCI_DEVICE_PROP,
+ NULL, acpi_generic_initiator_set_pci_device);
+ object_class_property_add_str(oc, ACPI_GENERIC_INITIATOR_ACPI_DEVICE_PROP,
+ NULL, acpi_generic_initiator_set_acpi_device);
+ object_class_property_add(oc, ACPI_GENERIC_INITIATOR_HOSTNODE_PROP, "int",
+ NULL,
+ acpi_generic_initiator_set_host_nodes,
+ NULL, NULL);
+}
diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
index fc1b952379..2268589519 100644
--- a/hw/acpi/meson.build
+++ b/hw/acpi/meson.build
@@ -1,5 +1,6 @@
acpi_ss = ss.source_set()
acpi_ss.add(files(
+ 'acpi-generic-initiator.c',
'acpi_interface.c',
'aml-build.c',
'bios-linker-loader.c',
diff --git a/include/hw/acpi/acpi-generic-initiator.h b/include/hw/acpi/acpi-generic-initiator.h
new file mode 100644
index 0000000000..db3ed02c80
--- /dev/null
+++ b/include/hw/acpi/acpi-generic-initiator.h
@@ -0,0 +1,30 @@
+#ifndef ACPI_GENERIC_INITIATOR_H
+#define ACPI_GENERIC_INITIATOR_H
+
+#include "hw/mem/pc-dimm.h"
+#include "hw/acpi/bios-linker-loader.h"
+#include "qemu/uuid.h"
+#include "hw/acpi/aml-build.h"
+#include "qom/object.h"
+#include "qom/object_interfaces.h"
+
+#define TYPE_ACPI_GENERIC_INITIATOR "acpi-generic-initiator"
+
+#define ACPI_GENERIC_INITIATOR_PCI_DEVICE_PROP "pci-dev"
+#define ACPI_GENERIC_INITIATOR_ACPI_DEVICE_PROP "acpi-dev"
+#define ACPI_GENERIC_INITIATOR_HOSTNODE_PROP "host-nodes"
+
+typedef struct AcpiGenericInitiator {
+ /* private */
+ Object parent;
+
+ /* public */
+ char *device;
+ uint16List *nodelist;
+} AcpiGenericInitiator;
+
+typedef struct AcpiGenericInitiatorClass {
+ ObjectClass parent_class;
+} AcpiGenericInitiatorClass;
+
+#endif
diff --git a/qapi/qom.json b/qapi/qom.json
index c53ef978ff..f726f5ea41 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -794,6 +794,22 @@
{ 'struct': 'VfioUserServerProperties',
'data': { 'socket': 'SocketAddress', 'device': 'str' } }
+##
+# @AcpiGenericInitiatorProperties:
+#
+# Properties for acpi-generic-initiator objects.
+#
+# @pci-dev: PCI device ID to be associated with the node
+#
+# @acpi-dev: ACPI device ID to be associated with the node
+#
+# @host-nodes: numa node list
+#
+# Since: 8.2
+##
+{ 'struct': 'AcpiGenericInitiatorProperties',
+ 'data': { '*pci-dev': 'str', '*acpi-dev': 'str', 'host-nodes': ['uint16'] } }
+
##
# @RngProperties:
#
@@ -911,6 +927,7 @@
##
{ 'enum': 'ObjectType',
'data': [
+ 'acpi-generic-initiator',
'authz-list',
'authz-listfile',
'authz-pam',
@@ -981,6 +998,7 @@
'id': 'str' },
'discriminator': 'qom-type',
'data': {
+ 'acpi-generic-initiator': 'AcpiGenericInitiatorProperties',
'authz-list': 'AuthZListProperties',
'authz-listfile': 'AuthZListFileProperties',
'authz-pam': 'AuthZPAMProperties',
--
2.34.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v4 2/2] hw/acpi: Implement the SRAT GI affinity structure
2023-11-19 13:01 [PATCH v4 0/2] acpi: report numa nodes for device memory using GI ankita
2023-11-19 13:01 ` [PATCH v4 1/2] qom: new object to associate device to numa node ankita
@ 2023-11-19 13:01 ` ankita
2023-11-27 22:57 ` Alex Williamson
2023-11-30 14:12 ` Markus Armbruster
1 sibling, 2 replies; 7+ messages in thread
From: ankita @ 2023-11-19 13:01 UTC (permalink / raw)
To: ankita, jgg, alex.williamson, clg, shannon.zhaosl, peter.maydell,
ani, berrange, eduardo, imammedo, mst, eblake, armbru, david,
gshan, Jonathan.Cameron
Cc: aniketa, cjia, kwankhede, targupta, vsethi, acurrid, dnigam,
udhoke, qemu-arm, qemu-devel
From: Ankit Agrawal <ankita@nvidia.com>
The ACPI spec provides a scheme to associate "Generic Initiators" [1]
(e.g. heterogeneous processors and accelerators, GPUs, and I/O devices with
integrated compute or DMA engines) with Proximity Domains. This is
achieved using the Generic Initiator Affinity Structure in SRAT. During bootup,
the Linux kernel parses the ACPI SRAT to determine the PXM IDs and creates a NUMA
node for each unique PXM ID encountered. QEMU currently does not implement
these structures while building SRAT.
Add GI structures while building the VM ACPI SRAT. The association between
devices and nodes is stored using the acpi-generic-initiator object. Look up
all such objects and use them to build these structures.
The structure needs a PCI device handle [2] that consists of the device BDF.
The vfio-pci device corresponding to the acpi-generic-initiator object is
located to determine the BDF.
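To make the layout concrete, here is a minimal standalone sketch of how one
32-byte structure with a PCI device handle could be laid out in a byte
buffer. The field order and values mirror what this patch appends
(Type 5, Length 32, Device Handle Type 1, Flags with the enabled bit set);
the helper names put_le(), build_bdf() and encode_gi_pci() are hypothetical
and do not exist in QEMU, which uses build_append_int_noprefix() on a GArray
instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GI_STRUCT_LEN 32 /* Length value the patch writes for this structure */

/* Store the low `size` bytes of `val` in little-endian order. */
void put_le(uint8_t *dst, uint64_t val, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        dst[i] = (uint8_t)(val >> (8 * i));
    }
}

/* Pack bus and devfn into a 16-bit BDF: bus in bits 15:8, devfn in 7:0. */
uint16_t build_bdf(uint8_t bus, uint8_t devfn)
{
    return (uint16_t)((bus << 8) | devfn);
}

/* Fill `out` with one GI Affinity Structure that uses a PCI device handle. */
void encode_gi_pci(uint8_t out[GI_STRUCT_LEN], uint32_t pxm,
                   uint16_t segment, uint16_t bdf)
{
    assert(out != NULL);

    memset(out, 0, GI_STRUCT_LEN); /* reserved fields stay zero */
    out[0] = 5;                    /* Type */
    out[1] = GI_STRUCT_LEN;        /* Length */
    out[3] = 1;                    /* Device Handle Type: PCI */
    put_le(out + 4, pxm, 4);       /* Proximity Domain */
    put_le(out + 8, segment, 2);   /* Device Handle: PCI segment */
    put_le(out + 10, bdf, 2);      /* Device Handle: BDF */
    /* bytes 12..23: reserved part of the 16-byte device handle */
    put_le(out + 24, 1, 4);        /* Flags: bit 0 = enabled */
    /* bytes 28..31: reserved */
}
```

One such structure would be emitted per node listed in host-nodes, each with
the same BDF and a different Proximity Domain.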
[1] ACPI Spec 6.3, Section 5.2.16.6
[2] ACPI Spec 6.3, Table 5-80
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
---
hw/acpi/acpi-generic-initiator.c | 100 +++++++++++++++++++++++
hw/arm/virt-acpi-build.c | 3 +
include/hw/acpi/acpi-generic-initiator.h | 26 ++++++
3 files changed, 129 insertions(+)
diff --git a/hw/acpi/acpi-generic-initiator.c b/hw/acpi/acpi-generic-initiator.c
index 5ea51cb81e..a9222438ec 100644
--- a/hw/acpi/acpi-generic-initiator.c
+++ b/hw/acpi/acpi-generic-initiator.c
@@ -16,6 +16,7 @@
#include "hw/pci/pci_device.h"
#include "sysemu/numa.h"
#include "hw/acpi/acpi-generic-initiator.h"
+#include "qemu/error-report.h"
OBJECT_DEFINE_TYPE_WITH_INTERFACES(AcpiGenericInitiator, acpi_generic_initiator,
ACPI_GENERIC_INITIATOR, OBJECT,
@@ -82,3 +83,102 @@ static void acpi_generic_initiator_class_init(ObjectClass *oc, void *data)
acpi_generic_initiator_set_host_nodes,
NULL, NULL);
}
+
+static int acpi_generic_initiator_list(Object *obj, void *opaque)
+{
+ GSList **list = opaque;
+
+ if (object_dynamic_cast(obj, TYPE_ACPI_GENERIC_INITIATOR)) {
+ *list = g_slist_append(*list, ACPI_GENERIC_INITIATOR(obj));
+ }
+
+ object_child_foreach(obj, acpi_generic_initiator_list, opaque);
+ return 0;
+}
+
+/*
+ * Identify Generic Initiator objects and link them into the list which is
+ * returned to the caller.
+ *
+ * Note: it is the caller's responsibility to free the list to avoid
+ * memory leak.
+ */
+static GSList *acpi_generic_initiator_get_list(void)
+{
+ GSList *list = NULL;
+
+ object_child_foreach(object_get_root(), acpi_generic_initiator_list, &list);
+ return list;
+}
+
+/*
+ * ACPI 6.3:
+ * Table 5-78 Generic Initiator Affinity Structure
+ */
+static
+void build_srat_generic_pci_initiator_affinity(GArray *table_data, int node,
+ PCIDeviceHandle *handle)
+{
+ uint8_t index;
+
+ build_append_int_noprefix(table_data, 5, 1); /* Type */
+ build_append_int_noprefix(table_data, 32, 1); /* Length */
+ build_append_int_noprefix(table_data, 0, 1); /* Reserved */
+ build_append_int_noprefix(table_data, 1, 1); /* Device Handle Type: PCI */
+ build_append_int_noprefix(table_data, node, 4); /* Proximity Domain */
+
+ /* Device Handle - PCI */
+ build_append_int_noprefix(table_data, handle->segment, 2);
+ build_append_int_noprefix(table_data, handle->bdf, 2);
+ for (index = 0; index < 12; index++) {
+ build_append_int_noprefix(table_data, 0, 1);
+ }
+
+ build_append_int_noprefix(table_data, GEN_AFFINITY_ENABLED, 4); /* Flags */
+ build_append_int_noprefix(table_data, 0, 4); /* Reserved */
+}
+
+void build_srat_generic_pci_initiator(GArray *table_data)
+{
+ GSList *gi_list, *list = acpi_generic_initiator_get_list();
+ AcpiGenericInitiator *gi;
+
+ for (gi_list = list; gi_list; gi_list = gi_list->next) {
+ Object *o;
+ uint16List *l;
+ PCIDevice *pci_dev;
+ bool node_specified = false;
+
+ gi = gi_list->data;
+
+ /* User fails to provide a device. */
+ g_assert(gi->device);
+
+ o = object_resolve_path_type(gi->device, TYPE_PCI_DEVICE, NULL);
+ if (!o) {
+ error_printf("Specified device must be a PCI device.\n");
+ g_assert(o);
+ }
+ pci_dev = PCI_DEVICE(o);
+
+ for (l = gi->nodelist; l; l = l->next) {
+ PCIDeviceHandle dev_handle;
+ dev_handle.segment = 0;
+ dev_handle.bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)),
+ pci_dev->devfn);
+ build_srat_generic_pci_initiator_affinity(table_data,
+ l->value, &dev_handle);
+ node_specified = true;
+ }
+
+ if (!node_specified) {
+ error_report("Generic Initiator device 0:%x:%x.%x has no associated"
+ " NUMA node.", pci_bus_num(pci_get_bus(pci_dev)),
+ PCI_SLOT(pci_dev->devfn), PCI_FUNC(pci_dev->devfn));
+ error_printf("Specify NUMA node with -nodelist option.\n");
+ g_assert(node_specified);
+ }
+ }
+
+ g_slist_free(list);
+}
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 8bc35a483c..00d77327e0 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -58,6 +58,7 @@
#include "migration/vmstate.h"
#include "hw/acpi/ghes.h"
#include "hw/acpi/viot.h"
+#include "hw/acpi/acpi-generic-initiator.h"
#define ARM_SPI_BASE 32
@@ -558,6 +559,8 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
}
}
+ build_srat_generic_pci_initiator(table_data);
+
if (ms->nvdimms_state->is_enabled) {
nvdimm_build_srat(table_data);
}
diff --git a/include/hw/acpi/acpi-generic-initiator.h b/include/hw/acpi/acpi-generic-initiator.h
index db3ed02c80..6fdaf887cd 100644
--- a/include/hw/acpi/acpi-generic-initiator.h
+++ b/include/hw/acpi/acpi-generic-initiator.h
@@ -27,4 +27,30 @@ typedef struct AcpiGenericInitiatorClass {
ObjectClass parent_class;
} AcpiGenericInitiatorClass;
+/*
+ * ACPI 6.3:
+ * Table 5-81 Flags – Generic Initiator Affinity Structure
+ */
+typedef enum {
+ GEN_AFFINITY_ENABLED = (1 << 0), /*
+ * If clear, the OSPM ignores the contents
+ * of the Generic Initiator/Port Affinity
+ * Structure. This allows system firmware
+ * to populate the SRAT with a static
+ * number of structures, but only enable
+ * them as necessary.
+ */
+} GenericAffinityFlags;
+
+/*
+ * ACPI 6.3:
+ * Table 5-80 Device Handle - PCI
+ */
+typedef struct PCIDeviceHandle {
+ uint16_t segment;
+ uint16_t bdf;
+} PCIDeviceHandle;
+
+void build_srat_generic_pci_initiator(GArray *table_data);
+
#endif
--
2.34.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v4 1/2] qom: new object to associate device to numa node
2023-11-19 13:01 ` [PATCH v4 1/2] qom: new object to associate device to numa node ankita
@ 2023-11-27 22:57 ` Alex Williamson
2023-11-30 14:12 ` Markus Armbruster
1 sibling, 0 replies; 7+ messages in thread
From: Alex Williamson @ 2023-11-27 22:57 UTC (permalink / raw)
To: ankita
Cc: jgg, clg, shannon.zhaosl, peter.maydell, ani, berrange, eduardo,
imammedo, mst, eblake, armbru, david, gshan, Jonathan.Cameron,
aniketa, cjia, kwankhede, targupta, vsethi, acurrid, dnigam,
udhoke, qemu-arm, qemu-devel
On Sun, 19 Nov 2023 18:31:10 +0530
<ankita@nvidia.com> wrote:
> From: Ankit Agrawal <ankita@nvidia.com>
>
> NVIDIA GPU's support MIG (Mult-Instance GPUs) feature [1], which allows
> partitioning of the GPU device resources (including device memory) into
> several (upto 8) isolated instances. Each of the partitioned memory needs
> a dedicated NUMA node to operate. The partitions are not fixed and they
> can be created/deleted at runtime.
>
> Unfortunately Linux OS does not provide a means to dynamically create/destroy
> NUMA nodes and such feature implementation is not expected to be trivial. The
> nodes that OS discovers at the boot time while parsing SRAT remains fixed. So
> we utilize the Generic Initiator Affinity structures that allows association
> between nodes and devices. Multiple GI structures per BDF is possible,
> allowing creation of multiple nodes by exposing unique PXM in each of these
> structures.
>
> Introduce a new acpi-generic-initiator object to allow host admin provide the
> device and the corresponding NUMA nodes. Qemu maintain this association and
> use this object to build the requisite GI Affinity Structure.
>
> An admin can provide the range of nodes through a uint16 array host-nodes
> and link it to a device by providing its id. Currently, only PCI device is
> supported and an error is returned for acpi device. The following sample
> creates 8 nodes and link them to the PCI device dev0:
>
> -numa node,nodeid=2 \
> -numa node,nodeid=3 \
> -numa node,nodeid=4 \
> -numa node,nodeid=5 \
> -numa node,nodeid=6 \
> -numa node,nodeid=7 \
> -numa node,nodeid=8 \
> -numa node,nodeid=9 \
> -device vfio-pci-nohotplug,host=0009:01:00.0,bus=pcie.0,addr=04.0,rombar=0,id=dev0 \
> -object acpi-generic-initiator,id=gi0,pci-dev=dev0,host-nodes=2-9 \
>
> [1] https://www.nvidia.com/en-in/technologies/multi-instance-gpu
>
> Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
> ---
> hw/acpi/acpi-generic-initiator.c | 84 ++++++++++++++++++++++++
> hw/acpi/meson.build | 1 +
> include/hw/acpi/acpi-generic-initiator.h | 30 +++++++++
> qapi/qom.json | 18 +++++
> 4 files changed, 133 insertions(+)
> create mode 100644 hw/acpi/acpi-generic-initiator.c
> create mode 100644 include/hw/acpi/acpi-generic-initiator.h
>
> diff --git a/hw/acpi/acpi-generic-initiator.c b/hw/acpi/acpi-generic-initiator.c
> new file mode 100644
> index 0000000000..5ea51cb81e
> --- /dev/null
> +++ b/hw/acpi/acpi-generic-initiator.c
> @@ -0,0 +1,84 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/qdev-properties.h"
> +#include "qapi/error.h"
> +#include "qapi/qapi-builtin-visit.h"
> +#include "qapi/visitor.h"
> +#include "qom/object_interfaces.h"
> +#include "qom/object.h"
> +#include "hw/qdev-core.h"
> +#include "hw/vfio/vfio-common.h"
> +#include "hw/vfio/pci.h"
There's nothing related to vfio here except for the example use case;
surely you don't need the above two headers.
> +#include "hw/pci/pci_device.h"
> +#include "sysemu/numa.h"
> +#include "hw/acpi/acpi-generic-initiator.h"
> +
> +OBJECT_DEFINE_TYPE_WITH_INTERFACES(AcpiGenericInitiator, acpi_generic_initiator,
> + ACPI_GENERIC_INITIATOR, OBJECT,
> + { TYPE_USER_CREATABLE },
> + { NULL })
> +
> +OBJECT_DECLARE_SIMPLE_TYPE(AcpiGenericInitiator, ACPI_GENERIC_INITIATOR)
> +
> +static void acpi_generic_initiator_init(Object *obj)
> +{
> + AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
> + gi->device = NULL;
> + gi->nodelist = NULL;
> +}
> +
> +static void acpi_generic_initiator_finalize(Object *obj)
> +{
> + AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
> +
> + g_free(gi->device);
> + qapi_free_uint16List(gi->nodelist);
> +}
> +
> +static void acpi_generic_initiator_set_pci_device(Object *obj, const char *val,
> + Error **errp)
> +{
> + AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
> +
> + gi->device = g_strdup(val);
> +}
> +
> +static void acpi_generic_initiator_set_acpi_device(Object *obj, const char *val,
> + Error **errp)
> +{
> + error_setg(errp, "Generic Initiator ACPI device not supported");
> +}
> +
> +static void
> +acpi_generic_initiator_set_host_nodes(Object *obj, Visitor *v, const char *name,
> + void *opaque, Error **errp)
> +{
> + AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
> + uint16List *l;
> +
> + visit_type_uint16List(v, name, &(gi->nodelist), errp);
> +
> + for (l = gi->nodelist; l; l = l->next) {
> + if (l->value >= MAX_NODES) {
> + error_setg(errp, "Invalid host-nodes value: %d", l->value);
> + qapi_free_uint16List(gi->nodelist);
> + return;
> + }
> + }
> +}
> +
> +static void acpi_generic_initiator_class_init(ObjectClass *oc, void *data)
> +{
> + object_class_property_add_str(oc, ACPI_GENERIC_INITIATOR_PCI_DEVICE_PROP,
> + NULL, acpi_generic_initiator_set_pci_device);
> + object_class_property_add_str(oc, ACPI_GENERIC_INITIATOR_ACPI_DEVICE_PROP,
> + NULL, acpi_generic_initiator_set_acpi_device);
This doesn't allow introspection of acpi-dev support; the property
shouldn't be added until the support is available. At least we've
thought about it now, and we might use a comment to describe the
intention that ACPI devices could be supported by this option in the
future.
> + object_class_property_add(oc, ACPI_GENERIC_INITIATOR_HOSTNODE_PROP, "int",
> + NULL,
> + acpi_generic_initiator_set_host_nodes,
> + NULL, NULL);
> +}
> diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
> index fc1b952379..2268589519 100644
> --- a/hw/acpi/meson.build
> +++ b/hw/acpi/meson.build
> @@ -1,5 +1,6 @@
> acpi_ss = ss.source_set()
> acpi_ss.add(files(
> + 'acpi-generic-initiator.c',
> 'acpi_interface.c',
> 'aml-build.c',
> 'bios-linker-loader.c',
> diff --git a/include/hw/acpi/acpi-generic-initiator.h b/include/hw/acpi/acpi-generic-initiator.h
> new file mode 100644
> index 0000000000..db3ed02c80
> --- /dev/null
> +++ b/include/hw/acpi/acpi-generic-initiator.h
> @@ -0,0 +1,30 @@
> +#ifndef ACPI_GENERIC_INITIATOR_H
> +#define ACPI_GENERIC_INITIATOR_H
> +
> +#include "hw/mem/pc-dimm.h"
> +#include "hw/acpi/bios-linker-loader.h"
> +#include "qemu/uuid.h"
> +#include "hw/acpi/aml-build.h"
> +#include "qom/object.h"
> +#include "qom/object_interfaces.h"
> +
> +#define TYPE_ACPI_GENERIC_INITIATOR "acpi-generic-initiator"
> +
> +#define ACPI_GENERIC_INITIATOR_PCI_DEVICE_PROP "pci-dev"
> +#define ACPI_GENERIC_INITIATOR_ACPI_DEVICE_PROP "acpi-dev"
> +#define ACPI_GENERIC_INITIATOR_HOSTNODE_PROP "host-nodes"
> +
> +typedef struct AcpiGenericInitiator {
> + /* private */
> + Object parent;
> +
> + /* public */
> + char *device;
> + uint16List *nodelist;
> +} AcpiGenericInitiator;
> +
> +typedef struct AcpiGenericInitiatorClass {
> + ObjectClass parent_class;
> +} AcpiGenericInitiatorClass;
> +
> +#endif
> diff --git a/qapi/qom.json b/qapi/qom.json
> index c53ef978ff..f726f5ea41 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -794,6 +794,22 @@
> { 'struct': 'VfioUserServerProperties',
> 'data': { 'socket': 'SocketAddress', 'device': 'str' } }
>
> +##
> +# @AcpiGenericInitiatorProperties:
> +#
> +# Properties for acpi-generic-initiator objects.
> +#
> +# @pci-dev: PCI device ID to be associated with the node
> +#
> +# @acpi-dev: ACPI device ID to be associated with the node
> +#
> +# @host-nodes: numa node list
> +#
> +# Since: 8.2
8.2 is in freeze, this is 9.0 material. Thanks,
Alex
> +##
> +{ 'struct': 'AcpiGenericInitiatorProperties',
> + 'data': { '*pci-dev': 'str', '*acpi-dev': 'str', 'host-nodes': ['uint16'] } }
> +
> ##
> # @RngProperties:
> #
> @@ -911,6 +927,7 @@
> ##
> { 'enum': 'ObjectType',
> 'data': [
> + 'acpi-generic-initiator',
> 'authz-list',
> 'authz-listfile',
> 'authz-pam',
> @@ -981,6 +998,7 @@
> 'id': 'str' },
> 'discriminator': 'qom-type',
> 'data': {
> + 'acpi-generic-initiator': 'AcpiGenericInitiatorProperties',
> 'authz-list': 'AuthZListProperties',
> 'authz-listfile': 'AuthZListFileProperties',
> 'authz-pam': 'AuthZPAMProperties',
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v4 2/2] hw/acpi: Implement the SRAT GI affinity structure
2023-11-19 13:01 ` [PATCH v4 2/2] hw/acpi: Implement the SRAT GI affinity structure ankita
@ 2023-11-27 22:57 ` Alex Williamson
2023-11-30 14:12 ` Markus Armbruster
1 sibling, 0 replies; 7+ messages in thread
From: Alex Williamson @ 2023-11-27 22:57 UTC (permalink / raw)
To: ankita
Cc: jgg, clg, shannon.zhaosl, peter.maydell, ani, berrange, eduardo,
imammedo, mst, eblake, armbru, david, gshan, Jonathan.Cameron,
aniketa, cjia, kwankhede, targupta, vsethi, acurrid, dnigam,
udhoke, qemu-arm, qemu-devel
On Sun, 19 Nov 2023 18:31:11 +0530
<ankita@nvidia.com> wrote:
> From: Ankit Agrawal <ankita@nvidia.com>
>
> ACPI spec provides a scheme to associate "Generic Initiators" [1]
> (e.g. heterogeneous processors and accelerators, GPUs, and I/O devices with
> integrated compute or DMA engines GPUs) with Proximity Domains. This is
> achieved using Generic Initiator Affinity Structure in SRAT. During bootup,
> Linux kernel parse the ACPI SRAT to determine the PXM ids and create a NUMA
> node for each unique PXM ID encountered. Qemu currently do not implement
> these structures while building SRAT.
>
> Add GI structures while building VM ACPI SRAT. The association between
> devices and nodes are stored using acpi-generic-initiator object. Lookup
> presence of all such objects and use them to build these structures.
>
> The structure needs a PCI device handle [2] that consists of the device BDF.
> The vfio-pci device corresponding to the acpi-generic-initiator object is
> located to determine the BDF.
>
> [1] ACPI Spec 6.3, Section 5.2.16.6
> [2] ACPI Spec 6.3, Table 5.80
>
> Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
> ---
> hw/acpi/acpi-generic-initiator.c | 100 +++++++++++++++++++++++
> hw/arm/virt-acpi-build.c | 3 +
> include/hw/acpi/acpi-generic-initiator.h | 26 ++++++
> 3 files changed, 129 insertions(+)
>
> diff --git a/hw/acpi/acpi-generic-initiator.c b/hw/acpi/acpi-generic-initiator.c
> index 5ea51cb81e..a9222438ec 100644
> --- a/hw/acpi/acpi-generic-initiator.c
> +++ b/hw/acpi/acpi-generic-initiator.c
> @@ -16,6 +16,7 @@
> #include "hw/pci/pci_device.h"
> #include "sysemu/numa.h"
> #include "hw/acpi/acpi-generic-initiator.h"
> +#include "qemu/error-report.h"
>
> OBJECT_DEFINE_TYPE_WITH_INTERFACES(AcpiGenericInitiator, acpi_generic_initiator,
> ACPI_GENERIC_INITIATOR, OBJECT,
> @@ -82,3 +83,102 @@ static void acpi_generic_initiator_class_init(ObjectClass *oc, void *data)
> acpi_generic_initiator_set_host_nodes,
> NULL, NULL);
> }
> +
> +static int acpi_generic_initiator_list(Object *obj, void *opaque)
> +{
> + GSList **list = opaque;
> +
> + if (object_dynamic_cast(obj, TYPE_ACPI_GENERIC_INITIATOR)) {
> + *list = g_slist_append(*list, ACPI_GENERIC_INITIATOR(obj));
> + }
> +
> + object_child_foreach(obj, acpi_generic_initiator_list, opaque);
> + return 0;
> +}
> +
> +/*
> + * Identify Generic Initiator objects and link them into the list which is
> + * returned to the caller.
> + *
> + * Note: it is the caller's responsibility to free the list to avoid
> + * memory leak.
> + */
> +static GSList *acpi_generic_initiator_get_list(void)
> +{
> + GSList *list = NULL;
> +
> + object_child_foreach(object_get_root(), acpi_generic_initiator_list, &list);
> + return list;
> +}
> +
> +/*
> + * ACPI 6.3:
> + * Table 5-78 Generic Initiator Affinity Structure
> + */
> +static
> +void build_srat_generic_pci_initiator_affinity(GArray *table_data, int node,
> + PCIDeviceHandle *handle)
> +{
> + uint8_t index;
> +
> + build_append_int_noprefix(table_data, 5, 1); /* Type */
> + build_append_int_noprefix(table_data, 32, 1); /* Length */
> + build_append_int_noprefix(table_data, 0, 1); /* Reserved */
> + build_append_int_noprefix(table_data, 1, 1); /* Device Handle Type: PCI */
> + build_append_int_noprefix(table_data, node, 4); /* Proximity Domain */
> +
> + /* Device Handle - PCI */
> + build_append_int_noprefix(table_data, handle->segment, 2);
> + build_append_int_noprefix(table_data, handle->bdf, 2);
> + for (index = 0; index < 12; index++) {
> + build_append_int_noprefix(table_data, 0, 1);
> + }
> +
> + build_append_int_noprefix(table_data, GEN_AFFINITY_ENABLED, 4); /* Flags */
> + build_append_int_noprefix(table_data, 0, 4); /* Reserved */
> +}
> +
> +void build_srat_generic_pci_initiator(GArray *table_data)
> +{
> + GSList *gi_list, *list = acpi_generic_initiator_get_list();
> + AcpiGenericInitiator *gi;
> +
> + for (gi_list = list; gi_list; gi_list = gi_list->next) {
> + Object *o;
> + uint16List *l;
> + PCIDevice *pci_dev;
> + bool node_specified = false;
> +
> + gi = gi_list->data;
> +
> + /* User fails to provide a device. */
> + g_assert(gi->device);
> +
> + o = object_resolve_path_type(gi->device, TYPE_PCI_DEVICE, NULL);
> + if (!o) {
> + error_printf("Specified device must be a PCI device.\n");
> + g_assert(o);
> + }
> + pci_dev = PCI_DEVICE(o);
> +
> + for (l = gi->nodelist; l; l = l->next) {
> + PCIDeviceHandle dev_handle;
> + dev_handle.segment = 0;
> + dev_handle.bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)),
> + pci_dev->devfn);
> + build_srat_generic_pci_initiator_affinity(table_data,
> + l->value, &dev_handle);
> + node_specified = true;
> + }
> +
> + if (!node_specified) {
> + error_report("Generic Initiator device 0:%x:%x.%x has no associated"
> + " NUMA node.", pci_bus_num(pci_get_bus(pci_dev)),
> + PCI_SLOT(pci_dev->devfn), PCI_FUNC(pci_dev->devfn));
> + error_printf("Specify NUMA node with -nodelist option.\n");
No such option, -nodelist?
> + g_assert(node_specified);
I won't claim expertise in QEMU error handling, but an assert is a
pretty harsh way to handle failures. Thanks,
Alex
> + }
> + }
> +
> + g_slist_free(list);
> +}
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index 8bc35a483c..00d77327e0 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -58,6 +58,7 @@
> #include "migration/vmstate.h"
> #include "hw/acpi/ghes.h"
> #include "hw/acpi/viot.h"
> +#include "hw/acpi/acpi-generic-initiator.h"
>
> #define ARM_SPI_BASE 32
>
> @@ -558,6 +559,8 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> }
> }
>
> + build_srat_generic_pci_initiator(table_data);
> +
> if (ms->nvdimms_state->is_enabled) {
> nvdimm_build_srat(table_data);
> }
> diff --git a/include/hw/acpi/acpi-generic-initiator.h b/include/hw/acpi/acpi-generic-initiator.h
> index db3ed02c80..6fdaf887cd 100644
> --- a/include/hw/acpi/acpi-generic-initiator.h
> +++ b/include/hw/acpi/acpi-generic-initiator.h
> @@ -27,4 +27,30 @@ typedef struct AcpiGenericInitiatorClass {
> ObjectClass parent_class;
> } AcpiGenericInitiatorClass;
>
> +/*
> + * ACPI 6.3:
> + * Table 5-81 Flags – Generic Initiator Affinity Structure
> + */
> +typedef enum {
> + GEN_AFFINITY_ENABLED = (1 << 0), /*
> + * If clear, the OSPM ignores the contents
> + * of the Generic Initiator/Port Affinity
> + * Structure. This allows system firmware
> + * to populate the SRAT with a static
> + * number of structures, but only enable
> + * them as necessary.
> + */
> +} GenericAffinityFlags;
> +
> +/*
> + * ACPI 6.3:
> + * Table 5-80 Device Handle - PCI
> + */
> +typedef struct PCIDeviceHandle {
> + uint16_t segment;
> + uint16_t bdf;
> +} PCIDeviceHandle;
> +
> +void build_srat_generic_pci_initiator(GArray *table_data);
> +
> #endif
* Re: [PATCH v4 1/2] qom: new object to associate device to numa node
2023-11-19 13:01 ` [PATCH v4 1/2] qom: new object to associate device to numa node ankita
2023-11-27 22:57 ` Alex Williamson
@ 2023-11-30 14:12 ` Markus Armbruster
1 sibling, 0 replies; 7+ messages in thread
From: Markus Armbruster @ 2023-11-30 14:12 UTC (permalink / raw)
To: ankita
Cc: jgg, alex.williamson, clg, shannon.zhaosl, peter.maydell, ani,
berrange, eduardo, imammedo, mst, eblake, armbru, david, gshan,
Jonathan.Cameron, aniketa, cjia, kwankhede, targupta, vsethi,
acurrid, dnigam, udhoke, qemu-arm, qemu-devel
<ankita@nvidia.com> writes:
> From: Ankit Agrawal <ankita@nvidia.com>
>
> NVIDIA GPUs support the MIG (Multi-Instance GPU) feature [1], which allows
> partitioning of the GPU device resources (including device memory) into
> several (up to 8) isolated instances. Each memory partition needs
> a dedicated NUMA node to operate. The partitions are not fixed and they
> can be created/deleted at runtime.
>
> Unfortunately, Linux does not provide a means to dynamically create/destroy
> NUMA nodes, and implementing such a feature is not expected to be trivial.
> The nodes that the OS discovers at boot time while parsing the SRAT remain
> fixed. So we utilize the Generic Initiator Affinity structures, which allow
> association between nodes and devices. Multiple GI structures per BDF are
> possible, allowing creation of multiple nodes by exposing a unique PXM in
> each of these structures.
>
> Introduce a new acpi-generic-initiator object to allow the host admin to
> provide the device and the corresponding NUMA nodes. QEMU maintains this
> association and uses this object to build the requisite GI Affinity Structure.
>
> An admin can provide the range of nodes through a uint16 array host-nodes
> and link it to a device by providing its id. Currently, only PCI devices are
> supported and an error is returned for ACPI devices. The following sample
> creates 8 nodes and links them to the PCI device dev0:
>
> -numa node,nodeid=2 \
> -numa node,nodeid=3 \
> -numa node,nodeid=4 \
> -numa node,nodeid=5 \
> -numa node,nodeid=6 \
> -numa node,nodeid=7 \
> -numa node,nodeid=8 \
> -numa node,nodeid=9 \
> -device vfio-pci-nohotplug,host=0009:01:00.0,bus=pcie.0,addr=04.0,rombar=0,id=dev0 \
> -object acpi-generic-initiator,id=gi0,pci-dev=dev0,host-nodes=2-9 \
>
> [1] https://www.nvidia.com/en-in/technologies/multi-instance-gpu
>
> Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
> ---
> hw/acpi/acpi-generic-initiator.c | 84 ++++++++++++++++++++++++
> hw/acpi/meson.build | 1 +
> include/hw/acpi/acpi-generic-initiator.h | 30 +++++++++
> qapi/qom.json | 18 +++++
> 4 files changed, 133 insertions(+)
> create mode 100644 hw/acpi/acpi-generic-initiator.c
> create mode 100644 include/hw/acpi/acpi-generic-initiator.h
>
> diff --git a/hw/acpi/acpi-generic-initiator.c b/hw/acpi/acpi-generic-initiator.c
> new file mode 100644
> index 0000000000..5ea51cb81e
> --- /dev/null
> +++ b/hw/acpi/acpi-generic-initiator.c
> @@ -0,0 +1,84 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/qdev-properties.h"
> +#include "qapi/error.h"
> +#include "qapi/qapi-builtin-visit.h"
> +#include "qapi/visitor.h"
> +#include "qom/object_interfaces.h"
> +#include "qom/object.h"
> +#include "hw/qdev-core.h"
> +#include "hw/vfio/vfio-common.h"
> +#include "hw/vfio/pci.h"
> +#include "hw/pci/pci_device.h"
> +#include "sysemu/numa.h"
> +#include "hw/acpi/acpi-generic-initiator.h"
Several #include directives are superfluous. This compiles for me:
#include "qemu/osdep.h"
#include "hw/acpi/acpi-generic-initiator.h"
#include "hw/pci/pci_device.h"
#include "qapi/error.h"
#include "qapi/qapi-builtin-visit.h"
#include "qapi/visitor.h"
#include "qemu/error-report.h"
#include "sysemu/numa.h"
Yes, the alphabetical order is intentional.
> +
> +OBJECT_DEFINE_TYPE_WITH_INTERFACES(AcpiGenericInitiator, acpi_generic_initiator,
> + ACPI_GENERIC_INITIATOR, OBJECT,
> + { TYPE_USER_CREATABLE },
> + { NULL })
> +
> +OBJECT_DECLARE_SIMPLE_TYPE(AcpiGenericInitiator, ACPI_GENERIC_INITIATOR)
> +
> +static void acpi_generic_initiator_init(Object *obj)
> +{
> + AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
> + gi->device = NULL;
> + gi->nodelist = NULL;
> +}
> +
> +static void acpi_generic_initiator_finalize(Object *obj)
> +{
> + AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
> +
> + g_free(gi->device);
> + qapi_free_uint16List(gi->nodelist);
> +}
> +
> +static void acpi_generic_initiator_set_pci_device(Object *obj, const char *val,
> + Error **errp)
> +{
> + AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
> +
> + gi->device = g_strdup(val);
The property is named "pci-dev", but the C member is called @device.
Making developers remember this mapping is not nice. I suggest renaming
it to @pci_dev.
> +}
> +
> +static void acpi_generic_initiator_set_acpi_device(Object *obj, const char *val,
> + Error **errp)
> +{
> + error_setg(errp, "Generic Initiator ACPI device not supported");
> +}
Let's add the property when it actually works. More below at [*].
> +
> +static void
> +acpi_generic_initiator_set_host_nodes(Object *obj, Visitor *v, const char *name,
> + void *opaque, Error **errp)
> +{
> + AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
> + uint16List *l;
> +
> + visit_type_uint16List(v, name, &(gi->nodelist), errp);
> +
> + for (l = gi->nodelist; l; l = l->next) {
> + if (l->value >= MAX_NODES) {
> + error_setg(errp, "Invalid host-nodes value: %d", l->value);
> + qapi_free_uint16List(gi->nodelist);
> + return;
> + }
> + }
Why not store the nodes in a bitset, like
host_memory_backend_set_host_nodes() does?
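To illustrate the bitset idea, here is a minimal standalone sketch. It is
not QEMU code: the NodeSet type and the nodeset_*() helpers are
hypothetical names, and the MAX_NODES value of 128 is an assumption made
so the example compiles on its own. It validates the whole list into a
temporary set before committing, so a bad value leaves the stored set
untouched.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_NODES 128 /* assumed limit, defined here only for the sketch */
#define NODE_WORDS ((MAX_NODES + 63) / 64)

/* Hypothetical fixed-size bitset holding one bit per NUMA node ID. */
typedef struct NodeSet {
    uint64_t bits[NODE_WORDS];
} NodeSet;

static void nodeset_clear(NodeSet *s)
{
    memset(s, 0, sizeof(*s));
}

/* Set the bit for @node; returns false if @node is out of range. */
static bool nodeset_add(NodeSet *s, unsigned node)
{
    if (node >= MAX_NODES) {
        return false;
    }
    s->bits[node / 64] |= UINT64_C(1) << (node % 64);
    return true;
}

static bool nodeset_test(const NodeSet *s, unsigned node)
{
    if (node >= MAX_NODES) {
        return false;
    }
    return (s->bits[node / 64] >> (node % 64)) & 1;
}

/*
 * Replace @s with the nodes in @nodes[0..n-1].
 * All-or-nothing: on an out-of-range value, @s is left unchanged.
 */
static bool nodeset_assign(NodeSet *s, const uint16_t *nodes, size_t n)
{
    NodeSet tmp;

    nodeset_clear(&tmp);
    for (size_t i = 0; i < n; i++) {
        if (!nodeset_add(&tmp, nodes[i])) {
            return false;
        }
    }
    *s = tmp;
    return true;
}
```

With a fixed-size set there is no list to free on the error path, and
duplicate node IDs collapse into a single bit.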
> +}
> +
> +static void acpi_generic_initiator_class_init(ObjectClass *oc, void *data)
> +{
> + object_class_property_add_str(oc, ACPI_GENERIC_INITIATOR_PCI_DEVICE_PROP,
> + NULL, acpi_generic_initiator_set_pci_device);
> + object_class_property_add_str(oc, ACPI_GENERIC_INITIATOR_ACPI_DEVICE_PROP,
> + NULL, acpi_generic_initiator_set_acpi_device);
[*] Drop this one.
> + object_class_property_add(oc, ACPI_GENERIC_INITIATOR_HOSTNODE_PROP, "int",
> + NULL,
> + acpi_generic_initiator_set_host_nodes,
> + NULL, NULL);
> +}
> diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
> index fc1b952379..2268589519 100644
> --- a/hw/acpi/meson.build
> +++ b/hw/acpi/meson.build
> @@ -1,5 +1,6 @@
> acpi_ss = ss.source_set()
> acpi_ss.add(files(
> + 'acpi-generic-initiator.c',
> 'acpi_interface.c',
> 'aml-build.c',
> 'bios-linker-loader.c',
> diff --git a/include/hw/acpi/acpi-generic-initiator.h b/include/hw/acpi/acpi-generic-initiator.h
> new file mode 100644
> index 0000000000..db3ed02c80
> --- /dev/null
> +++ b/include/hw/acpi/acpi-generic-initiator.h
> @@ -0,0 +1,30 @@
> +#ifndef ACPI_GENERIC_INITIATOR_H
> +#define ACPI_GENERIC_INITIATOR_H
> +
> +#include "hw/mem/pc-dimm.h"
> +#include "hw/acpi/bios-linker-loader.h"
> +#include "qemu/uuid.h"
> +#include "hw/acpi/aml-build.h"
> +#include "qom/object.h"
> +#include "qom/object_interfaces.h"
> +
> +#define TYPE_ACPI_GENERIC_INITIATOR "acpi-generic-initiator"
> +
> +#define ACPI_GENERIC_INITIATOR_PCI_DEVICE_PROP "pci-dev"
> +#define ACPI_GENERIC_INITIATOR_ACPI_DEVICE_PROP "acpi-dev"
> +#define ACPI_GENERIC_INITIATOR_HOSTNODE_PROP "host-nodes"
These three macros have exactly one use each. Get rid of them, please.
> +
> +typedef struct AcpiGenericInitiator {
> + /* private */
> + Object parent;
> +
> + /* public */
> + char *device;
> + uint16List *nodelist;
> +} AcpiGenericInitiator;
> +
> +typedef struct AcpiGenericInitiatorClass {
> + ObjectClass parent_class;
> +} AcpiGenericInitiatorClass;
> +
> +#endif
> diff --git a/qapi/qom.json b/qapi/qom.json
> index c53ef978ff..f726f5ea41 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -794,6 +794,22 @@
> { 'struct': 'VfioUserServerProperties',
> 'data': { 'socket': 'SocketAddress', 'device': 'str' } }
>
> +##
> +# @AcpiGenericInitiatorProperties:
> +#
> +# Properties for acpi-generic-initiator objects.
> +#
> +# @pci-dev: PCI device ID to be associated with the node
> +#
> +# @acpi-dev: ACPI device ID to be associated with the node
[*] Drop this one.
> +#
> +# @host-nodes: numa node list
> +#
> +# Since: 8.2
9.0
> +##
> +{ 'struct': 'AcpiGenericInitiatorProperties',
> + 'data': { '*pci-dev': 'str', '*acpi-dev': 'str', 'host-nodes': ['uint16'] } }
Long line. Better:
'data': { '*pci-dev': 'str',
'*acpi-dev': 'str',
'host-nodes': ['uint16'] } }
> +
> ##
> # @RngProperties:
> #
> @@ -911,6 +927,7 @@
> ##
> { 'enum': 'ObjectType',
> 'data': [
> + 'acpi-generic-initiator',
> 'authz-list',
> 'authz-listfile',
> 'authz-pam',
> @@ -981,6 +998,7 @@
> 'id': 'str' },
> 'discriminator': 'qom-type',
> 'data': {
> + 'acpi-generic-initiator': 'AcpiGenericInitiatorProperties',
> 'authz-list': 'AuthZListProperties',
> 'authz-listfile': 'AuthZListFileProperties',
> 'authz-pam': 'AuthZPAMProperties',
* Re: [PATCH v4 2/2] hw/acpi: Implement the SRAT GI affinity structure
2023-11-19 13:01 ` [PATCH v4 2/2] hw/acpi: Implement the SRAT GI affinity structure ankita
2023-11-27 22:57 ` Alex Williamson
@ 2023-11-30 14:12 ` Markus Armbruster
1 sibling, 0 replies; 7+ messages in thread
From: Markus Armbruster @ 2023-11-30 14:12 UTC (permalink / raw)
To: ankita
Cc: jgg, alex.williamson, clg, shannon.zhaosl, peter.maydell, ani,
berrange, eduardo, imammedo, mst, eblake, david, gshan,
Jonathan.Cameron, aniketa, cjia, kwankhede, targupta, vsethi,
acurrid, dnigam, udhoke, qemu-arm, qemu-devel
<ankita@nvidia.com> writes:
> From: Ankit Agrawal <ankita@nvidia.com>
>
> The ACPI spec provides a scheme to associate "Generic Initiators" [1]
> (e.g. heterogeneous processors and accelerators, GPUs, and I/O devices with
> integrated compute or DMA engines) with Proximity Domains. This is
> achieved using the Generic Initiator Affinity Structure in the SRAT. During
> bootup, the Linux kernel parses the ACPI SRAT to determine the PXM IDs and
> creates a NUMA node for each unique PXM ID encountered. QEMU currently does
> not implement these structures while building the SRAT.
>
> Add GI structures while building the VM ACPI SRAT. The associations between
> devices and nodes are stored using acpi-generic-initiator objects. Look up
> all such objects and use them to build these structures.
>
> The structure needs a PCI device handle [2] that consists of the device BDF.
> The vfio-pci device corresponding to the acpi-generic-initiator object is
> located to determine the BDF.
>
> [1] ACPI Spec 6.3, Section 5.2.16.6
> [2] ACPI Spec 6.3, Table 5.80
>
> Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
> ---
> hw/acpi/acpi-generic-initiator.c | 100 +++++++++++++++++++++++
> hw/arm/virt-acpi-build.c | 3 +
> include/hw/acpi/acpi-generic-initiator.h | 26 ++++++
> 3 files changed, 129 insertions(+)
>
> diff --git a/hw/acpi/acpi-generic-initiator.c b/hw/acpi/acpi-generic-initiator.c
> index 5ea51cb81e..a9222438ec 100644
> --- a/hw/acpi/acpi-generic-initiator.c
> +++ b/hw/acpi/acpi-generic-initiator.c
> @@ -16,6 +16,7 @@
> #include "hw/pci/pci_device.h"
> #include "sysemu/numa.h"
> #include "hw/acpi/acpi-generic-initiator.h"
> +#include "qemu/error-report.h"
>
> OBJECT_DEFINE_TYPE_WITH_INTERFACES(AcpiGenericInitiator, acpi_generic_initiator,
> ACPI_GENERIC_INITIATOR, OBJECT,
> @@ -82,3 +83,102 @@ static void acpi_generic_initiator_class_init(ObjectClass *oc, void *data)
> acpi_generic_initiator_set_host_nodes,
> NULL, NULL);
> }
> +
> +static int acpi_generic_initiator_list(Object *obj, void *opaque)
> +{
> + GSList **list = opaque;
> +
> + if (object_dynamic_cast(obj, TYPE_ACPI_GENERIC_INITIATOR)) {
> + *list = g_slist_append(*list, ACPI_GENERIC_INITIATOR(obj));
> + }
> +
> + object_child_foreach(obj, acpi_generic_initiator_list, opaque);
> + return 0;
> +}
> +
> +/*
> + * Identify Generic Initiator objects and link them into the list which is
> + * returned to the caller.
> + *
> + * Note: it is the caller's responsibility to free the list to avoid
> + * memory leak.
> + */
> +static GSList *acpi_generic_initiator_get_list(void)
> +{
> + GSList *list = NULL;
> +
> + object_child_foreach(object_get_root(), acpi_generic_initiator_list, &list);
Long line.
> + return list;
> +}
> +
> +/*
> + * ACPI 6.3:
> + * Table 5-78 Generic Initiator Affinity Structure
> + */
> +static
> +void build_srat_generic_pci_initiator_affinity(GArray *table_data, int node,
> + PCIDeviceHandle *handle)
Style nitpick: bad line break.
We traditionally format like
static void build_srat_generic_pci_initiator_affinity(GArray *table_data,
int node,
PCIDeviceHandle *handle)
or, to avoid the long line
static void build_srat_generic_pci_initiator_affinity(GArray *table_data,
int node, PCIDeviceHandle *handle)
but there's also precedent for
static void
build_srat_generic_pci_initiator_affinity(GArray *table_data, int node,
PCIDeviceHandle *handle)
> +{
> + uint8_t index;
> +
> + build_append_int_noprefix(table_data, 5, 1); /* Type */
> + build_append_int_noprefix(table_data, 32, 1); /* Length */
> + build_append_int_noprefix(table_data, 0, 1); /* Reserved */
> + build_append_int_noprefix(table_data, 1, 1); /* Device Handle Type: PCI */
> + build_append_int_noprefix(table_data, node, 4); /* Proximity Domain */
> +
> + /* Device Handle - PCI */
> + build_append_int_noprefix(table_data, handle->segment, 2);
> + build_append_int_noprefix(table_data, handle->bdf, 2);
> + for (index = 0; index < 12; index++) {
> + build_append_int_noprefix(table_data, 0, 1);
> + }
> +
> + build_append_int_noprefix(table_data, GEN_AFFINITY_ENABLED, 4); /* Flags */
> + build_append_int_noprefix(table_data, 0, 4); /* Reserved */
> +}
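For readers checking the byte layout, the following standalone sketch
encodes the same 32-byte structure into a plain buffer. It is an
illustration only and does not use QEMU's aml-build API: put_le(),
build_bdf() and encode_gi_affinity() are hypothetical helpers. The field
sizes follow the quoted code (1+1+1+1+4 header bytes, a 16-byte device
handle, 4 bytes of flags, 4 reserved bytes), and multi-byte fields are
written little-endian.

```c
#include <stddef.h>
#include <stdint.h>

enum { GI_STRUCT_LEN = 32 };

/* Store the low @size bytes of @val at @off, little-endian; return new offset. */
static size_t put_le(uint8_t *buf, size_t off, uint64_t val, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        buf[off + i] = (uint8_t)(val >> (8 * i));
    }
    return off + size;
}

/* Bus number in bits 15:8, slot in bits 7:3, function in bits 2:0. */
static uint16_t build_bdf(uint8_t bus, uint8_t slot, uint8_t func)
{
    uint8_t devfn = (uint8_t)(((slot & 0x1f) << 3) | (func & 0x07));

    return (uint16_t)((bus << 8) | devfn);
}

/* Encode one Generic Initiator Affinity Structure with a PCI device handle. */
static size_t encode_gi_affinity(uint8_t buf[GI_STRUCT_LEN], uint32_t pxm,
                                 uint16_t segment, uint16_t bdf)
{
    size_t off = 0;

    off = put_le(buf, off, 5, 1);             /* Type */
    off = put_le(buf, off, GI_STRUCT_LEN, 1); /* Length */
    off = put_le(buf, off, 0, 1);             /* Reserved */
    off = put_le(buf, off, 1, 1);             /* Device Handle Type: PCI */
    off = put_le(buf, off, pxm, 4);           /* Proximity Domain */
    off = put_le(buf, off, segment, 2);       /* Device Handle: segment */
    off = put_le(buf, off, bdf, 2);           /* Device Handle: BDF */
    off = put_le(buf, off, 0, 8);             /* Device Handle: reserved, */
    off = put_le(buf, off, 0, 4);             /* 12 bytes in total */
    off = put_le(buf, off, 1, 4);             /* Flags: bit 0 = enabled */
    off = put_le(buf, off, 0, 4);             /* Reserved */
    return off;
}
```

Encoding a device at bus 0, slot 4, function 0 for node 2 places the
node in bytes 4-7, the BDF 0x0020 in bytes 10-11, and the flags in
bytes 24-27.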
> +
> +void build_srat_generic_pci_initiator(GArray *table_data)
> +{
> + GSList *gi_list, *list = acpi_generic_initiator_get_list();
> + AcpiGenericInitiator *gi;
> +
> + for (gi_list = list; gi_list; gi_list = gi_list->next) {
> + Object *o;
> + uint16List *l;
> + PCIDevice *pci_dev;
> + bool node_specified = false;
> +
> + gi = gi_list->data;
> +
> + /* User fails to provide a device. */
> + g_assert(gi->device);
Assertions are for programming errors, not for diagnosing or reporting
user errors. Instead
if (!gi->device) {
error_report(...);
exit(1);
}
This assumes the function can only ever run during initial startup. If
that's not ensured, exit(1) is wrong, and you need to return failure
instead, so the callers can do the right thing.
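A rough standalone sketch of the "return failure" variant follows. The
GiConfig struct, gi_validate() and the message strings are hypothetical
stand-ins rather than QEMU code. The point is only that user mistakes
become an error return the caller can act on, and nothing is asserted.

```c
#include <stddef.h>

/* Hypothetical stand-in for the object's user-supplied configuration. */
typedef struct GiConfig {
    const char *pci_dev; /* NULL if the user gave no device */
    size_t num_nodes;    /* number of entries given in host-nodes */
} GiConfig;

/*
 * Check user-supplied configuration.
 * Returns 0 on success. On a user error, returns -1 and points @msg at a
 * static description, so the caller decides whether to exit or propagate.
 */
static int gi_validate(const GiConfig *gi, const char **msg)
{
    if (!gi->pci_dev) {
        *msg = "acpi-generic-initiator: no PCI device specified";
        return -1;
    }
    if (gi->num_nodes == 0) {
        *msg = "acpi-generic-initiator: no NUMA node specified";
        return -1;
    }
    *msg = NULL;
    return 0;
}
```

A caller that only runs during startup could report the message and
exit, while any other caller would pass the failure up.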
> +
> + o = object_resolve_path_type(gi->device, TYPE_PCI_DEVICE, NULL);
> + if (!o) {
> + error_printf("Specified device must be a PCI device.\n");
> + g_assert(o);
Likewise.
> + }
> + pci_dev = PCI_DEVICE(o);
> +
> + for (l = gi->nodelist; l; l = l->next) {
> + PCIDeviceHandle dev_handle;
> + dev_handle.segment = 0;
> + dev_handle.bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)),
> + pci_dev->devfn);
> + build_srat_generic_pci_initiator_affinity(table_data,
> + l->value, &dev_handle);
> + node_specified = true;
> + }
> +
> + if (!node_specified) {
> + error_report("Generic Initiator device 0:%x:%x.%x has no associated"
> + " NUMA node.", pci_bus_num(pci_get_bus(pci_dev)),
> + PCI_SLOT(pci_dev->devfn), PCI_FUNC(pci_dev->devfn));
> + error_printf("Specify NUMA node with -nodelist option.\n");
> + g_assert(node_specified);
Likewise.
> + }
> + }
> +
> + g_slist_free(list);
> +}
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index 8bc35a483c..00d77327e0 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -58,6 +58,7 @@
> #include "migration/vmstate.h"
> #include "hw/acpi/ghes.h"
> #include "hw/acpi/viot.h"
> +#include "hw/acpi/acpi-generic-initiator.h"
>
> #define ARM_SPI_BASE 32
>
> @@ -558,6 +559,8 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> }
> }
>
> + build_srat_generic_pci_initiator(table_data);
> +
> if (ms->nvdimms_state->is_enabled) {
> nvdimm_build_srat(table_data);
> }
> diff --git a/include/hw/acpi/acpi-generic-initiator.h b/include/hw/acpi/acpi-generic-initiator.h
> index db3ed02c80..6fdaf887cd 100644
> --- a/include/hw/acpi/acpi-generic-initiator.h
> +++ b/include/hw/acpi/acpi-generic-initiator.h
> @@ -27,4 +27,30 @@ typedef struct AcpiGenericInitiatorClass {
> ObjectClass parent_class;
> } AcpiGenericInitiatorClass;
>
> +/*
> + * ACPI 6.3:
> + * Table 5-81 Flags – Generic Initiator Affinity Structure
> + */
> +typedef enum {
> + GEN_AFFINITY_ENABLED = (1 << 0), /*
> + * If clear, the OSPM ignores the contents
> + * of the Generic Initiator/Port Affinity
> + * Structure. This allows system firmware
> + * to populate the SRAT with a static
> + * number of structures, but only enable
> + * them as necessary.
> + */
> +} GenericAffinityFlags;
> +
> +/*
> + * ACPI 6.3:
> + * Table 5-80 Device Handle - PCI
> + */
> +typedef struct PCIDeviceHandle {
> + uint16_t segment;
> + uint16_t bdf;
> +} PCIDeviceHandle;
> +
> +void build_srat_generic_pci_initiator(GArray *table_data);
> +
> #endif
end of thread, other threads:[~2023-11-30 14:13 UTC | newest]
Thread overview: 7+ messages
2023-11-19 13:01 [PATCH v4 0/2] acpi: report numa nodes for device memory using GI ankita
2023-11-19 13:01 ` [PATCH v4 1/2] qom: new object to associate device to numa node ankita
2023-11-27 22:57 ` Alex Williamson
2023-11-30 14:12 ` Markus Armbruster
2023-11-19 13:01 ` [PATCH v4 2/2] hw/acpi: Implement the SRAT GI affinity structure ankita
2023-11-27 22:57 ` Alex Williamson
2023-11-30 14:12 ` Markus Armbruster