* [PATCH v8 0/2] acpi: report NUMA nodes for device memory using GI
@ 2024-03-06 12:33 ankita
2024-03-06 12:33 ` [PATCH v8 1/2] qom: new object to associate device to NUMA node ankita
2024-03-06 12:33 ` [PATCH v8 2/2] hw/acpi: Implement the SRAT GI affinity structure ankita
0 siblings, 2 replies; 8+ messages in thread
From: ankita @ 2024-03-06 12:33 UTC (permalink / raw)
To: ankita, jgg, marcel.apfelbaum, philmd, wangyanan55,
alex.williamson, pbonzini, clg, shannon.zhaosl, peter.maydell,
ani, berrange, eduardo, imammedo, mst, eblake, armbru, david,
gshan, Jonathan.Cameron
Cc: aniketa, cjia, kwankhede, targupta, vsethi, acurrid, dnigam,
udhoke, qemu-arm, qemu-devel
From: Ankit Agrawal <ankita@nvidia.com>
There are upcoming devices which allow the CPU to cache-coherently access
their memory. It is sensible to expose such memory to the OS as NUMA nodes
separate from the sysmem node. The ACPI spec provides a scheme in SRAT
called the Generic Initiator Affinity Structure [1] to allow an association
between a Proximity Domain (PXM) and a Generic Initiator (GI) (e.g.
heterogeneous processors and accelerators, GPUs, and I/O devices with
integrated compute or DMA engines).
While a single node per device may cover several use cases, it is
insufficient for full utilization of the NVIDIA MIG (Multi-Instance GPU) [2]
feature. The feature allows partitioning of the GPU device resources
(including device memory) into several (up to 8) isolated instances. Each
partition's memory requires a dedicated NUMA node to operate. The partitions
are not fixed and can be created/deleted at runtime.
Linux does not provide a means to dynamically create/destroy NUMA nodes,
and implementing such a feature is expected to be non-trivial. The nodes
that the OS discovers at boot time while parsing SRAT remain fixed. So we
utilize the GI Affinity structures, which allow an association between nodes
and devices. Multiple GI structures per device/BDF are possible, allowing
the creation of multiple nodes in the VM by exposing a unique PXM in each of
these structures.
Implement the mechanism to build the GI affinity structures, as Qemu
currently does not. Introduce a new acpi-generic-initiator object to allow
the host admin to link a device with an associated NUMA node. Qemu maintains
this association and uses the object to build the requisite GI Affinity
Structure. When multiple NUMA nodes are associated with a device, the same
number of acpi-generic-initiator objects must be created, each representing
a unique device:node association.
The following is one decoded GI affinity structure from the VM ACPI SRAT.
[0C8h 0200 1] Subtable Type : 05 [Generic Initiator Affinity]
[0C9h 0201 1] Length : 20
[0CAh 0202 1] Reserved1 : 00
[0CBh 0203 1] Device Handle Type : 01
[0CCh 0204 4] Proximity Domain : 00000007
[0D0h 0208 16] Device Handle : 00 00 20 00 00 00 00 00 00 00 00
00 00 00 00 00
[0E0h 0224 4] Flags (decoded below) : 00000001
Enabled : 1
[0E4h 0228 4] Reserved2 : 00000000
[0E8h 0232 1] Subtable Type : 05 [Generic Initiator Affinity]
[0E9h 0233 1] Length : 20
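For reference, a minimal C sketch of the 32-byte subtable layout (ACPI 6.3,
Table 5-78) follows; the struct and field names are illustrative only and
are not taken from QEMU or ACPICA headers (patch 2 builds the entry field by
field instead):

/*
 * Sketch of the SRAT Generic Initiator Affinity Structure (ACPI 6.3,
 * Table 5-78). Illustrative only.
 */
typedef struct GenericInitiatorAffinity {
    uint8_t  type;                /* 5: Generic Initiator Affinity */
    uint8_t  length;              /* 32 (0x20, the Length in the dump) */
    uint8_t  reserved1;
    uint8_t  device_handle_type;  /* 0: ACPI device handle, 1: PCI */
    uint32_t proximity_domain;    /* PXM, 7 in the dump above */
    uint8_t  device_handle[16];   /* PCI: segment(2) + BDF(2) + 12 reserved */
    uint32_t flags;               /* bit 0: Enabled */
    uint32_t reserved2;
} GenericInitiatorAffinity;       /* 32 bytes total */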
On Grace Hopper systems, an admin will create a set of 8 nodes and associate
them with the device using acpi-generic-initiator objects. While a
configuration of fewer than 8 nodes per device is allowed, such a
configuration will prevent full utilization of the feature. This setting
applies to all Grace+Hopper systems. The following is an example of the Qemu
command line arguments to create 8 nodes and link them to the device 'dev0':
-numa node,nodeid=2 -numa node,nodeid=3 -numa node,nodeid=4 \
-numa node,nodeid=5 -numa node,nodeid=6 -numa node,nodeid=7 \
-numa node,nodeid=8 -numa node,nodeid=9 \
-device vfio-pci-nohotplug,host=0009:01:00.0,bus=pcie.0,addr=04.0,rombar=0,id=dev0 \
-object acpi-generic-initiator,id=gi0,pci-dev=dev0,node=2 \
-object acpi-generic-initiator,id=gi1,pci-dev=dev0,node=3 \
-object acpi-generic-initiator,id=gi2,pci-dev=dev0,node=4 \
-object acpi-generic-initiator,id=gi3,pci-dev=dev0,node=5 \
-object acpi-generic-initiator,id=gi4,pci-dev=dev0,node=6 \
-object acpi-generic-initiator,id=gi5,pci-dev=dev0,node=7 \
-object acpi-generic-initiator,id=gi6,pci-dev=dev0,node=8 \
-object acpi-generic-initiator,id=gi7,pci-dev=dev0,node=9 \
The performance benefits can be realized by providing the NUMA node distances
appropriately (through libvirt tags or Qemu params). The admin can get the
distance among nodes in hardware using `numactl -H`.
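As a purely hypothetical illustration (the node IDs and distance values
below are made up; in practice they should be taken from the host's
`numactl -H` output, and Qemu may require the distance table to be specified
completely), the distances could be passed with the -numa dist option:
-numa dist,src=2,dst=3,val=12 \
-numa dist,src=2,dst=9,val=20 \
-numa dist,src=3,dst=9,val=20 \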
This series goes along with the recently added vfio-pci variant driver [3].
Applied over v8.2.2
base commit: 11aa0b1ff115b86160c4d37e7c37e6a6b13b77ea
[1] ACPI Spec 6.3, Section 5.2.16.6
Link: https://www.nvidia.com/en-in/technologies/multi-instance-gpu [2]
Link: https://lore.kernel.org/all/20240220115055.23546-1-ankita@nvidia.com/ [3]
Link for v7: https://lore.kernel.org/all/20240223124223.800078-1-ankita@nvidia.com/
v7 -> v8
- Replaced the code to collect the acpi-generic-initiator objects with code
using the recursive helper object_child_foreach_recursive, based on a
suggestion from Jonathan Cameron.
- Added sanity check for the node id passed to the
acpi-generic-initiator object.
- Added change to use GI as HMAT initiator as per Jonathan's suggestion.
- Fixed nits pointed out by Markus and Jonathan.
- Collected Markus' Acked-by.
- Rebased to v8.2.2.
v6 -> v7
- Updated code and the commit message to make acpi-generic-initiator
define a 1:1 relationship between device and node based on
Jonathan Cameron's suggestion.
- Updated commit message to include the decoded GI entry in the SRAT.
- Rebased to v8.2.1.
v5 -> v6
- Updated commit message for the [1/2] and the cover letter.
- Updated the acpi-generic-initiator object comment description for
clarity on the input host-nodes.
- Rebased to v8.2.0-rc4.
v4 -> v5
- Removed acpi-dev option until full support.
- The NUMA nodes are saved as bitmap instead of uint16List.
- Replaced asserts with exit calls.
- Addressed other miscellaneous comments.
v3 -> v4
- Changed the ':'-delimited list to a uint16 array to communicate the
nodes associated with the device.
- Added asserts to handle invalid inputs.
- Addressed other miscellaneous v3 comments.
v2 -> v3
- Changed the param to accept a ':'-delimited list of NUMA nodes instead
of a range.
- Removed nvidia-acpi-generic-initiator object.
- Addressed miscellaneous comments in v2.
v1 -> v2
- Removed dependency on sysfs to communicate the feature with variant module.
- Use GI Affinity SRAT structure instead of Memory Affinity.
- No DSDT entries needed to communicate the PXM for the device. SRAT GI
structure is used instead.
- New objects introduced to establish link between device and nodes.
Ankit Agrawal (2):
qom: new object to associate device to NUMA node
hw/acpi: Implement the SRAT GI affinity structure
hw/acpi/acpi_generic_initiator.c | 146 +++++++++++++++++++++++
hw/acpi/hmat.c | 2 +-
hw/acpi/meson.build | 1 +
hw/arm/virt-acpi-build.c | 3 +
hw/core/numa.c | 3 +-
include/hw/acpi/acpi_generic_initiator.h | 57 +++++++++
include/sysemu/numa.h | 1 +
qapi/qom.json | 17 +++
8 files changed, 228 insertions(+), 2 deletions(-)
create mode 100644 hw/acpi/acpi_generic_initiator.c
create mode 100644 include/hw/acpi/acpi_generic_initiator.h
--
2.34.1
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v8 1/2] qom: new object to associate device to NUMA node
2024-03-06 12:33 [PATCH v8 0/2] acpi: report NUMA nodes for device memory using GI ankita
@ 2024-03-06 12:33 ` ankita
2024-03-06 13:49 ` Jonathan Cameron via
2024-03-06 12:33 ` [PATCH v8 2/2] hw/acpi: Implement the SRAT GI affinity structure ankita
1 sibling, 1 reply; 8+ messages in thread
From: ankita @ 2024-03-06 12:33 UTC (permalink / raw)
To: ankita, jgg, marcel.apfelbaum, philmd, wangyanan55,
alex.williamson, pbonzini, clg, shannon.zhaosl, peter.maydell,
ani, berrange, eduardo, imammedo, mst, eblake, armbru, david,
gshan, Jonathan.Cameron
Cc: aniketa, cjia, kwankhede, targupta, vsethi, acurrid, dnigam,
udhoke, qemu-arm, qemu-devel
From: Ankit Agrawal <ankita@nvidia.com>
NVIDIA GPUs support the MIG (Multi-Instance GPU) feature [1], which allows
partitioning of the GPU device resources (including device memory) into
several (up to 8) isolated instances. Each partition's memory needs a
dedicated NUMA node to operate. The partitions are not fixed and can be
created/deleted at runtime.
Unfortunately, Linux does not provide a means to dynamically create/destroy
NUMA nodes, and implementing such a feature is not expected to be trivial.
The nodes that the OS discovers at boot time while parsing SRAT remain
fixed. So we utilize the Generic Initiator Affinity structures, which allow
an association between nodes and devices. Multiple GI structures per BDF are
possible, allowing the creation of multiple nodes by exposing a unique PXM
in each of these structures.
Implement the mechanism to build the GI affinity structures, as Qemu
currently does not. Introduce a new acpi-generic-initiator object to allow
the host admin to link a device with an associated NUMA node. Qemu maintains
this association and uses the object to build the requisite GI Affinity
Structure. When multiple NUMA nodes are associated with a device, the same
number of acpi-generic-initiator objects must be created, each representing
a unique device:node association.
The following is one decoded GI affinity structure from the VM ACPI SRAT.
[0C8h 0200 1] Subtable Type : 05 [Generic Initiator Affinity]
[0C9h 0201 1] Length : 20
[0CAh 0202 1] Reserved1 : 00
[0CBh 0203 1] Device Handle Type : 01
[0CCh 0204 4] Proximity Domain : 00000007
[0D0h 0208 16] Device Handle : 00 00 20 00 00 00 00 00 00 00 00
00 00 00 00 00
[0E0h 0224 4] Flags (decoded below) : 00000001
Enabled : 1
[0E4h 0228 4] Reserved2 : 00000000
[0E8h 0232 1] Subtable Type : 05 [Generic Initiator Affinity]
[0E9h 0233 1] Length : 20
An admin can provide a set of acpi-generic-initiator objects, each
associating a device (by providing its id through the pci-dev argument)
with the desired NUMA node (using the node argument). Currently, only PCI
devices are supported.
For the Grace Hopper system, create a set of 8 nodes and associate them
with the device using acpi-generic-initiator objects. While a configuration
of fewer than 8 nodes per device is allowed, such a configuration will
prevent full utilization of the feature. The following sample creates 8
nodes per PCI device for a VM with 2 PCI devices and links them to the
respective PCI device using acpi-generic-initiator objects:
-numa node,nodeid=2 -numa node,nodeid=3 -numa node,nodeid=4 \
-numa node,nodeid=5 -numa node,nodeid=6 -numa node,nodeid=7 \
-numa node,nodeid=8 -numa node,nodeid=9 \
-device vfio-pci-nohotplug,host=0009:01:00.0,bus=pcie.0,addr=04.0,rombar=0,id=dev0 \
-object acpi-generic-initiator,id=gi0,pci-dev=dev0,node=2 \
-object acpi-generic-initiator,id=gi1,pci-dev=dev0,node=3 \
-object acpi-generic-initiator,id=gi2,pci-dev=dev0,node=4 \
-object acpi-generic-initiator,id=gi3,pci-dev=dev0,node=5 \
-object acpi-generic-initiator,id=gi4,pci-dev=dev0,node=6 \
-object acpi-generic-initiator,id=gi5,pci-dev=dev0,node=7 \
-object acpi-generic-initiator,id=gi6,pci-dev=dev0,node=8 \
-object acpi-generic-initiator,id=gi7,pci-dev=dev0,node=9 \
-numa node,nodeid=10 -numa node,nodeid=11 -numa node,nodeid=12 \
-numa node,nodeid=13 -numa node,nodeid=14 -numa node,nodeid=15 \
-numa node,nodeid=16 -numa node,nodeid=17 \
-device vfio-pci-nohotplug,host=0009:01:01.0,bus=pcie.0,addr=05.0,rombar=0,id=dev1 \
-object acpi-generic-initiator,id=gi8,pci-dev=dev1,node=10 \
-object acpi-generic-initiator,id=gi9,pci-dev=dev1,node=11 \
-object acpi-generic-initiator,id=gi10,pci-dev=dev1,node=12 \
-object acpi-generic-initiator,id=gi11,pci-dev=dev1,node=13 \
-object acpi-generic-initiator,id=gi12,pci-dev=dev1,node=14 \
-object acpi-generic-initiator,id=gi13,pci-dev=dev1,node=15 \
-object acpi-generic-initiator,id=gi14,pci-dev=dev1,node=16 \
-object acpi-generic-initiator,id=gi15,pci-dev=dev1,node=17 \
The performance benefits can be realized by providing the NUMA node distances
appropriately (through libvirt tags or Qemu params). The admin can get the
distance among nodes in hardware using `numactl -H`.
Link: https://www.nvidia.com/en-in/technologies/multi-instance-gpu [1]
Cc: Jonathan Cameron <qemu-devel@nongnu.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
---
hw/acpi/acpi_generic_initiator.c | 73 ++++++++++++++++++++++++
hw/acpi/hmat.c | 2 +-
hw/acpi/meson.build | 1 +
hw/core/numa.c | 3 +-
include/hw/acpi/acpi_generic_initiator.h | 32 +++++++++++
include/sysemu/numa.h | 1 +
qapi/qom.json | 17 ++++++
7 files changed, 127 insertions(+), 2 deletions(-)
create mode 100644 hw/acpi/acpi_generic_initiator.c
create mode 100644 include/hw/acpi/acpi_generic_initiator.h
diff --git a/hw/acpi/acpi_generic_initiator.c b/hw/acpi/acpi_generic_initiator.c
new file mode 100644
index 0000000000..f57b3c8984
--- /dev/null
+++ b/hw/acpi/acpi_generic_initiator.c
@@ -0,0 +1,73 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved
+ */
+
+#include "qemu/osdep.h"
+#include "hw/acpi/acpi_generic_initiator.h"
+#include "hw/boards.h"
+#include "hw/pci/pci_device.h"
+#include "qapi/error.h"
+#include "qapi/qapi-builtin-visit.h"
+#include "qapi/visitor.h"
+#include "qemu/error-report.h"
+
+OBJECT_DEFINE_TYPE_WITH_INTERFACES(AcpiGenericInitiator, acpi_generic_initiator,
+ ACPI_GENERIC_INITIATOR, OBJECT,
+ { TYPE_USER_CREATABLE },
+ { NULL })
+
+OBJECT_DECLARE_SIMPLE_TYPE(AcpiGenericInitiator, ACPI_GENERIC_INITIATOR)
+
+static void acpi_generic_initiator_init(Object *obj)
+{
+ AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
+
+ gi->node = MAX_NODES;
+ gi->pci_dev = NULL;
+}
+
+static void acpi_generic_initiator_finalize(Object *obj)
+{
+ AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
+
+ g_free(gi->pci_dev);
+}
+
+static void acpi_generic_initiator_set_pci_device(Object *obj, const char *val,
+ Error **errp)
+{
+ AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
+
+ gi->pci_dev = g_strdup(val);
+}
+
+static void acpi_generic_initiator_set_node(Object *obj, Visitor *v,
+ const char *name, void *opaque,
+ Error **errp)
+{
+ MachineState *ms = MACHINE(qdev_get_machine());
+ AcpiGenericInitiator *gi = ACPI_GENERIC_INITIATOR(obj);
+ uint32_t value;
+
+ if (!visit_type_uint32(v, name, &value, errp)) {
+ return;
+ }
+
+ if (value >= MAX_NODES) {
+ error_printf("%s: Invalid NUMA node specified\n",
+ TYPE_ACPI_GENERIC_INITIATOR);
+ exit(1);
+ }
+
+ gi->node = value;
+ ms->numa_state->nodes[gi->node].has_gi = true;
+}
+
+static void acpi_generic_initiator_class_init(ObjectClass *oc, void *data)
+{
+ object_class_property_add_str(oc, "pci-dev", NULL,
+ acpi_generic_initiator_set_pci_device);
+ object_class_property_add(oc, "node", "int", NULL,
+ acpi_generic_initiator_set_node, NULL, NULL);
+}
diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
index 3042d223c8..2242981e18 100644
--- a/hw/acpi/hmat.c
+++ b/hw/acpi/hmat.c
@@ -214,7 +214,7 @@ static void hmat_build_table_structs(GArray *table_data, NumaState *numa_state)
}
for (i = 0; i < numa_state->num_nodes; i++) {
- if (numa_state->nodes[i].has_cpu) {
+ if (numa_state->nodes[i].has_cpu || numa_state->nodes[i].has_gi) {
initiator_list[num_initiator++] = i;
}
}
diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
index fc1b952379..1424046164 100644
--- a/hw/acpi/meson.build
+++ b/hw/acpi/meson.build
@@ -1,5 +1,6 @@
acpi_ss = ss.source_set()
acpi_ss.add(files(
+ 'acpi_generic_initiator.c',
'acpi_interface.c',
'aml-build.c',
'bios-linker-loader.c',
diff --git a/hw/core/numa.c b/hw/core/numa.c
index f08956ddb0..58a32f1564 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -229,7 +229,8 @@ void parse_numa_hmat_lb(NumaState *numa_state, NumaHmatLBOptions *node,
node->target, numa_state->num_nodes);
return;
}
- if (!numa_info[node->initiator].has_cpu) {
+ if (!numa_info[node->initiator].has_cpu &&
+ !numa_info[node->initiator].has_gi) {
error_setg(errp, "Invalid initiator=%d, it isn't an "
"initiator proximity domain", node->initiator);
return;
diff --git a/include/hw/acpi/acpi_generic_initiator.h b/include/hw/acpi/acpi_generic_initiator.h
new file mode 100644
index 0000000000..23d0b591c6
--- /dev/null
+++ b/include/hw/acpi/acpi_generic_initiator.h
@@ -0,0 +1,32 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved
+ */
+
+#ifndef ACPI_GENERIC_INITIATOR_H
+#define ACPI_GENERIC_INITIATOR_H
+
+#include "hw/mem/pc-dimm.h"
+#include "hw/acpi/bios-linker-loader.h"
+#include "hw/acpi/aml-build.h"
+#include "sysemu/numa.h"
+#include "qemu/uuid.h"
+#include "qom/object.h"
+#include "qom/object_interfaces.h"
+
+#define TYPE_ACPI_GENERIC_INITIATOR "acpi-generic-initiator"
+
+typedef struct AcpiGenericInitiator {
+ /* private */
+ Object parent;
+
+ /* public */
+ char *pci_dev;
+ uint16_t node;
+} AcpiGenericInitiator;
+
+typedef struct AcpiGenericInitiatorClass {
+ ObjectClass parent_class;
+} AcpiGenericInitiatorClass;
+
+#endif
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 4173ef2afa..825cfe86bc 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -41,6 +41,7 @@ struct NodeInfo {
struct HostMemoryBackend *node_memdev;
bool present;
bool has_cpu;
+ bool has_gi;
uint8_t lb_info_provided;
uint16_t initiator;
uint8_t distance[MAX_NODES];
diff --git a/qapi/qom.json b/qapi/qom.json
index c53ef978ff..81727310b1 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -794,6 +794,21 @@
{ 'struct': 'VfioUserServerProperties',
'data': { 'socket': 'SocketAddress', 'device': 'str' } }
+##
+# @AcpiGenericInitiatorProperties:
+#
+# Properties for acpi-generic-initiator objects.
+#
+# @pci-dev: PCI device ID to be associated with the node
+#
+# @node: NUMA node associated with the PCI device
+#
+# Since: 9.0
+##
+{ 'struct': 'AcpiGenericInitiatorProperties',
+ 'data': { 'pci-dev': 'str',
+ 'node': 'uint32' } }
+
##
# @RngProperties:
#
@@ -911,6 +926,7 @@
##
{ 'enum': 'ObjectType',
'data': [
+ 'acpi-generic-initiator',
'authz-list',
'authz-listfile',
'authz-pam',
@@ -981,6 +997,7 @@
'id': 'str' },
'discriminator': 'qom-type',
'data': {
+ 'acpi-generic-initiator': 'AcpiGenericInitiatorProperties',
'authz-list': 'AuthZListProperties',
'authz-listfile': 'AuthZListFileProperties',
'authz-pam': 'AuthZPAMProperties',
--
2.34.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v8 2/2] hw/acpi: Implement the SRAT GI affinity structure
2024-03-06 12:33 [PATCH v8 0/2] acpi: report NUMA nodes for device memory using GI ankita
2024-03-06 12:33 ` [PATCH v8 1/2] qom: new object to associate device to NUMA node ankita
@ 2024-03-06 12:33 ` ankita
2024-03-06 13:58 ` Jonathan Cameron via
1 sibling, 1 reply; 8+ messages in thread
From: ankita @ 2024-03-06 12:33 UTC (permalink / raw)
To: ankita, jgg, marcel.apfelbaum, philmd, wangyanan55,
alex.williamson, pbonzini, clg, shannon.zhaosl, peter.maydell,
ani, berrange, eduardo, imammedo, mst, eblake, armbru, david,
gshan, Jonathan.Cameron
Cc: aniketa, cjia, kwankhede, targupta, vsethi, acurrid, dnigam,
udhoke, qemu-arm, qemu-devel
From: Ankit Agrawal <ankita@nvidia.com>
The ACPI spec provides a scheme to associate "Generic Initiators" [1]
(e.g. heterogeneous processors and accelerators, GPUs, and I/O devices with
integrated compute or DMA engines) with Proximity Domains. This is achieved
using the Generic Initiator Affinity Structure in SRAT. During bootup, the
Linux kernel parses the ACPI SRAT to determine the PXM IDs and creates a
NUMA node for each unique PXM ID encountered. Qemu currently does not
implement these structures while building SRAT.
Add GI structures while building the VM ACPI SRAT. The association between
device and node is stored using the acpi-generic-initiator object. Look up
all such objects and use them to build these structures.
The structure needs a PCI device handle [2] that consists of the device BDF.
The vfio-pci device corresponding to the acpi-generic-initiator object is
located to determine the BDF.
[1] ACPI Spec 6.3, Section 5.2.16.6
[2] ACPI Spec 6.3, Table 5.80
Cc: Jonathan Cameron <qemu-devel@nongnu.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Cedric Le Goater <clg@redhat.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
---
hw/acpi/acpi_generic_initiator.c | 73 ++++++++++++++++++++++++
hw/arm/virt-acpi-build.c | 3 +
include/hw/acpi/acpi_generic_initiator.h | 25 ++++++++
3 files changed, 101 insertions(+)
diff --git a/hw/acpi/acpi_generic_initiator.c b/hw/acpi/acpi_generic_initiator.c
index f57b3c8984..5a7d15363e 100644
--- a/hw/acpi/acpi_generic_initiator.c
+++ b/hw/acpi/acpi_generic_initiator.c
@@ -71,3 +71,76 @@ static void acpi_generic_initiator_class_init(ObjectClass *oc, void *data)
object_class_property_add(oc, "node", "int", NULL,
acpi_generic_initiator_set_node, NULL, NULL);
}
+
+/*
+ * ACPI 6.3:
+ * Table 5-78 Generic Initiator Affinity Structure
+ */
+static void
+build_srat_generic_pci_initiator_affinity(GArray *table_data, int node,
+ PCIDeviceHandle *handle)
+{
+ uint8_t index;
+
+ build_append_int_noprefix(table_data, 5, 1); /* Type */
+ build_append_int_noprefix(table_data, 32, 1); /* Length */
+ build_append_int_noprefix(table_data, 0, 1); /* Reserved */
+ build_append_int_noprefix(table_data, 1, 1); /* Device Handle Type: PCI */
+ build_append_int_noprefix(table_data, node, 4); /* Proximity Domain */
+
+ /* Device Handle - PCI */
+ build_append_int_noprefix(table_data, handle->segment, 2);
+ build_append_int_noprefix(table_data, handle->bdf, 2);
+ for (index = 0; index < 12; index++) {
+ build_append_int_noprefix(table_data, 0, 1);
+ }
+
+ build_append_int_noprefix(table_data, GEN_AFFINITY_ENABLED, 4); /* Flags */
+ build_append_int_noprefix(table_data, 0, 4); /* Reserved */
+}
+
+static int build_all_acpi_generic_initiators(Object *obj, void *opaque)
+{
+ MachineState *ms = MACHINE(qdev_get_machine());
+ AcpiGenericInitiator *gi;
+ GArray *table_data = opaque;
+ PCIDeviceHandle dev_handle;
+ PCIDevice *pci_dev;
+ Object *o;
+
+ if (!object_dynamic_cast(obj, TYPE_ACPI_GENERIC_INITIATOR)) {
+ return 0;
+ }
+
+ gi = ACPI_GENERIC_INITIATOR(obj);
+ if (gi->node >= ms->numa_state->num_nodes) {
+ error_printf("%s: Specified node %d is invalid.\n",
+ TYPE_ACPI_GENERIC_INITIATOR, gi->node);
+ exit(1);
+ }
+
+ o = object_resolve_path_type(gi->pci_dev, TYPE_PCI_DEVICE, NULL);
+ if (!o) {
+ error_printf("%s: Specified device must be a PCI device.\n",
+ TYPE_ACPI_GENERIC_INITIATOR);
+ exit(1);
+ }
+
+ pci_dev = PCI_DEVICE(o);
+
+ dev_handle.segment = 0;
+ dev_handle.bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)),
+ pci_dev->devfn);
+
+ build_srat_generic_pci_initiator_affinity(table_data,
+ gi->node, &dev_handle);
+
+ return 0;
+}
+
+void build_srat_generic_pci_initiator(GArray *table_data)
+{
+ object_child_foreach_recursive(object_get_root(),
+ build_all_acpi_generic_initiators,
+ table_data);
+}
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 8bc35a483c..a2b3a2eb25 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -58,6 +58,7 @@
#include "migration/vmstate.h"
#include "hw/acpi/ghes.h"
#include "hw/acpi/viot.h"
+#include "hw/acpi/acpi_generic_initiator.h"
#define ARM_SPI_BASE 32
@@ -558,6 +559,8 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
}
}
+ build_srat_generic_pci_initiator(table_data);
+
if (ms->nvdimms_state->is_enabled) {
nvdimm_build_srat(table_data);
}
diff --git a/include/hw/acpi/acpi_generic_initiator.h b/include/hw/acpi/acpi_generic_initiator.h
index 23d0b591c6..4bebe119ee 100644
--- a/include/hw/acpi/acpi_generic_initiator.h
+++ b/include/hw/acpi/acpi_generic_initiator.h
@@ -29,4 +29,29 @@ typedef struct AcpiGenericInitiatorClass {
ObjectClass parent_class;
} AcpiGenericInitiatorClass;
+/*
+ * ACPI 6.3:
+ * Table 5-81 Flags – Generic Initiator Affinity Structure
+ */
+typedef enum {
+ /*
+ * If clear, the OSPM ignores the contents of the Generic
+ * Initiator/Port Affinity Structure. This allows system firmware
+ * to populate the SRAT with a static number of structures, but only
+ * enable them as necessary.
+ */
+ GEN_AFFINITY_ENABLED = (1 << 0),
+} GenericAffinityFlags;
+
+/*
+ * ACPI 6.3:
+ * Table 5-80 Device Handle - PCI
+ */
+typedef struct PCIDeviceHandle {
+ uint16_t segment;
+ uint16_t bdf;
+} PCIDeviceHandle;
+
+void build_srat_generic_pci_initiator(GArray *table_data);
+
#endif
--
2.34.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v8 1/2] qom: new object to associate device to NUMA node
2024-03-06 12:33 ` [PATCH v8 1/2] qom: new object to associate device to NUMA node ankita
@ 2024-03-06 13:49 ` Jonathan Cameron via
2024-03-07 2:56 ` Ankit Agrawal
0 siblings, 1 reply; 8+ messages in thread
From: Jonathan Cameron via @ 2024-03-06 13:49 UTC (permalink / raw)
To: ankita
Cc: jgg, marcel.apfelbaum, philmd, wangyanan55, alex.williamson,
pbonzini, clg, shannon.zhaosl, peter.maydell, ani, berrange,
eduardo, imammedo, mst, eblake, armbru, david, gshan, aniketa,
cjia, kwankhede, targupta, vsethi, acurrid, dnigam, udhoke,
qemu-arm, qemu-devel
On Wed, 6 Mar 2024 12:33:16 +0000
<ankita@nvidia.com> wrote:
> From: Ankit Agrawal <ankita@nvidia.com>
>
> NVIDIA GPUs support the MIG (Multi-Instance GPU) feature [1], which allows
> partitioning of the GPU device resources (including device memory) into
> several (up to 8) isolated instances. Each partition's memory needs a
> dedicated NUMA node to operate. The partitions are not fixed and can be
> created/deleted at runtime.
>
> Unfortunately, Linux does not provide a means to dynamically create/destroy
> NUMA nodes, and implementing such a feature is not expected to be trivial.
> The nodes that the OS discovers at boot time while parsing SRAT remain
> fixed. So we utilize the Generic Initiator Affinity structures, which allow
> an association between nodes and devices. Multiple GI structures per BDF
> are possible, allowing the creation of multiple nodes by exposing a unique
> PXM in each of these structures.
>
> Implement the mechanism to build the GI affinity structures, as Qemu
> currently does not. Introduce a new acpi-generic-initiator object to allow
> the host admin to link a device with an associated NUMA node. Qemu
> maintains this association and uses the object to build the requisite GI
> Affinity Structure.
>
> When multiple NUMA nodes are associated with a device, the same number of
> acpi-generic-initiator objects must be created, each representing a unique
> device:node association.
>
> The following is one decoded GI affinity structure from the VM ACPI SRAT.
> [0C8h 0200 1] Subtable Type : 05 [Generic Initiator Affinity]
> [0C9h 0201 1] Length : 20
>
> [0CAh 0202 1] Reserved1 : 00
> [0CBh 0203 1] Device Handle Type : 01
> [0CCh 0204 4] Proximity Domain : 00000007
> [0D0h 0208 16] Device Handle : 00 00 20 00 00 00 00 00 00 00 00
> 00 00 00 00 00
> [0E0h 0224 4] Flags (decoded below) : 00000001
> Enabled : 1
> [0E4h 0228 4] Reserved2 : 00000000
>
> [0E8h 0232 1] Subtable Type : 05 [Generic Initiator Affinity]
> [0E9h 0233 1] Length : 20
>
> An admin can provide a set of acpi-generic-initiator objects, each
> associating a device (by providing its id through the pci-dev argument)
> with the desired NUMA node (using the node argument). Currently, only PCI
> devices are supported.
>
> For the Grace Hopper system, create a set of 8 nodes and associate them
> with the device using acpi-generic-initiator objects. While a configuration
> of fewer than 8 nodes per device is allowed, such a configuration will
> prevent full utilization of the feature. The following sample creates 8
> nodes per PCI device for a VM with 2 PCI devices and links them to the
> respective PCI device using acpi-generic-initiator objects:
>
> -numa node,nodeid=2 -numa node,nodeid=3 -numa node,nodeid=4 \
> -numa node,nodeid=5 -numa node,nodeid=6 -numa node,nodeid=7 \
> -numa node,nodeid=8 -numa node,nodeid=9 \
> -device vfio-pci-nohotplug,host=0009:01:00.0,bus=pcie.0,addr=04.0,rombar=0,id=dev0 \
> -object acpi-generic-initiator,id=gi0,pci-dev=dev0,node=2 \
> -object acpi-generic-initiator,id=gi1,pci-dev=dev0,node=3 \
> -object acpi-generic-initiator,id=gi2,pci-dev=dev0,node=4 \
> -object acpi-generic-initiator,id=gi3,pci-dev=dev0,node=5 \
> -object acpi-generic-initiator,id=gi4,pci-dev=dev0,node=6 \
> -object acpi-generic-initiator,id=gi5,pci-dev=dev0,node=7 \
> -object acpi-generic-initiator,id=gi6,pci-dev=dev0,node=8 \
> -object acpi-generic-initiator,id=gi7,pci-dev=dev0,node=9 \
>
> -numa node,nodeid=10 -numa node,nodeid=11 -numa node,nodeid=12 \
> -numa node,nodeid=13 -numa node,nodeid=14 -numa node,nodeid=15 \
> -numa node,nodeid=16 -numa node,nodeid=17 \
> -device vfio-pci-nohotplug,host=0009:01:01.0,bus=pcie.0,addr=05.0,rombar=0,id=dev1 \
> -object acpi-generic-initiator,id=gi8,pci-dev=dev1,node=10 \
> -object acpi-generic-initiator,id=gi9,pci-dev=dev1,node=11 \
> -object acpi-generic-initiator,id=gi10,pci-dev=dev1,node=12 \
> -object acpi-generic-initiator,id=gi11,pci-dev=dev1,node=13 \
> -object acpi-generic-initiator,id=gi12,pci-dev=dev1,node=14 \
> -object acpi-generic-initiator,id=gi13,pci-dev=dev1,node=15 \
> -object acpi-generic-initiator,id=gi14,pci-dev=dev1,node=16 \
> -object acpi-generic-initiator,id=gi15,pci-dev=dev1,node=17 \
>
> The performance benefits can be realized by providing the NUMA node distances
> appropriately (through libvirt tags or Qemu params). The admin can get the
> distance among nodes in hardware using `numactl -H`.
That's a lot of description when you could just have claimed you want a normal
GI node for HMAT and we'd have all believed you ;)
>
> Link: https://www.nvidia.com/en-in/technologies/multi-instance-gpu [1]
> Cc: Jonathan Cameron <qemu-devel@nongnu.org>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Markus Armbruster <armbru@redhat.com>
> Acked-by: Markus Armbruster <armbru@redhat.com>
> Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Hi Ankit,
Some minor things inline. With the includes tidied up.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> diff --git a/include/hw/acpi/acpi_generic_initiator.h b/include/hw/acpi/acpi_generic_initiator.h
> new file mode 100644
> index 0000000000..23d0b591c6
> --- /dev/null
> +++ b/include/hw/acpi/acpi_generic_initiator.h
> @@ -0,0 +1,32 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#ifndef ACPI_GENERIC_INITIATOR_H
> +#define ACPI_GENERIC_INITIATOR_H
> +
> +#include "hw/mem/pc-dimm.h"
Why?
> +#include "hw/acpi/bios-linker-loader.h"
> +#include "hw/acpi/aml-build.h"
> +#include "sysemu/numa.h"
This should only include headers that it uses directly.
If they are needed down in the c files, then include them there.
Most of these are not related to what you have here.
> +#include "qemu/uuid.h"
> +#include "qom/object.h"
> +#include "qom/object_interfaces.h"
> +
> +#define TYPE_ACPI_GENERIC_INITIATOR "acpi-generic-initiator"
> +
> +typedef struct AcpiGenericInitiator {
> + /* private */
> + Object parent;
> +
> + /* public */
> + char *pci_dev;
> + uint16_t node;
> +} AcpiGenericInitiator;
> +
> +typedef struct AcpiGenericInitiatorClass {
> + ObjectClass parent_class;
> +} AcpiGenericInitiatorClass;
Trivial, but you could push the class definition down into the c file
given it's not accessed from anywhere else.
> +
> +#endif
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v8 2/2] hw/acpi: Implement the SRAT GI affinity structure
2024-03-06 12:33 ` [PATCH v8 2/2] hw/acpi: Implement the SRAT GI affinity structure ankita
@ 2024-03-06 13:58 ` Jonathan Cameron via
2024-03-07 3:03 ` Ankit Agrawal
0 siblings, 1 reply; 8+ messages in thread
From: Jonathan Cameron via @ 2024-03-06 13:58 UTC (permalink / raw)
To: ankita
Cc: jgg, marcel.apfelbaum, philmd, wangyanan55, alex.williamson,
pbonzini, clg, shannon.zhaosl, peter.maydell, ani, berrange,
eduardo, imammedo, mst, eblake, armbru, david, gshan, aniketa,
cjia, kwankhede, targupta, vsethi, acurrid, dnigam, udhoke,
qemu-arm, qemu-devel
On Wed, 6 Mar 2024 12:33:17 +0000
<ankita@nvidia.com> wrote:
> From: Ankit Agrawal <ankita@nvidia.com>
>
> The ACPI spec provides a scheme to associate "Generic Initiators" [1]
> (e.g. heterogeneous processors and accelerators, GPUs, and I/O devices with
> integrated compute or DMA engines) with Proximity Domains. This is achieved
> using the Generic Initiator Affinity Structure in SRAT. During bootup, the
> Linux kernel parses the ACPI SRAT to determine the PXM IDs and creates a
> NUMA node for each unique PXM ID encountered. Qemu currently does not
> implement these structures while building SRAT.
>
> Add GI structures while building the VM ACPI SRAT. The association between
> device and node is stored using the acpi-generic-initiator object. Look up
> all such objects and use them to build these structures.
>
> The structure needs a PCI device handle [2] that consists of the device BDF.
> The vfio-pci device corresponding to the acpi-generic-initiator object is
> located to determine the BDF.
>
> [1] ACPI Spec 6.3, Section 5.2.16.6
> [2] ACPI Spec 6.3, Table 5.80
>
> Cc: Jonathan Cameron <qemu-devel@nongnu.org>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Cedric Le Goater <clg@redhat.com>
> Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
I guess we gloss over the bisection breakage due to being able to add
these nodes and have them used in HMAT as initiators before we have
added SRAT support. Linux will moan about it and not use such an HMAT
but meh, it will boot.
You could drag the HMAT change after this but perhaps it's not worth bothering.
Otherwise LGTM
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Could add x86 support (posted in reply to v7 this morning)
and sounds like you have the test nearly ready which is great.
Jonathan
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v8 1/2] qom: new object to associate device to NUMA node
2024-03-06 13:49 ` Jonathan Cameron via
@ 2024-03-07 2:56 ` Ankit Agrawal
0 siblings, 0 replies; 8+ messages in thread
From: Ankit Agrawal @ 2024-03-07 2:56 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Jason Gunthorpe, marcel.apfelbaum@gmail.com, philmd@linaro.org,
wangyanan55@huawei.com, alex.williamson@redhat.com,
pbonzini@redhat.com, clg@redhat.com, shannon.zhaosl@gmail.com,
peter.maydell@linaro.org, ani@anisinha.ca, berrange@redhat.com,
eduardo@habkost.net, imammedo@redhat.com, mst@redhat.com,
eblake@redhat.com, armbru@redhat.com, david@redhat.com,
gshan@redhat.com, Aniket Agashe, Neo Jia, Kirti Wankhede,
Tarun Gupta (SW-GPU), Vikram Sethi, Andy Currid, Dheeraj Nigam,
Uday Dhoke, qemu-arm@nongnu.org, qemu-devel@nongnu.org
>> -object acpi-generic-initiator,id=gi14,pci-dev=dev1,node=16 \
>> -object acpi-generic-initiator,id=gi15,pci-dev=dev1,node=17 \
>>
>> The performance benefits can be realized by providing the NUMA node distances
>> appropriately (through libvirt tags or Qemu params). The admin can get the
>> distance among nodes in hardware using `numactl -H`.
>
> That's a lot of description when you could just have claimed you want a normal
> GI node for HMAT and we'd have all believed you ;)
Ack, I'll remove this part and change it to say as such.
>>
>> Link: https://www.nvidia.com/en-in/technologies/multi-instance-gpu [1]
>> Cc: Jonathan Cameron <qemu-devel@nongnu.org>
>> Cc: Alex Williamson <alex.williamson@redhat.com>
>> Cc: Markus Armbruster <armbru@redhat.com>
>> Acked-by: Markus Armbruster <armbru@redhat.com>
>> Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
>
> Hi Ankit,
>
> Some minor things inline. With the includes tidied up.
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Thanks!
>> diff --git a/include/hw/acpi/acpi_generic_initiator.h b/include/hw/acpi/acpi_generic_initiator.h
>> new file mode 100644
>> index 0000000000..23d0b591c6
>> --- /dev/null
>> +++ b/include/hw/acpi/acpi_generic_initiator.h
>> @@ -0,0 +1,32 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved
>> + */
>> +
>> +#ifndef ACPI_GENERIC_INITIATOR_H
>> +#define ACPI_GENERIC_INITIATOR_H
>> +
>> +#include "hw/mem/pc-dimm.h"
>
> Why?
>
>> +#include "hw/acpi/bios-linker-loader.h"
>> +#include "hw/acpi/aml-build.h"
>> +#include "sysemu/numa.h"
>
> This should only include headers that it uses directly.
> If they are needed down in the c files, then include them there.
Ack, will fix this in the next version.
>> +typedef struct AcpiGenericInitiator {
>> + /* private */
>> + Object parent;
>> +
>> + /* public */
>> + char *pci_dev;
>> + uint16_t node;
>> +} AcpiGenericInitiator;
>> +
>> +typedef struct AcpiGenericInitiatorClass {
>> + ObjectClass parent_class;
>> +} AcpiGenericInitiatorClass;
>
> Trivial, but you could push the class definition down into the c file
> given it's not accessed from anywhere else.
Sure will move the AcpiGenericInitiatorClass typedef to the .c file.
> +
> +#endif
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v8 2/2] hw/acpi: Implement the SRAT GI affinity structure
2024-03-06 13:58 ` Jonathan Cameron via
@ 2024-03-07 3:03 ` Ankit Agrawal
2024-03-07 9:35 ` Jonathan Cameron via
0 siblings, 1 reply; 8+ messages in thread
From: Ankit Agrawal @ 2024-03-07 3:03 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Jason Gunthorpe, marcel.apfelbaum@gmail.com, philmd@linaro.org,
wangyanan55@huawei.com, alex.williamson@redhat.com,
pbonzini@redhat.com, clg@redhat.com, shannon.zhaosl@gmail.com,
peter.maydell@linaro.org, ani@anisinha.ca, berrange@redhat.com,
eduardo@habkost.net, imammedo@redhat.com, mst@redhat.com,
eblake@redhat.com, armbru@redhat.com, david@redhat.com,
gshan@redhat.com, Aniket Agashe, Neo Jia, Kirti Wankhede,
Tarun Gupta (SW-GPU), Vikram Sethi, Andy Currid, Dheeraj Nigam,
Uday Dhoke, qemu-arm@nongnu.org, qemu-devel@nongnu.org
>>
>> [1] ACPI Spec 6.3, Section 5.2.16.6
>> [2] ACPI Spec 6.3, Table 5.80
>>
>> Cc: Jonathan Cameron <qemu-devel@nongnu.org>
>> Cc: Alex Williamson <alex.williamson@redhat.com>
>> Cc: Cedric Le Goater <clg@redhat.com>
>> Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
>
> I guess we gloss over the bisection breakage due to being able to add
> these nodes and have them used in HMAT as initiators before we have
> added SRAT support. Linux will moan about it and not use such an HMAT
> but meh, it will boot.
>
> You could drag the HMAT change after this but perhaps it's not worth bothering.
Sorry this part isn't clear to me. Are you suggesting we keep the HMAT
changes out from this patch?
> Otherwise LGTM
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Thanks!
> Could add x86 support (posted in reply to v7 this morning)
> and sounds like you have the test nearly ready which is great.
Ok, will add the x86 part as well. I could reuse what you shared
earlier.
https://gitlab.com/jic23/qemu/-/commit/ccfb4fe22167e035173390cf147d9c226951b9b6
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v8 2/2] hw/acpi: Implement the SRAT GI affinity structure
2024-03-07 3:03 ` Ankit Agrawal
@ 2024-03-07 9:35 ` Jonathan Cameron via
0 siblings, 0 replies; 8+ messages in thread
From: Jonathan Cameron via @ 2024-03-07 9:35 UTC (permalink / raw)
To: Ankit Agrawal
Cc: Jason Gunthorpe, marcel.apfelbaum@gmail.com, philmd@linaro.org,
wangyanan55@huawei.com, alex.williamson@redhat.com,
pbonzini@redhat.com, clg@redhat.com, shannon.zhaosl@gmail.com,
peter.maydell@linaro.org, ani@anisinha.ca, berrange@redhat.com,
eduardo@habkost.net, imammedo@redhat.com, mst@redhat.com,
eblake@redhat.com, armbru@redhat.com, david@redhat.com,
gshan@redhat.com, Aniket Agashe, Neo Jia, Kirti Wankhede,
Tarun Gupta (SW-GPU), Vikram Sethi, Andy Currid, Dheeraj Nigam,
Uday Dhoke, qemu-arm@nongnu.org, qemu-devel@nongnu.org
On Thu, 7 Mar 2024 03:03:02 +0000
Ankit Agrawal <ankita@nvidia.com> wrote:
> >>
> >> [1] ACPI Spec 6.3, Section 5.2.16.6
> >> [2] ACPI Spec 6.3, Table 5.80
> >>
> >> Cc: Jonathan Cameron <qemu-devel@nongnu.org>
> >> Cc: Alex Williamson <alex.williamson@redhat.com>
> >> Cc: Cedric Le Goater <clg@redhat.com>
> >> Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
> >
> > I guess we gloss over the bisection breakage due to being able to add
> > these nodes and have them used in HMAT as initiators before we have
> > added SRAT support. Linux will moan about it and not use such an HMAT
> > but meh, it will boot.
> >
> > You could drag the HMAT change after this but perhaps it's not worth bothering.
>
> Sorry this part isn't clear to me. Are you suggesting we keep the HMAT
> changes out from this patch?
No - don't drop them. Move them from patch 1 to either patch 2, or to a
patch 3 if that ends up looking clearer. I think patch 2 is the
right choice though as that enables everything at once.
It's valid to have SRAT containing GI entries without the same in HMAT
(as HMAT doesn't have to be complete), it's not valid to have HMAT refer
to entries that aren't in SRAT.
Another thing we may need to add in the long run is _OSC support.
That's needed for DSDT entries with _PXM associated with a GI-only node,
so that we can make them move nodes depending on whether or not the
Guest OS supports GIs and so will create the nodes. It requires a bit of
magic AML to make that work.
It used to crash Linux if you didn't do that, but that's been fixed
for a while I believe.
For now we aren't adding any such _PXM entries though so this is just
one for the TODO list :)
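A rough, purely illustrative ASL sketch of that kind of conditional _PXM
(the device path, variable name and node number below are invented, not
taken from this series or from any real firmware):

Scope (\_SB.PC00.GPU0)
{
    Name (GISP, Zero)                 // would be set from the _OSC handler
                                      // when the guest advertises GI support
    Method (_PXM, 0, NotSerialized)
    {
        If (GISP) {
            Return (7)                // GI-only proximity domain
        } Else {
            Return (0)                // fall back to a CPU/memory domain
        }
    }
}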
>
> > Otherwise LGTM
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>
> Thanks!
>
> > Could add x86 support (posted in reply to v7 this morning)
> > and sounds like you have the test nearly ready which is great.
>
> Ok, will add the x86 part as well. I could reuse what you shared
> earlier.
>
> https://gitlab.com/jic23/qemu/-/commit/ccfb4fe22167e035173390cf147d9c226951b9b6
Excellent - thanks!
Jonathan
>
>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2024-03-07 9:35 UTC | newest]
Thread overview: 8+ messages
2024-03-06 12:33 [PATCH v8 0/2] acpi: report NUMA nodes for device memory using GI ankita
2024-03-06 12:33 ` [PATCH v8 1/2] qom: new object to associate device to NUMA node ankita
2024-03-06 13:49 ` Jonathan Cameron via
2024-03-07 2:56 ` Ankit Agrawal
2024-03-06 12:33 ` [PATCH v8 2/2] hw/acpi: Implement the SRAT GI affinity structure ankita
2024-03-06 13:58 ` Jonathan Cameron via
2024-03-07 3:03 ` Ankit Agrawal
2024-03-07 9:35 ` Jonathan Cameron via