From: Jonathan Cameron via <qemu-arm@nongnu.org>
To: <ankita@nvidia.com>
Cc: <jgg@nvidia.com>, <alex.williamson@redhat.com>, <clg@redhat.com>,
<shannon.zhaosl@gmail.com>, <peter.maydell@linaro.org>,
<ani@anisinha.ca>, <berrange@redhat.com>, <eduardo@habkost.net>,
<imammedo@redhat.com>, <mst@redhat.com>, <eblake@redhat.com>,
<armbru@redhat.com>, <david@redhat.com>, <gshan@redhat.com>,
<aniketa@nvidia.com>, <cjia@nvidia.com>, <kwankhede@nvidia.com>,
<targupta@nvidia.com>, <vsethi@nvidia.com>, <acurrid@nvidia.com>,
<dnigam@nvidia.com>, <udhoke@nvidia.com>, <qemu-arm@nongnu.org>,
<qemu-devel@nongnu.org>
Subject: Re: [PATCH v6 0/2] acpi: report numa nodes for device memory using GI
Date: Tue, 2 Jan 2024 12:31:43 +0000
Message-ID: <20240102123143.00006486@Huawei.com>
In-Reply-To: <20231225045603.7654-1-ankita@nvidia.com>
On Mon, 25 Dec 2023 10:26:01 +0530
<ankita@nvidia.com> wrote:
> From: Ankit Agrawal <ankita@nvidia.com>
>
> There are upcoming devices whose memory the CPU can access cache
> coherently. It is sensible to expose such memory to the OS as NUMA nodes
> separate from the sysmem node. The ACPI spec provides a scheme in SRAT
> called Generic Initiator Affinity Structure [1] to allow an association
> between a Proximity Domain (PXM) and a Generic Initiator (GI) (e.g.
> heterogeneous processors and accelerators, GPUs, and I/O devices with
> integrated compute or DMA engines).
>
> While a single node per device may cover several use cases, it is
> insufficient for full utilization of the MIG (Multi-Instance GPU) [2]
> feature of NVIDIA GPUs. The feature allows partitioning of the
> GPU device resources (including device memory) into several (up to 8)
> isolated instances. Each memory partition requires a dedicated NUMA
> node to operate. The partitions are not fixed and can be created/deleted
> at runtime.
>
> Linux does not provide a means to dynamically create/destroy NUMA nodes,
> and implementing such a feature is expected to be non-trivial. The nodes
> that the OS discovers at boot time while parsing SRAT remain fixed. So we
> utilize the GI Affinity Structures, which allow an association between nodes
> and devices. Multiple GI structures per device/BDF are possible, allowing
> creation of multiple nodes in the VM by exposing a unique PXM in each of
> these structures.
>
> Implement the mechanism to build the GI Affinity Structures, as QEMU
> currently does not. Introduce a new acpi-generic-initiator object that allows
> an association of a set of nodes with a device. During SRAT creation, all such
> objects are identified and used to add the GI Affinity Structures. Currently,
> only PCI devices are supported. On a multi-device system, each device
> supporting the feature needs a unique acpi-generic-initiator object with its
> own set of NUMA nodes associated with it.
>
> The admin will create a range of 8 nodes and associate it with the device
> using the acpi-generic-initiator object. While a configuration of fewer than
> 8 nodes per device is allowed, such a configuration will prevent full
> utilization of the feature. This setting is applicable to all the Grace+Hopper
> systems. The following is an example of the QEMU command line arguments to
> create 8 nodes and link them to the device 'dev0':
>
> -numa node,nodeid=2 -numa node,nodeid=3 -numa node,nodeid=4 \
> -numa node,nodeid=5 -numa node,nodeid=6 -numa node,nodeid=7 \
> -numa node,nodeid=8 -numa node,nodeid=9 \
> -device vfio-pci-nohotplug,host=0009:01:00.0,bus=pcie.0,addr=04.0,rombar=0,id=dev0 \
> -object acpi-generic-initiator,id=gi0,pci-dev=dev0,host-nodes=2-9 \
>
I'd find it helpful to see the resulting chunk of SRAT for these examples
(disassembled) in this cover letter and the patches (where there are more examples).
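
To illustrate the kind of output meant here, below is a rough Python sketch
that builds the raw bytes of one Generic Initiator Affinity Structure per node
for host-nodes=2-9. It is not taken from the patches and is not real
disassembler output. The field layout (type 5, length 32, PCI device handle)
is my reading of the ACPI 6.3 SRAT definition, and the byte order of the BDF
field in particular should be checked against the spec. The guest address
(segment 0, bus 0, device 4, function 0) is assumed from bus=pcie.0,addr=04.0
in the example above, and all function names are made up for this sketch.

```python
import struct

# Constants as I understand them from the ACPI 6.3 SRAT definition (assumption).
GI_TYPE = 5            # Generic Initiator Affinity Structure
GI_LENGTH = 32         # total size of the structure in bytes
HANDLE_TYPE_PCI = 1    # device handle type: 0 = ACPI, 1 = PCI
FLAG_ENABLED = 1 << 0  # flags bit 0: entry is enabled


def parse_node_range(spec):
    """Expand a 'first-last' (or single 'n') string into a range of node IDs."""
    first, _, last = spec.partition("-")
    return range(int(first), int(last or first) + 1)


def pci_device_handle(segment, bus, device, function):
    """16-byte PCI device handle: segment, bus, devfn, then 12 reserved bytes."""
    devfn = (device << 3) | function
    return struct.pack("<HBB12x", segment, bus, devfn)


def gi_affinity(pxm, handle):
    """One 32-byte GI affinity entry tying proximity domain 'pxm' to 'handle'."""
    return struct.pack("<BBBBI16sII",
                       GI_TYPE, GI_LENGTH,
                       0,                # reserved
                       HANDLE_TYPE_PCI,
                       pxm,
                       handle,
                       FLAG_ENABLED,
                       0)                # reserved


# Hypothetical guest address for 'dev0', assumed from bus=pcie.0,addr=04.0.
handle = pci_device_handle(segment=0, bus=0, device=4, function=0)
entries = [gi_affinity(pxm, handle) for pxm in parse_node_range("2-9")]

for pxm, entry in zip(parse_node_range("2-9"), entries):
    print(f"PXM {pxm}: {entry.hex()}")
```

This would give eight 32-byte entries that differ only in the Proximity Domain
field, which is what I would expect to see repeated in the disassembled SRAT.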