From: Alex Williamson <alex.williamson@redhat.com>
To: <ankita@nvidia.com>
Cc: <jgg@nvidia.com>, <qemu-devel@nongnu.org>
Subject: Re: [RFC v1 1/4] qemu: add GPU memory information as object
Date: Tue, 6 Jun 2023 09:19:01 -0600
Message-ID: <20230606091901.22bc6deb.alex.williamson@redhat.com>
In-Reply-To: <20230605235005.20649-2-ankita@nvidia.com>
On Mon, 5 Jun 2023 16:50:02 -0700
<ankita@nvidia.com> wrote:
> From: Ankit Agrawal <ankita@nvidia.com>
>
> The GPU memory is exposed as device BAR1 to the VM and is discovered
> by QEMU through the VFIO_DEVICE_GET_REGION_INFO ioctl. QEMU then maps
> it into the VM's address space.
>
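(Aside, for anyone following along: the discovery above is the
standard vfio_region_info lookup. A minimal sketch, assuming an open
VFIO device fd and eliding error handling; struct and macro names are
from <linux/vfio.h>:

    #include <linux/vfio.h>
    #include <sys/ioctl.h>

    static int query_bar1(int device_fd, struct vfio_region_info *info)
    {
        info->argsz = sizeof(*info);
        info->index = VFIO_PCI_BAR1_REGION_INDEX;
        /* On success, info->size and info->offset describe the BAR. */
        return ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, info);
    }

QEMU wraps the same call in its vfio_get_region_info() helper.)
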
> The GPU memory can be added to the VM as (up to 8) separate NUMA
> nodes. To achieve this, QEMU inserts a series of PXM domains in the
> SRAT and communicates this range of nodes to the VM through _DSD
> properties.
>
> The PXM start and count are added as object properties and passed to
> the SRAT and DSDT builder code.
>
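(A sketch of what I assume the SRAT-builder side of the series does
with these two properties; build_srat_memory() is the existing helper
in hw/acpi/aml-build.c, and table_data/obj are assumed to be in scope.
Zero-length, hotpluggable entries act as placeholders the VM can later
back with the actual device memory:

    uint64_t start, count, i;

    start = object_property_get_uint(obj, "gpu_mem_pxm_start",
                                     &error_abort);
    count = object_property_get_uint(obj, "gpu_mem_pxm_count",
                                     &error_abort);
    for (i = 0; i < count; i++) {
        /* One placeholder entry per reserved proximity domain. */
        build_srat_memory(table_data, 0, 0, start + i,
                          MEM_AFFINITY_ENABLED | MEM_AFFINITY_HOTPLUGGABLE);
    }

)
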
> The code is activated only for a set of NVIDIA devices supporting the
> feature.
>
> Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
> ---
> hw/vfio/pci-quirks.c | 13 +++++++
> hw/vfio/pci.c | 72 +++++++++++++++++++++++++++++++++++++
> hw/vfio/pci.h | 1 +
> include/hw/pci/pci_device.h | 3 ++
> 4 files changed, 89 insertions(+)
>
> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
> index f0147a050a..b7334ccd1d 100644
> --- a/hw/vfio/pci-quirks.c
> +++ b/hw/vfio/pci-quirks.c
> @@ -1751,3 +1751,16 @@ int vfio_add_virt_caps(VFIOPCIDevice *vdev, Error **errp)
>
> return 0;
> }
> +
> +bool vfio_has_cpu_coherent_devmem(VFIOPCIDevice *vdev)
> +{
> + switch (vdev->device_id) {
> + /* Nvidia */
> + case 0x2342:
> + case 0x2343:
> + case 0x2345:
> + return true;
> + }
> +
> + return false;
> +}
I'm not sure why all of this isn't in pci-quirks.c, but the above
function is misleadingly NVIDIA-specific by not testing the vendor ID
here.
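I.e., something along these lines (untested sketch):

    bool vfio_has_cpu_coherent_devmem(VFIOPCIDevice *vdev)
    {
        /* Match vendor *and* device ID so the check isn't misleading. */
        if (vdev->vendor_id != PCI_VENDOR_ID_NVIDIA) {
            return false;
        }

        switch (vdev->device_id) {
        case 0x2342:
        case 0x2343:
        case 0x2345:
            return true;
        }

        return false;
    }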
Also, none of this looks compatible with hotplug, so shouldn't any of
this be enabled only for the vfio-pci-nohotplug device type?
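E.g., roughly (untested):

    if (!object_dynamic_cast(OBJECT(vdev), TYPE_VFIO_PCI_NOHOTPLUG)) {
        return -ENODEV;
    }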
Thanks,
Alex
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index ec9a854361..403516ffb3 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -42,6 +42,8 @@
> #include "qapi/error.h"
> #include "migration/blocker.h"
> #include "migration/qemu-file.h"
> +#include "qapi/visitor.h"
> +#include "include/hw/boards.h"
>
> #define TYPE_VFIO_PCI_NOHOTPLUG "vfio-pci-nohotplug"
>
> @@ -2824,6 +2826,22 @@ static void vfio_register_req_notifier(VFIOPCIDevice *vdev)
> }
> }
>
> +static void vfio_pci_get_gpu_mem_pxm_start(Object *obj, Visitor *v,
> + const char *name,
> + void *opaque, Error **errp)
> +{
> + uint64_t pxm_start = (uintptr_t) opaque;
> + visit_type_uint64(v, name, &pxm_start, errp);
> +}
> +
> +static void vfio_pci_get_gpu_mem_pxm_count(Object *obj, Visitor *v,
> + const char *name,
> + void *opaque, Error **errp)
> +{
> + uint64_t pxm_count = (uintptr_t) opaque;
> + visit_type_uint64(v, name, &pxm_count, errp);
> +}
> +
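(Side note: once added, these read-only properties are visible through
QOM, e.g. from the HMP monitor; the path assumes the device was created
with id=vfio0 and the value shown is illustrative:

    (qemu) qom-get /machine/peripheral/vfio0 gpu_mem_pxm_start
    8

)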
> static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
> {
> Error *err = NULL;
> @@ -2843,6 +2861,53 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
> vdev->req_enabled = false;
> }
>
> +static int vfio_pci_nvidia_dev_mem_probe(VFIOPCIDevice *vPciDev,
> + Error **errp)
> +{
> + unsigned int num_nodes;
> + MemoryRegion *nv2mr = g_malloc0(sizeof(*nv2mr));
> + Object *obj = NULL;
> + VFIODevice *vdev = &vPciDev->vbasedev;
> + MachineState *ms = MACHINE(qdev_get_machine());
> +
> + if (!vfio_has_cpu_coherent_devmem(vPciDev)) {
> + return -ENODEV;
> + }
> +
> + if (vdev->type == VFIO_DEVICE_TYPE_PCI) {
> + obj = vfio_pci_get_object(vdev);
> + }
> +
> + if (!obj) {
> + return -EINVAL;
> + }
> +
> + /*
> + * This device has memory that is coherently accessible from the CPU.
> + * The memory can be represented by up to 8 separate memory-only
> + * NUMA nodes.
> + */
> + vPciDev->pdev.has_coherent_memory = true;
> + num_nodes = 8;
> +
> + /*
> + * To have 8 unique nodes in the VM, a series of PXM nodes must be
> + * added to the VM's SRAT. Send the starting PXM ID and the count to
> + * the ACPI builder code.
> + */
> + object_property_add(OBJECT(vPciDev), "gpu_mem_pxm_start", "uint64",
> + vfio_pci_get_gpu_mem_pxm_start, NULL, NULL,
> + (void *) (uintptr_t) ms->numa_state->num_nodes);
> +
> + object_property_add(OBJECT(vPciDev), "gpu_mem_pxm_count", "uint64",
> + vfio_pci_get_gpu_mem_pxm_count, NULL, NULL,
> + (void *) (uintptr_t) num_nodes);
> +
> + ms->numa_state->num_nodes += num_nodes;
> +
> + return 0;
> +}
> +
> static void vfio_realize(PCIDevice *pdev, Error **errp)
> {
> VFIOPCIDevice *vdev = VFIO_PCI(pdev);
> @@ -3151,6 +3216,13 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
> }
> }
>
> + if (vdev->vendor_id == PCI_VENDOR_ID_NVIDIA) {
> + ret = vfio_pci_nvidia_dev_mem_probe(vdev, errp);
> + if (ret && ret != -ENODEV) {
> + error_report("Failed to set up NVIDIA dev_mem with error %d", ret);
> + }
> + }
> +
> vfio_register_err_notifier(vdev);
> vfio_register_req_notifier(vdev);
> vfio_setup_resetfn_quirk(vdev);
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> index 177abcc8fb..d8791f8f1f 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -226,4 +226,5 @@ void vfio_display_reset(VFIOPCIDevice *vdev);
> int vfio_display_probe(VFIOPCIDevice *vdev, Error **errp);
> void vfio_display_finalize(VFIOPCIDevice *vdev);
>
> +bool vfio_has_cpu_coherent_devmem(VFIOPCIDevice *vdev);
> #endif /* HW_VFIO_VFIO_PCI_H */
> diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h
> index d3dd0f64b2..aacd2279ae 100644
> --- a/include/hw/pci/pci_device.h
> +++ b/include/hw/pci/pci_device.h
> @@ -157,6 +157,9 @@ struct PCIDevice {
> MSIVectorReleaseNotifier msix_vector_release_notifier;
> MSIVectorPollNotifier msix_vector_poll_notifier;
>
> + /* GPU coherent memory */
> + bool has_coherent_memory;
> +
> /* ID of standby device in net_failover pair */
> char *failover_pair_id;
> uint32_t acpi_index;