[PATCH] hw/vfio/pci-quirks: Support alternate offset for GPUDirect Cliques

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] hw/vfio/pci-quirks: Support alternate offset for GPUDirect Cliques
@ 2023-06-08 17:42 Alex Williamson
  2023-06-08 17:46 ` Alex Williamson
  2023-06-08 18:05 ` [PATCH v2] " Alex Williamson
  0 siblings, 2 replies; 7+ messages in thread
From: Alex Williamson @ 2023-06-08 17:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alex Williamson, clg

NVIDIA Turing and newer GPUs implement the MSI-X capability at the offset
previously reserved for use by hypervisors to implement the GPUDirect
Cliques capability.  A revised specification provides an alternate
location.  Add a config space walk to the quirk to check for conflicts,
allowing us to fall back to the new location or generate an error at the
quirk setup rather than when the real conflicting capability is added
should there be no available location.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

The specification doesn't have a public home, but I've been given
permission to post it to the list.  Therefore I'll reply to this with
the document attached and update the link in a v2 posting once that's
available in the list archive.

 hw/vfio/pci-quirks.c | 41 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index f0147a050aaa..a10e775bbfa7 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -1490,6 +1490,9 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev)
  * +---------------------------------+---------------------------------+
  *
  * https://lists.gnu.org/archive/html/qemu-devel/2017-08/pdfUda5iEpgOS.pdf
+ *
+ * Specification for Turning and later GPU architectures:
+ * <to be filled in for v2 posting>
  */
 static void get_nv_gpudirect_clique_id(Object *obj, Visitor *v,
                                        const char *name, void *opaque,
@@ -1530,7 +1533,9 @@ const PropertyInfo qdev_prop_nv_gpudirect_clique = {
 static int vfio_add_nv_gpudirect_cap(VFIOPCIDevice *vdev, Error **errp)
 {
     PCIDevice *pdev = &vdev->pdev;
-    int ret, pos = 0xC8;
+    int ret, pos;
+    bool c8_conflict = false, d4_conflict = false;
+    uint8_t tmp;
 
     if (vdev->nv_gpudirect_clique == 0xFF) {
         return 0;
@@ -1547,6 +1552,40 @@ static int vfio_add_nv_gpudirect_cap(VFIOPCIDevice *vdev, Error **errp)
         return -EINVAL;
     }
 
+    /*
+     * Per the updated specification above, it's recommended to use offset
+     * D4h for Turing and later GPU architectures due to a conflict of the
+     * MSI-X capability at C8h.  We don't know how to determine the GPU
+     * architecture, instead we walk the capability chain to mark conflicts
+     * and choose one or error based on the result.
+     *
+     * NB. Cap list head in pdev->config is already cleared, read from device.
+     */
+    ret = pread(vdev->vbasedev.fd, &tmp, 1,
+                vdev->config_offset + PCI_CAPABILITY_LIST);
+    if (ret != 1 || !tmp) {
+        error_setg(errp, "NVIDIA GPUDirect Clique ID: error getting cap list");
+        return -EINVAL;
+    }
+
+    do {
+        if (tmp == 0xC8) {
+            c8_conflict = true;
+        } else if (tmp == 0xD4) {
+            d4_conflict = true;
+        }
+        tmp = pdev->config[tmp + PCI_CAP_LIST_NEXT];
+    } while (tmp);
+
+    if (!c8_conflict) {
+        pos = 0xC8;
+    } else if (!d4_conflict) {
+        pos = 0xD4;
+    } else {
+        error_setg(errp, "NVIDIA GPUDirect Clique ID: invalid config space");
+        return -EINVAL;
+    }
+
     ret = pci_add_capability(pdev, PCI_CAP_ID_VNDR, pos, 8, errp);
     if (ret < 0) {
         error_prepend(errp, "Failed to add NVIDIA GPUDirect cap: ");
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] hw/vfio/pci-quirks: Support alternate offset for GPUDirect Cliques
  2023-06-08 17:42 [PATCH] hw/vfio/pci-quirks: Support alternate offset for GPUDirect Cliques Alex Williamson
@ 2023-06-08 17:46 ` Alex Williamson
  2023-06-08 18:05 ` [PATCH v2] " Alex Williamson
  1 sibling, 0 replies; 7+ messages in thread
From: Alex Williamson @ 2023-06-08 17:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: clg

[-- Attachment #1: Type: text/plain, Size: 925 bytes --]

On Thu,  8 Jun 2023 11:42:11 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> NVIDIA Turing and newer GPUs implement the MSI-X capability at the offset
> previously reserved for use by hypervisors to implement the GPUDirect
> Cliques capability.  A revised specification provides an alternate
> location.  Add a config space walk to the quirk to check for conflicts,
> allowing us to fall back to the new location or generate an error at the
> quirk setup rather than when the real conflicting capability is added
> should there be no available location.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
> 
> The specification doesn't have a public home, but I've been given
> permission to post it to the list.  Therefore I'll reply to this with
> the document attached and update the link in a v2 posting once that's
> available in the list archive.

Updated spec attached.  Thanks,

Alex

[-- Attachment #2: NVIDIA GPUDirect with PCI Pass-Through Virtualization.pdf --]
[-- Type: application/pdf, Size: 168745 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2] hw/vfio/pci-quirks: Support alternate offset for GPUDirect Cliques
  2023-06-08 17:42 [PATCH] hw/vfio/pci-quirks: Support alternate offset for GPUDirect Cliques Alex Williamson
  2023-06-08 17:46 ` Alex Williamson
@ 2023-06-08 18:05 ` Alex Williamson
  2023-06-12 14:07   ` Cédric Le Goater
  1 sibling, 1 reply; 7+ messages in thread
From: Alex Williamson @ 2023-06-08 18:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alex Williamson, clg

NVIDIA Turing and newer GPUs implement the MSI-X capability at the offset
previously reserved for use by hypervisors to implement the GPUDirect
Cliques capability.  A revised specification provides an alternate
location.  Add a config space walk to the quirk to check for conflicts,
allowing us to fall back to the new location or generate an error at the
quirk setup rather than when the real conflicting capability is added
should there be no available location.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 hw/vfio/pci-quirks.c | 41 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index f0147a050aaa..0ed2fcd53152 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -1490,6 +1490,9 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev)
  * +---------------------------------+---------------------------------+
  *
  * https://lists.gnu.org/archive/html/qemu-devel/2017-08/pdfUda5iEpgOS.pdf
+ *
+ * Specification for Turning and later GPU architectures:
+ * https://lists.gnu.org/archive/html/qemu-devel/2023-06/pdf142OR4O4c2.pdf
  */
 static void get_nv_gpudirect_clique_id(Object *obj, Visitor *v,
                                        const char *name, void *opaque,
@@ -1530,7 +1533,9 @@ const PropertyInfo qdev_prop_nv_gpudirect_clique = {
 static int vfio_add_nv_gpudirect_cap(VFIOPCIDevice *vdev, Error **errp)
 {
     PCIDevice *pdev = &vdev->pdev;
-    int ret, pos = 0xC8;
+    int ret, pos;
+    bool c8_conflict = false, d4_conflict = false;
+    uint8_t tmp;
 
     if (vdev->nv_gpudirect_clique == 0xFF) {
         return 0;
@@ -1547,6 +1552,40 @@ static int vfio_add_nv_gpudirect_cap(VFIOPCIDevice *vdev, Error **errp)
         return -EINVAL;
     }
 
+    /*
+     * Per the updated specification above, it's recommended to use offset
+     * D4h for Turing and later GPU architectures due to a conflict of the
+     * MSI-X capability at C8h.  We don't know how to determine the GPU
+     * architecture, instead we walk the capability chain to mark conflicts
+     * and choose one or error based on the result.
+     *
+     * NB. Cap list head in pdev->config is already cleared, read from device.
+     */
+    ret = pread(vdev->vbasedev.fd, &tmp, 1,
+                vdev->config_offset + PCI_CAPABILITY_LIST);
+    if (ret != 1 || !tmp) {
+        error_setg(errp, "NVIDIA GPUDirect Clique ID: error getting cap list");
+        return -EINVAL;
+    }
+
+    do {
+        if (tmp == 0xC8) {
+            c8_conflict = true;
+        } else if (tmp == 0xD4) {
+            d4_conflict = true;
+        }
+        tmp = pdev->config[tmp + PCI_CAP_LIST_NEXT];
+    } while (tmp);
+
+    if (!c8_conflict) {
+        pos = 0xC8;
+    } else if (!d4_conflict) {
+        pos = 0xD4;
+    } else {
+        error_setg(errp, "NVIDIA GPUDirect Clique ID: invalid config space");
+        return -EINVAL;
+    }
+
     ret = pci_add_capability(pdev, PCI_CAP_ID_VNDR, pos, 8, errp);
     if (ret < 0) {
         error_prepend(errp, "Failed to add NVIDIA GPUDirect cap: ");
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] hw/vfio/pci-quirks: Support alternate offset for GPUDirect Cliques
  2023-06-08 18:05 ` [PATCH v2] " Alex Williamson
@ 2023-06-12 14:07   ` Cédric Le Goater
  2023-06-12 15:05     ` Alex Williamson
  0 siblings, 1 reply; 7+ messages in thread
From: Cédric Le Goater @ 2023-06-12 14:07 UTC (permalink / raw)
  To: Alex Williamson, qemu-devel

On 6/8/23 20:05, Alex Williamson wrote:
> NVIDIA Turing and newer GPUs implement the MSI-X capability at the offset
> previously reserved for use by hypervisors to implement the GPUDirect
> Cliques capability.  A revised specification provides an alternate
> location.  Add a config space walk to the quirk to check for conflicts,
> allowing us to fall back to the new location or generate an error at the
> quirk setup rather than when the real conflicting capability is added
> should there be no available location.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>   hw/vfio/pci-quirks.c | 41 ++++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 40 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
> index f0147a050aaa..0ed2fcd53152 100644
> --- a/hw/vfio/pci-quirks.c
> +++ b/hw/vfio/pci-quirks.c
> @@ -1490,6 +1490,9 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev)
>    * +---------------------------------+---------------------------------+
>    *
>    * https://lists.gnu.org/archive/html/qemu-devel/2017-08/pdfUda5iEpgOS.pdf
> + *
> + * Specification for Turning and later GPU architectures:

s/Turning/Turing/

I will fix that.

> + * https://lists.gnu.org/archive/html/qemu-devel/2023-06/pdf142OR4O4c2.pdf
>    */
>   static void get_nv_gpudirect_clique_id(Object *obj, Visitor *v,
>                                          const char *name, void *opaque,
> @@ -1530,7 +1533,9 @@ const PropertyInfo qdev_prop_nv_gpudirect_clique = {
>   static int vfio_add_nv_gpudirect_cap(VFIOPCIDevice *vdev, Error **errp)
>   {
>       PCIDevice *pdev = &vdev->pdev;
> -    int ret, pos = 0xC8;
> +    int ret, pos;
> +    bool c8_conflict = false, d4_conflict = false;
> +    uint8_t tmp;
>   
>       if (vdev->nv_gpudirect_clique == 0xFF) {
>           return 0;
> @@ -1547,6 +1552,40 @@ static int vfio_add_nv_gpudirect_cap(VFIOPCIDevice *vdev, Error **errp)
>           return -EINVAL;
>       }
>   
> +    /*
> +     * Per the updated specification above, it's recommended to use offset
> +     * D4h for Turing and later GPU architectures due to a conflict of the
> +     * MSI-X capability at C8h.  We don't know how to determine the GPU

There is a way :

   # nvidia-smi -q | grep Architecture
       Product Architecture                  : Turing

but it must be vendor specific and the proposed solution is as good.

Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.

> +     * architecture, instead we walk the capability chain to mark conflicts
> +     * and choose one or error based on the result.
> +     *
> +     * NB. Cap list head in pdev->config is already cleared, read from device.
> +     */
> +    ret = pread(vdev->vbasedev.fd, &tmp, 1,
> +                vdev->config_offset + PCI_CAPABILITY_LIST);
> +    if (ret != 1 || !tmp) {
> +        error_setg(errp, "NVIDIA GPUDirect Clique ID: error getting cap list");
> +        return -EINVAL;
> +    }
> +
> +    do {
> +        if (tmp == 0xC8) {
> +            c8_conflict = true;
> +        } else if (tmp == 0xD4) {
> +            d4_conflict = true;
> +        }
> +        tmp = pdev->config[tmp + PCI_CAP_LIST_NEXT];
> +    } while (tmp);
> +
> +    if (!c8_conflict) {
> +        pos = 0xC8;
> +    } else if (!d4_conflict) {
> +        pos = 0xD4;
> +    } else {
> +        error_setg(errp, "NVIDIA GPUDirect Clique ID: invalid config space");
> +        return -EINVAL;
> +    }
> +
>       ret = pci_add_capability(pdev, PCI_CAP_ID_VNDR, pos, 8, errp);
>       if (ret < 0) {
>           error_prepend(errp, "Failed to add NVIDIA GPUDirect cap: ");



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] hw/vfio/pci-quirks: Support alternate offset for GPUDirect Cliques
  2023-06-12 14:07   ` Cédric Le Goater
@ 2023-06-12 15:05     ` Alex Williamson
  2023-06-14 12:37       ` Cédric Le Goater
  0 siblings, 1 reply; 7+ messages in thread
From: Alex Williamson @ 2023-06-12 15:05 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-devel

On Mon, 12 Jun 2023 16:07:33 +0200
Cédric Le Goater <clg@redhat.com> wrote:

> On 6/8/23 20:05, Alex Williamson wrote:
> > NVIDIA Turing and newer GPUs implement the MSI-X capability at the offset
> > previously reserved for use by hypervisors to implement the GPUDirect
> > Cliques capability.  A revised specification provides an alternate
> > location.  Add a config space walk to the quirk to check for conflicts,
> > allowing us to fall back to the new location or generate an error at the
> > quirk setup rather than when the real conflicting capability is added
> > should there be no available location.
> > 
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > ---
> >   hw/vfio/pci-quirks.c | 41 ++++++++++++++++++++++++++++++++++++++++-
> >   1 file changed, 40 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
> > index f0147a050aaa..0ed2fcd53152 100644
> > --- a/hw/vfio/pci-quirks.c
> > +++ b/hw/vfio/pci-quirks.c
> > @@ -1490,6 +1490,9 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev)
> >    * +---------------------------------+---------------------------------+
> >    *
> >    * https://lists.gnu.org/archive/html/qemu-devel/2017-08/pdfUda5iEpgOS.pdf
> > + *
> > + * Specification for Turning and later GPU architectures:  
> 
> s/Turning/Turing/
> 
> I will fix that.

Yes, thanks!
 
> > + * https://lists.gnu.org/archive/html/qemu-devel/2023-06/pdf142OR4O4c2.pdf
> >    */
> >   static void get_nv_gpudirect_clique_id(Object *obj, Visitor *v,
> >                                          const char *name, void *opaque,
> > @@ -1530,7 +1533,9 @@ const PropertyInfo qdev_prop_nv_gpudirect_clique = {
> >   static int vfio_add_nv_gpudirect_cap(VFIOPCIDevice *vdev, Error **errp)
> >   {
> >       PCIDevice *pdev = &vdev->pdev;
> > -    int ret, pos = 0xC8;
> > +    int ret, pos;
> > +    bool c8_conflict = false, d4_conflict = false;
> > +    uint8_t tmp;
> >   
> >       if (vdev->nv_gpudirect_clique == 0xFF) {
> >           return 0;
> > @@ -1547,6 +1552,40 @@ static int vfio_add_nv_gpudirect_cap(VFIOPCIDevice *vdev, Error **errp)
> >           return -EINVAL;
> >       }
> >   
> > +    /*
> > +     * Per the updated specification above, it's recommended to use offset
> > +     * D4h for Turing and later GPU architectures due to a conflict of the
> > +     * MSI-X capability at C8h.  We don't know how to determine the GPU  
> 
> There is a way :
> 
>    # nvidia-smi -q | grep Architecture
>        Product Architecture                  : Turing

There are a few problems with that:

 1) nvidia-smi is a proprietary tool.

 2) Using nvidia-smi, or even the PCI IDs database, would require
    ongoing maintenance to update the string or IDs for future
    architectures.

 3) nvidia-smi requires the device to be managed by the nvidia driver,
    which becomes and chicken and egg problem when we require the
    device to be managed by a vfio compatible driver by this point.

> but it must be vendor specific and the proposed solution is as good.
> 
> Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks!

Alex

> > +     * architecture, instead we walk the capability chain to mark conflicts
> > +     * and choose one or error based on the result.
> > +     *
> > +     * NB. Cap list head in pdev->config is already cleared, read from device.
> > +     */
> > +    ret = pread(vdev->vbasedev.fd, &tmp, 1,
> > +                vdev->config_offset + PCI_CAPABILITY_LIST);
> > +    if (ret != 1 || !tmp) {
> > +        error_setg(errp, "NVIDIA GPUDirect Clique ID: error getting cap list");
> > +        return -EINVAL;
> > +    }
> > +
> > +    do {
> > +        if (tmp == 0xC8) {
> > +            c8_conflict = true;
> > +        } else if (tmp == 0xD4) {
> > +            d4_conflict = true;
> > +        }
> > +        tmp = pdev->config[tmp + PCI_CAP_LIST_NEXT];
> > +    } while (tmp);
> > +
> > +    if (!c8_conflict) {
> > +        pos = 0xC8;
> > +    } else if (!d4_conflict) {
> > +        pos = 0xD4;
> > +    } else {
> > +        error_setg(errp, "NVIDIA GPUDirect Clique ID: invalid config space");
> > +        return -EINVAL;
> > +    }
> > +
> >       ret = pci_add_capability(pdev, PCI_CAP_ID_VNDR, pos, 8, errp);
> >       if (ret < 0) {
> >           error_prepend(errp, "Failed to add NVIDIA GPUDirect cap: ");  
> 



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] hw/vfio/pci-quirks: Support alternate offset for GPUDirect Cliques
  2023-06-12 15:05     ` Alex Williamson
@ 2023-06-14 12:37       ` Cédric Le Goater
  2023-06-14 15:01         ` Alex Williamson
  0 siblings, 1 reply; 7+ messages in thread
From: Cédric Le Goater @ 2023-06-14 12:37 UTC (permalink / raw)
  To: Alex Williamson; +Cc: qemu-devel

On 6/12/23 17:05, Alex Williamson wrote:
> On Mon, 12 Jun 2023 16:07:33 +0200
> Cédric Le Goater <clg@redhat.com> wrote:
> 
>> On 6/8/23 20:05, Alex Williamson wrote:
>>> NVIDIA Turing and newer GPUs implement the MSI-X capability at the offset
>>> previously reserved for use by hypervisors to implement the GPUDirect
>>> Cliques capability.  A revised specification provides an alternate
>>> location.  Add a config space walk to the quirk to check for conflicts,
>>> allowing us to fall back to the new location or generate an error at the
>>> quirk setup rather than when the real conflicting capability is added
>>> should there be no available location.
>>>
>>> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
>>> ---
>>>    hw/vfio/pci-quirks.c | 41 ++++++++++++++++++++++++++++++++++++++++-
>>>    1 file changed, 40 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
>>> index f0147a050aaa..0ed2fcd53152 100644
>>> --- a/hw/vfio/pci-quirks.c
>>> +++ b/hw/vfio/pci-quirks.c
>>> @@ -1490,6 +1490,9 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev)
>>>     * +---------------------------------+---------------------------------+
>>>     *
>>>     * https://lists.gnu.org/archive/html/qemu-devel/2017-08/pdfUda5iEpgOS.pdf
>>> + *
>>> + * Specification for Turning and later GPU architectures:
>>
>> s/Turning/Turing/
>>
>> I will fix that.
> 
> Yes, thanks!
>   
>>> + * https://lists.gnu.org/archive/html/qemu-devel/2023-06/pdf142OR4O4c2.pdf
>>>     */
>>>    static void get_nv_gpudirect_clique_id(Object *obj, Visitor *v,
>>>                                           const char *name, void *opaque,
>>> @@ -1530,7 +1533,9 @@ const PropertyInfo qdev_prop_nv_gpudirect_clique = {
>>>    static int vfio_add_nv_gpudirect_cap(VFIOPCIDevice *vdev, Error **errp)
>>>    {
>>>        PCIDevice *pdev = &vdev->pdev;
>>> -    int ret, pos = 0xC8;
>>> +    int ret, pos;
>>> +    bool c8_conflict = false, d4_conflict = false;
>>> +    uint8_t tmp;
>>>    
>>>        if (vdev->nv_gpudirect_clique == 0xFF) {
>>>            return 0;
>>> @@ -1547,6 +1552,40 @@ static int vfio_add_nv_gpudirect_cap(VFIOPCIDevice *vdev, Error **errp)
>>>            return -EINVAL;
>>>        }
>>>    
>>> +    /*
>>> +     * Per the updated specification above, it's recommended to use offset
>>> +     * D4h for Turing and later GPU architectures due to a conflict of the
>>> +     * MSI-X capability at C8h.  We don't know how to determine the GPU
>>
>> There is a way :
>>
>>     # nvidia-smi -q | grep Architecture
>>         Product Architecture                  : Turing
> 
> There are a few problems with that:
> 
>   1) nvidia-smi is a proprietary tool.
> 
>   2) Using nvidia-smi, or even the PCI IDs database, would require
>      ongoing maintenance to update the string or IDs for future
>      architectures.
> 
>   3) nvidia-smi requires the device to be managed by the nvidia driver,
>      which becomes and chicken and egg problem when we require the
>      device to be managed by a vfio compatible driver by this point.

For my education, could such information be exposed in a PCI vendor
specific capability ? May be it is ?

Thanks,

C.


> 
>> but it must be vendor specific and the proposed solution is as good.
>>
>> Reviewed-by: Cédric Le Goater <clg@redhat.com>
> 
> Thanks!
> 
> Alex
> 
>>> +     * architecture, instead we walk the capability chain to mark conflicts
>>> +     * and choose one or error based on the result.
>>> +     *
>>> +     * NB. Cap list head in pdev->config is already cleared, read from device.
>>> +     */
>>> +    ret = pread(vdev->vbasedev.fd, &tmp, 1,
>>> +                vdev->config_offset + PCI_CAPABILITY_LIST);
>>> +    if (ret != 1 || !tmp) {
>>> +        error_setg(errp, "NVIDIA GPUDirect Clique ID: error getting cap list");
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    do {
>>> +        if (tmp == 0xC8) {
>>> +            c8_conflict = true;
>>> +        } else if (tmp == 0xD4) {
>>> +            d4_conflict = true;
>>> +        }
>>> +        tmp = pdev->config[tmp + PCI_CAP_LIST_NEXT];
>>> +    } while (tmp);
>>> +
>>> +    if (!c8_conflict) {
>>> +        pos = 0xC8;
>>> +    } else if (!d4_conflict) {
>>> +        pos = 0xD4;
>>> +    } else {
>>> +        error_setg(errp, "NVIDIA GPUDirect Clique ID: invalid config space");
>>> +        return -EINVAL;
>>> +    }
>>> +
>>>        ret = pci_add_capability(pdev, PCI_CAP_ID_VNDR, pos, 8, errp);
>>>        if (ret < 0) {
>>>            error_prepend(errp, "Failed to add NVIDIA GPUDirect cap: ");
>>
> 



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] hw/vfio/pci-quirks: Support alternate offset for GPUDirect Cliques
  2023-06-14 12:37       ` Cédric Le Goater
@ 2023-06-14 15:01         ` Alex Williamson
  0 siblings, 0 replies; 7+ messages in thread
From: Alex Williamson @ 2023-06-14 15:01 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: qemu-devel

On Wed, 14 Jun 2023 14:37:08 +0200
Cédric Le Goater <clg@redhat.com> wrote:

> On 6/12/23 17:05, Alex Williamson wrote:
> > On Mon, 12 Jun 2023 16:07:33 +0200
> > Cédric Le Goater <clg@redhat.com> wrote:
> >   
> >> On 6/8/23 20:05, Alex Williamson wrote:  
> >>> NVIDIA Turing and newer GPUs implement the MSI-X capability at the offset
> >>> previously reserved for use by hypervisors to implement the GPUDirect
> >>> Cliques capability.  A revised specification provides an alternate
> >>> location.  Add a config space walk to the quirk to check for conflicts,
> >>> allowing us to fall back to the new location or generate an error at the
> >>> quirk setup rather than when the real conflicting capability is added
> >>> should there be no available location.
> >>>
> >>> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> >>> ---
> >>>    hw/vfio/pci-quirks.c | 41 ++++++++++++++++++++++++++++++++++++++++-
> >>>    1 file changed, 40 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
> >>> index f0147a050aaa..0ed2fcd53152 100644
> >>> --- a/hw/vfio/pci-quirks.c
> >>> +++ b/hw/vfio/pci-quirks.c
> >>> @@ -1490,6 +1490,9 @@ void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev)
> >>>     * +---------------------------------+---------------------------------+
> >>>     *
> >>>     * https://lists.gnu.org/archive/html/qemu-devel/2017-08/pdfUda5iEpgOS.pdf
> >>> + *
> >>> + * Specification for Turning and later GPU architectures:  
> >>
> >> s/Turning/Turing/
> >>
> >> I will fix that.  
> > 
> > Yes, thanks!
> >     
> >>> + * https://lists.gnu.org/archive/html/qemu-devel/2023-06/pdf142OR4O4c2.pdf
> >>>     */
> >>>    static void get_nv_gpudirect_clique_id(Object *obj, Visitor *v,
> >>>                                           const char *name, void *opaque,
> >>> @@ -1530,7 +1533,9 @@ const PropertyInfo qdev_prop_nv_gpudirect_clique = {
> >>>    static int vfio_add_nv_gpudirect_cap(VFIOPCIDevice *vdev, Error **errp)
> >>>    {
> >>>        PCIDevice *pdev = &vdev->pdev;
> >>> -    int ret, pos = 0xC8;
> >>> +    int ret, pos;
> >>> +    bool c8_conflict = false, d4_conflict = false;
> >>> +    uint8_t tmp;
> >>>    
> >>>        if (vdev->nv_gpudirect_clique == 0xFF) {
> >>>            return 0;
> >>> @@ -1547,6 +1552,40 @@ static int vfio_add_nv_gpudirect_cap(VFIOPCIDevice *vdev, Error **errp)
> >>>            return -EINVAL;
> >>>        }
> >>>    
> >>> +    /*
> >>> +     * Per the updated specification above, it's recommended to use offset
> >>> +     * D4h for Turing and later GPU architectures due to a conflict of the
> >>> +     * MSI-X capability at C8h.  We don't know how to determine the GPU  
> >>
> >> There is a way :
> >>
> >>     # nvidia-smi -q | grep Architecture
> >>         Product Architecture                  : Turing  
> > 
> > There are a few problems with that:
> > 
> >   1) nvidia-smi is a proprietary tool.
> > 
> >   2) Using nvidia-smi, or even the PCI IDs database, would require
> >      ongoing maintenance to update the string or IDs for future
> >      architectures.
> > 
> >   3) nvidia-smi requires the device to be managed by the nvidia driver,
> >      which becomes and chicken and egg problem when we require the
> >      device to be managed by a vfio compatible driver by this point.  
> 
> For my education, could such information be exposed in a PCI vendor
> specific capability ? May be it is ?

Sure, nothing technically prevents it, but the vendor has to have a
need to do so whereas NVIDIA probably has their own means to interrogate
the device to determine the architectural level or doesn't mind
maintaining PCI IDs.  Probably a bit of both.  Thanks,

Alex



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-06-14 15:03 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-08 17:42 [PATCH] hw/vfio/pci-quirks: Support alternate offset for GPUDirect Cliques Alex Williamson
2023-06-08 17:46 ` Alex Williamson
2023-06-08 18:05 ` [PATCH v2] " Alex Williamson
2023-06-12 14:07   ` Cédric Le Goater
2023-06-12 15:05     ` Alex Williamson
2023-06-14 12:37       ` Cédric Le Goater
2023-06-14 15:01         ` Alex Williamson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).