From: Alex Williamson <alex.williamson@redhat.com>
To: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: linuxppc-dev@lists.ozlabs.org,
David Gibson <david@gibson.dropbear.id.au>,
kvm-ppc@vger.kernel.org,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Ram Pai <linuxram@us.ibm.com>,
kvm@vger.kernel.org, Alistair Popple <alistair@popple.id.au>
Subject: Re: [RFC PATCH kernel 5/5] vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] [10de:1db1] subdriver
Date: Thu, 7 Jun 2018 22:34:46 -0600
Message-ID: <20180607223446.1278deb1@w520.home>
In-Reply-To: <b1ec37e5-e7a0-5930-edcb-08272ca841b0@ozlabs.ru>
On Fri, 8 Jun 2018 13:52:05 +1000
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> On 8/6/18 1:35 pm, Alex Williamson wrote:
> > On Fri, 8 Jun 2018 13:09:13 +1000
> > Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> >> On 8/6/18 3:04 am, Alex Williamson wrote:
> >>> On Thu, 7 Jun 2018 18:44:20 +1000
> >>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> >>>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> >>>> index 7bddf1e..38c9475 100644
> >>>> --- a/drivers/vfio/pci/vfio_pci.c
> >>>> +++ b/drivers/vfio/pci/vfio_pci.c
> >>>> @@ -306,6 +306,15 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
> >>>> }
> >>>> }
> >>>>
> >>>> + if (pdev->vendor == PCI_VENDOR_ID_NVIDIA &&
> >>>> + pdev->device == 0x1db1 &&
> >>>> + IS_ENABLED(CONFIG_VFIO_PCI_NVLINK2)) {
> >>>
> >>> Can't we do better than checking this based on device ID? Perhaps a
> >>> PCIe capability hints at this?
> >>
> >> A normal pluggable PCI device looks like this:
> >>
> >> root@fstn3:~# sudo lspci -vs 0000:03:00.0
> >> 0000:03:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
> >> Subsystem: NVIDIA Corporation GK210GL [Tesla K80]
> >> Flags: fast devsel, IRQ 497
> >> Memory at 3fe000000000 (32-bit, non-prefetchable) [disabled] [size=16M]
> >> Memory at 200000000000 (64-bit, prefetchable) [disabled] [size=16G]
> >> Memory at 200400000000 (64-bit, prefetchable) [disabled] [size=32M]
> >> Capabilities: [60] Power Management version 3
> >> Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
> >> Capabilities: [78] Express Endpoint, MSI 00
> >> Capabilities: [100] Virtual Channel
> >> Capabilities: [128] Power Budgeting <?>
> >> Capabilities: [420] Advanced Error Reporting
> >> Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
> >> Capabilities: [900] #19
> >>
> >>
> >> This is an NVLink v1 machine:
> >>
> >> aik@garrison1:~$ sudo lspci -vs 000a:01:00.0
> >> 000a:01:00.0 3D controller: NVIDIA Corporation Device 15fe (rev a1)
> >> Subsystem: NVIDIA Corporation Device 116b
> >> Flags: bus master, fast devsel, latency 0, IRQ 457
> >> Memory at 3fe300000000 (32-bit, non-prefetchable) [size=16M]
> >> Memory at 260000000000 (64-bit, prefetchable) [size=16G]
> >> Memory at 260400000000 (64-bit, prefetchable) [size=32M]
> >> Capabilities: [60] Power Management version 3
> >> Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
> >> Capabilities: [78] Express Endpoint, MSI 00
> >> Capabilities: [100] Virtual Channel
> >> Capabilities: [250] Latency Tolerance Reporting
> >> Capabilities: [258] L1 PM Substates
> >> Capabilities: [128] Power Budgeting <?>
> >> Capabilities: [420] Advanced Error Reporting
> >> Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
> >> Capabilities: [900] #19
> >> Kernel driver in use: nvidia
> >> Kernel modules: nvidiafb, nouveau, nvidia_384_drm, nvidia_384
> >>
> >>
> >> This is the one the patch is for:
> >>
> >> [aik@yc02goos ~]$ sudo lspci -vs 0035:03:00.0
> >> 0035:03:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2] (rev a1)
> >> Subsystem: NVIDIA Corporation Device 1212
> >> Flags: fast devsel, IRQ 82, NUMA node 8
> >> Memory at 620c280000000 (32-bit, non-prefetchable) [disabled] [size=16M]
> >> Memory at 6228000000000 (64-bit, prefetchable) [disabled] [size=16G]
> >> Memory at 6228400000000 (64-bit, prefetchable) [disabled] [size=32M]
> >> Capabilities: [60] Power Management version 3
> >> Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
> >> Capabilities: [78] Express Endpoint, MSI 00
> >> Capabilities: [100] Virtual Channel
> >> Capabilities: [250] Latency Tolerance Reporting
> >> Capabilities: [258] L1 PM Substates
> >> Capabilities: [128] Power Budgeting <?>
> >> Capabilities: [420] Advanced Error Reporting
> >> Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
> >> Capabilities: [900] #19
> >> Capabilities: [ac0] #23
> >> Kernel driver in use: vfio-pci
> >>
> >>
> >> I can only see a new capability, #23, which I have no idea what it
> >> actually does - my latest PCIe spec is
> >> PCI_Express_Base_r3.1a_December7-2015.pdf, and that only knows capabilities
> >> up to #21. Do you have a newer spec? It does not seem promising anyway...
> >
> > You could just look in include/uapi/linux/pci_regs.h and see that 23
> > (0x17) is a TPH Requester capability, and google for that... It's a TLP
> > processing hint related to cache processing for requests from
> > system-specific interconnects. Sounds rather promising. Of course there's
> > also the vendor-specific capability, which might be probed if NVIDIA will
> > tell you what to look for, and the init function you've implemented
> > looks for specific devicetree nodes, which I imagine you could test for
> > in a probe as well.
>
>
> This 23 is in hex:
>
> [aik@yc02goos ~]$ sudo lspci -vs 0035:03:00.0
> 0035:03:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2] (rev a1)
> Subsystem: NVIDIA Corporation Device 1212
> Flags: fast devsel, IRQ 82, NUMA node 8
> Memory at 620c280000000 (32-bit, non-prefetchable) [disabled] [size=16M]
> Memory at 6228000000000 (64-bit, prefetchable) [disabled] [size=16G]
> Memory at 6228400000000 (64-bit, prefetchable) [disabled] [size=32M]
> Capabilities: [60] Power Management version 3
> Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
> Capabilities: [78] Express Endpoint, MSI 00
> Capabilities: [100] Virtual Channel
> Capabilities: [250] Latency Tolerance Reporting
> Capabilities: [258] L1 PM Substates
> Capabilities: [128] Power Budgeting <?>
> Capabilities: [420] Advanced Error Reporting
> Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
> Capabilities: [900] #19
> Capabilities: [ac0] #23
> Kernel driver in use: vfio-pci
>
> [aik@yc02goos ~]$ sudo lspci -vvvxxxxs 0035:03:00.0 | grep ac0
> Capabilities: [ac0 v1] #23
> ac0: 23 00 01 00 de 10 c1 00 01 00 10 00 00 00 00 00
Oops, I was thinking lspci printed unknown capability IDs in decimal.
Strangely, it's a shared, vendor-specific capability - the Designated
Vendor-Specific Extended Capability (DVSEC):

https://pcisig.com/sites/default/files/specification_documents/ECN_DVSEC-2015-08-04-clean_0.pdf

Your dump shows that the vendor of this capability is 0x10de (NVIDIA)
and the capability ID is 0x0001. Note that NVIDIA sponsored this ECN.
> Talking to NVIDIA is always an option :)
There's really no other choice for figuring out how to decode these
vendor-specific capabilities, though this 0x23 capability at least seems
to be meant for sharing.
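
For illustration only, here's a minimal and completely untested sketch
of keying off that capability instead of the device ID. The function
name and the PCI_EXT_CAP_ID_DVSEC_GUESS define are made up (0x23 isn't
in pci_regs.h yet), and whether DVSEC ID 0x0001 really identifies
NVLink2 is something only NVIDIA can confirm:

#define PCI_EXT_CAP_ID_DVSEC_GUESS	0x23	/* DVSEC; not yet in pci_regs.h */

static bool vfio_pci_is_nvlink2_gpu(struct pci_dev *pdev)
{
	u16 pos = 0;
	u32 hdr;

	/* Walk all DVSEC instances in extended config space */
	while ((pos = pci_find_next_ext_capability(pdev, pos,
					PCI_EXT_CAP_ID_DVSEC_GUESS))) {
		/* DVSEC header 1 at +0x4: vendor ID in bits 15:0 */
		pci_read_config_dword(pdev, pos + 0x4, &hdr);
		if ((hdr & 0xffff) != PCI_VENDOR_ID_NVIDIA)
			continue;
		/* DVSEC header 2 at +0x8: vendor-defined ID in bits 15:0 */
		pci_read_config_dword(pdev, pos + 0x8, &hdr);
		if ((hdr & 0xffff) == 0x0001)	/* the ID seen in the dump above */
			return true;
	}
	return false;
}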
> >>> Is it worthwhile to continue with assigning the device in the !ENABLED
> >>> case? For instance, maybe it would be better to provide a weak
> >>> definition of vfio_pci_nvlink2_init() that would cause us to fail here
> >>> if we don't have this device specific support enabled. I realize
> >>> you're following the example set forth for IGD, but those regions are
> >>> optional, for better or worse.
> >>
> >>
> >> The device is supposed to work even without the GPU RAM passed through; it
> >> should look like NVLink v1 in that case (there used to be bugs in the
> >> driver, and maybe there still are - I have not checked for a while, but
> >> there is a bug open at NVIDIA about this and they were going to fix it),
> >> which is why I chose not to fail here.
> >
> > Ok.
> >
> >>>> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> >>>> index 24ee260..2725bc8 100644
> >>>> --- a/drivers/vfio/pci/Kconfig
> >>>> +++ b/drivers/vfio/pci/Kconfig
> >>>> @@ -30,3 +30,7 @@ config VFIO_PCI_INTX
> >>>> config VFIO_PCI_IGD
> >>>> depends on VFIO_PCI
> >>>> def_bool y if X86
> >>>> +
> >>>> +config VFIO_PCI_NVLINK2
> >>>> + depends on VFIO_PCI
> >>>> + def_bool y if PPC_POWERNV
> >>>
> >>> As written, this also depends on PPC_POWERNV (or at least TCE); it's not
> >>> a portable implementation that we could re-use on x86 or ARM or any
> >>> other platform if hardware appeared for it. Can we improve that as
> >>> well to make this less POWER-specific? Thanks,
> >>
> >>
> >> As I said in another mail, every P9 chip in that box has some NVLink2 logic
> >> on it, so this is not even common among P9s in general, and I am having a
> >> hard time seeing these V100s used elsewhere in such a way.
> >
> > https://www.redhat.com/archives/vfio-users/2018-May/msg00000.html
> >
> > Not much platform info, but based on the rpm mentioned, looks like an
> > x86_64 box. Thanks,
>
> Wow. Interesting. Thanks for the pointer. No advertising material actually
> says that it is P9-only or even mentions P9, and the wiki does not say it is
> P9-only either. Hmmm...
NVIDIA's own DGX systems are Xeon-based and seem to include NVLink.
The DGX-1 definitely makes use of the SXM2 modules, up to 8 of them.
The DGX Station might be the 4x V100 SXM2 box mentioned in the link.
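
If NVLink2-connected V100s really do show up on x86, the stanza below is
just a guess at where the Kconfig might need to head - it assumes the
region code can first be untangled from the POWERNV/TCE internals, which
is exactly the open question above:

config VFIO_PCI_NVLINK2
	depends on VFIO_PCI
	def_bool y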
Thanks,
Alex