From: Alex Williamson <alex.williamson@redhat.com>
To: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: linuxppc-dev@lists.ozlabs.org,
David Gibson <david@gibson.dropbear.id.au>,
kvm-ppc@vger.kernel.org,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Ram Pai <linuxram@us.ibm.com>,
kvm@vger.kernel.org, Alistair Popple <alistair@popple.id.au>
Subject: Re: [RFC PATCH kernel 5/5] vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] [10de:1db1] subdriver
Date: Thu, 7 Jun 2018 22:34:46 -0600
Message-ID: <20180607223446.1278deb1@w520.home>
In-Reply-To: <b1ec37e5-e7a0-5930-edcb-08272ca841b0@ozlabs.ru>
On Fri, 8 Jun 2018 13:52:05 +1000
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> On 8/6/18 1:35 pm, Alex Williamson wrote:
> > On Fri, 8 Jun 2018 13:09:13 +1000
> > Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> >> On 8/6/18 3:04 am, Alex Williamson wrote:
> >>> On Thu, 7 Jun 2018 18:44:20 +1000
> >>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> >>>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> >>>> index 7bddf1e..38c9475 100644
> >>>> --- a/drivers/vfio/pci/vfio_pci.c
> >>>> +++ b/drivers/vfio/pci/vfio_pci.c
> >>>> @@ -306,6 +306,15 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
> >>>> }
> >>>> }
> >>>>
> >>>> + if (pdev->vendor == PCI_VENDOR_ID_NVIDIA &&
> >>>> + pdev->device == 0x1db1 &&
> >>>> + IS_ENABLED(CONFIG_VFIO_PCI_NVLINK2)) {
> >>>
> >>> Can't we do better than checking this based on device ID? Perhaps a
> >>> PCIe capability hints at this?
> >>
> >> A normal pluggable PCI device looks like this:
> >>
> >> root@fstn3:~# sudo lspci -vs 0000:03:00.0
> >> 0000:03:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
> >> Subsystem: NVIDIA Corporation GK210GL [Tesla K80]
> >> Flags: fast devsel, IRQ 497
> >> Memory at 3fe000000000 (32-bit, non-prefetchable) [disabled] [size=16M]
> >> Memory at 200000000000 (64-bit, prefetchable) [disabled] [size=16G]
> >> Memory at 200400000000 (64-bit, prefetchable) [disabled] [size=32M]
> >> Capabilities: [60] Power Management version 3
> >> Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
> >> Capabilities: [78] Express Endpoint, MSI 00
> >> Capabilities: [100] Virtual Channel
> >> Capabilities: [128] Power Budgeting <?>
> >> Capabilities: [420] Advanced Error Reporting
> >> Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
> >> Capabilities: [900] #19
> >>
> >>
> >> This is an NVLink v1 machine:
> >>
> >> aik@garrison1:~$ sudo lspci -vs 000a:01:00.0
> >> 000a:01:00.0 3D controller: NVIDIA Corporation Device 15fe (rev a1)
> >> Subsystem: NVIDIA Corporation Device 116b
> >> Flags: bus master, fast devsel, latency 0, IRQ 457
> >> Memory at 3fe300000000 (32-bit, non-prefetchable) [size=16M]
> >> Memory at 260000000000 (64-bit, prefetchable) [size=16G]
> >> Memory at 260400000000 (64-bit, prefetchable) [size=32M]
> >> Capabilities: [60] Power Management version 3
> >> Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
> >> Capabilities: [78] Express Endpoint, MSI 00
> >> Capabilities: [100] Virtual Channel
> >> Capabilities: [250] Latency Tolerance Reporting
> >> Capabilities: [258] L1 PM Substates
> >> Capabilities: [128] Power Budgeting <?>
> >> Capabilities: [420] Advanced Error Reporting
> >> Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
> >> Capabilities: [900] #19
> >> Kernel driver in use: nvidia
> >> Kernel modules: nvidiafb, nouveau, nvidia_384_drm, nvidia_384
> >>
> >>
> >> This is the one the patch is for:
> >>
> >> [aik@yc02goos ~]$ sudo lspci -vs 0035:03:00.0
> >> 0035:03:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2] (rev a1)
> >> Subsystem: NVIDIA Corporation Device 1212
> >> Flags: fast devsel, IRQ 82, NUMA node 8
> >> Memory at 620c280000000 (32-bit, non-prefetchable) [disabled] [size=16M]
> >> Memory at 6228000000000 (64-bit, prefetchable) [disabled] [size=16G]
> >> Memory at 6228400000000 (64-bit, prefetchable) [disabled] [size=32M]
> >> Capabilities: [60] Power Management version 3
> >> Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
> >> Capabilities: [78] Express Endpoint, MSI 00
> >> Capabilities: [100] Virtual Channel
> >> Capabilities: [250] Latency Tolerance Reporting
> >> Capabilities: [258] L1 PM Substates
> >> Capabilities: [128] Power Budgeting <?>
> >> Capabilities: [420] Advanced Error Reporting
> >> Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
> >> Capabilities: [900] #19
> >> Capabilities: [ac0] #23
> >> Kernel driver in use: vfio-pci
> >>
> >>
> >> I can only see a new capability, #23, which I have no idea what it
> >> actually does - my latest PCIe spec is
> >> PCI_Express_Base_r3.1a_December7-2015.pdf, and that only knows capabilities
> >> up to #21. Do you have a newer spec? It does not seem promising anyway...
> >
> > You could just look in include/uapi/linux/pci_regs.h and see that 23
> > (0x17) is a TPH Requester capability, and google for that... It's a TLP
> > processing hint related to cache processing for requests from
> > system-specific interconnects. Sounds rather promising. Of course there's
> > also the vendor-specific capability, which might be probed if NVIDIA will
> > tell you what to look for, and the init function you've implemented
> > looks for specific devicetree nodes, which I imagine you could test for
> > in a probe as well.
>
>
> This 23 is in hex:
>
> [aik@yc02goos ~]$ sudo lspci -vs 0035:03:00.0
> 0035:03:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2] (rev a1)
> Subsystem: NVIDIA Corporation Device 1212
> Flags: fast devsel, IRQ 82, NUMA node 8
> Memory at 620c280000000 (32-bit, non-prefetchable) [disabled] [size=16M]
> Memory at 6228000000000 (64-bit, prefetchable) [disabled] [size=16G]
> Memory at 6228400000000 (64-bit, prefetchable) [disabled] [size=32M]
> Capabilities: [60] Power Management version 3
> Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
> Capabilities: [78] Express Endpoint, MSI 00
> Capabilities: [100] Virtual Channel
> Capabilities: [250] Latency Tolerance Reporting
> Capabilities: [258] L1 PM Substates
> Capabilities: [128] Power Budgeting <?>
> Capabilities: [420] Advanced Error Reporting
> Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
> Capabilities: [900] #19
> Capabilities: [ac0] #23
> Kernel driver in use: vfio-pci
>
> [aik@yc02goos ~]$ sudo lspci -vvvxxxxs 0035:03:00.0 | grep ac0
> Capabilities: [ac0 v1] #23
> ac0: 23 00 01 00 de 10 c1 00 01 00 10 00 00 00 00 00
Oops, I was thinking lspci printed unknown capability IDs in decimal.
Strangely, it's a shared, vendor-specific capability - the Designated
Vendor-Specific Extended Capability (DVSEC):

https://pcisig.com/sites/default/files/specification_documents/ECN_DVSEC-2015-08-04-clean_0.pdf

Your dump shows that the vendor of this capability is 0x10de (NVIDIA)
and the capability ID is 0x0001. Note that NVIDIA sponsored this ECN.
> Talking to NVIDIA is always an option :)
There's really no other choice for figuring out how to decode these
vendor-specific capabilities, though this 0x23 capability at least seems
to be meant for sharing.
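
For illustration only, here's a minimal and completely untested sketch
of keying off that capability instead of the device ID. The function
name and the PCI_EXT_CAP_ID_DVSEC_GUESS define are made up (0x23 isn't
in pci_regs.h yet), and whether DVSEC ID 0x0001 really identifies
NVLink2 is something only NVIDIA can confirm:

#define PCI_EXT_CAP_ID_DVSEC_GUESS	0x23	/* DVSEC; not yet in pci_regs.h */

static bool vfio_pci_is_nvlink2_gpu(struct pci_dev *pdev)
{
	u16 pos = 0;
	u32 hdr;

	/* Walk all DVSEC instances in extended config space */
	while ((pos = pci_find_next_ext_capability(pdev, pos,
					PCI_EXT_CAP_ID_DVSEC_GUESS))) {
		/* DVSEC header 1 at +0x4: vendor ID in bits 15:0 */
		pci_read_config_dword(pdev, pos + 0x4, &hdr);
		if ((hdr & 0xffff) != PCI_VENDOR_ID_NVIDIA)
			continue;
		/* DVSEC header 2 at +0x8: vendor-defined ID in bits 15:0 */
		pci_read_config_dword(pdev, pos + 0x8, &hdr);
		if ((hdr & 0xffff) == 0x0001)	/* the ID seen in the dump above */
			return true;
	}
	return false;
}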
> >>> Is it worthwhile to continue with assigning the device in the !ENABLED
> >>> case? For instance, maybe it would be better to provide a weak
> >>> definition of vfio_pci_nvlink2_init() that would cause us to fail here
> >>> if we don't have this device specific support enabled. I realize
> >>> you're following the example set forth for IGD, but those regions are
> >>> optional, for better or worse.
> >>
> >>
> >> The device is supposed to work even without the GPU RAM passed through; it
> >> should look like NVLink v1 in that case (there used to be bugs in the
> >> driver, and maybe there still are - I have not checked for a while, but
> >> there is a bug open at NVIDIA about this and they were going to fix it),
> >> which is why I chose not to fail here.
> >
> > Ok.
> >
> >>>> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> >>>> index 24ee260..2725bc8 100644
> >>>> --- a/drivers/vfio/pci/Kconfig
> >>>> +++ b/drivers/vfio/pci/Kconfig
> >>>> @@ -30,3 +30,7 @@ config VFIO_PCI_INTX
> >>>> config VFIO_PCI_IGD
> >>>> depends on VFIO_PCI
> >>>> def_bool y if X86
> >>>> +
> >>>> +config VFIO_PCI_NVLINK2
> >>>> + depends on VFIO_PCI
> >>>> + def_bool y if PPC_POWERNV
> >>>
> >>> As written, this also depends on PPC_POWERNV (or at least TCE); it's not
> >>> a portable implementation that we could re-use on x86 or ARM or any
> >>> other platform if hardware appeared for it. Can we improve that as
> >>> well to make this less POWER-specific? Thanks,
> >>
> >>
> >> As I said in another mail, every P9 chip in that box has some NVLink2 logic
> >> on it, so this is not even common among P9s in general, and I am having a
> >> hard time seeing these V100s used elsewhere in such a way.
> >
> > https://www.redhat.com/archives/vfio-users/2018-May/msg00000.html
> >
> > Not much platform info, but based on the rpm mentioned, looks like an
> > x86_64 box. Thanks,
>
> Wow. Interesting. Thanks for the pointer. No advertising material actually
> says that it is P9-only or even mentions P9, and the wiki does not say it is
> P9-only either. Hmmm...
NVIDIA's own DGX systems are Xeon-based and seem to include NVLink.
The DGX-1 definitely makes use of the SXM2 modules, up to 8 of them.
The DGX Station might be the 4x V100 SXM2 box mentioned in the link.
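
If NVLink2-connected V100s really do show up on x86, the stanza below is
just a guess at where the Kconfig might need to head - it assumes the
region code can first be untangled from the POWERNV/TCE internals, which
is exactly the open question above:

config VFIO_PCI_NVLINK2
	depends on VFIO_PCI
	def_bool y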
Thanks,
Alex