From: Alex Williamson <alex.williamson@redhat.com>
To: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: kvm@vger.kernel.org, Ram Pai <linuxram@us.ibm.com>,
	kvm-ppc@vger.kernel.org, Alistair Popple <alistair@popple.id.au>,
	linuxppc-dev@lists.ozlabs.org,
	David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [RFC PATCH kernel 5/5] vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] [10de:1db1] subdriver
Date: Fri, 08 Jun 2018 04:34:46 +0000
Message-ID: <20180607223446.1278deb1@w520.home>
In-Reply-To: <b1ec37e5-e7a0-5930-edcb-08272ca841b0@ozlabs.ru>

On Fri, 8 Jun 2018 13:52:05 +1000
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> On 8/6/18 1:35 pm, Alex Williamson wrote:
> > On Fri, 8 Jun 2018 13:09:13 +1000
> > Alexey Kardashevskiy <aik@ozlabs.ru> wrote:  
> >> On 8/6/18 3:04 am, Alex Williamson wrote:  
> >>> On Thu,  7 Jun 2018 18:44:20 +1000
> >>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:  
> >>>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> >>>> index 7bddf1e..38c9475 100644
> >>>> --- a/drivers/vfio/pci/vfio_pci.c
> >>>> +++ b/drivers/vfio/pci/vfio_pci.c
> >>>> @@ -306,6 +306,15 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
> >>>>  		}
> >>>>  	}
> >>>>  
> >>>> +	if (pdev->vendor == PCI_VENDOR_ID_NVIDIA &&
> >>>> +	    pdev->device == 0x1db1 &&
> >>>> +	    IS_ENABLED(CONFIG_VFIO_PCI_NVLINK2)) {    
> >>>
> >>> Can't we do better than check this based on device ID?  Perhaps PCIe
> >>> capability hints at this?    
> >>
> >> A normal PCI pluggable device looks like this:
> >>
> >> root@fstn3:~# sudo lspci -vs 0000:03:00.0
> >> 0000:03:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
> >> 	Subsystem: NVIDIA Corporation GK210GL [Tesla K80]
> >> 	Flags: fast devsel, IRQ 497
> >> 	Memory at 3fe000000000 (32-bit, non-prefetchable) [disabled] [size=16M]
> >> 	Memory at 200000000000 (64-bit, prefetchable) [disabled] [size=16G]
> >> 	Memory at 200400000000 (64-bit, prefetchable) [disabled] [size=32M]
> >> 	Capabilities: [60] Power Management version 3
> >> 	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
> >> 	Capabilities: [78] Express Endpoint, MSI 00
> >> 	Capabilities: [100] Virtual Channel
> >> 	Capabilities: [128] Power Budgeting <?>
> >> 	Capabilities: [420] Advanced Error Reporting
> >> 	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
> >> 	Capabilities: [900] #19
> >>
> >>
> >> This is a NVLink v1 machine:
> >>
> >> aik@garrison1:~$ sudo lspci -vs 000a:01:00.0
> >> 000a:01:00.0 3D controller: NVIDIA Corporation Device 15fe (rev a1)
> >> 	Subsystem: NVIDIA Corporation Device 116b
> >> 	Flags: bus master, fast devsel, latency 0, IRQ 457
> >> 	Memory at 3fe300000000 (32-bit, non-prefetchable) [size=16M]
> >> 	Memory at 260000000000 (64-bit, prefetchable) [size=16G]
> >> 	Memory at 260400000000 (64-bit, prefetchable) [size=32M]
> >> 	Capabilities: [60] Power Management version 3
> >> 	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
> >> 	Capabilities: [78] Express Endpoint, MSI 00
> >> 	Capabilities: [100] Virtual Channel
> >> 	Capabilities: [250] Latency Tolerance Reporting
> >> 	Capabilities: [258] L1 PM Substates
> >> 	Capabilities: [128] Power Budgeting <?>
> >> 	Capabilities: [420] Advanced Error Reporting
> >> 	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
> >> 	Capabilities: [900] #19
> >> 	Kernel driver in use: nvidia
> >> 	Kernel modules: nvidiafb, nouveau, nvidia_384_drm, nvidia_384
> >>
> >>
> >> This is the one the patch is for:
> >>
> >> [aik@yc02goos ~]$ sudo lspci -vs 0035:03:00.0
> >> 0035:03:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2]
> >> (rev a1)
> >> 	Subsystem: NVIDIA Corporation Device 1212
> >> 	Flags: fast devsel, IRQ 82, NUMA node 8
> >> 	Memory at 620c280000000 (32-bit, non-prefetchable) [disabled] [size=16M]
> >> 	Memory at 6228000000000 (64-bit, prefetchable) [disabled] [size=16G]
> >> 	Memory at 6228400000000 (64-bit, prefetchable) [disabled] [size=32M]
> >> 	Capabilities: [60] Power Management version 3
> >> 	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
> >> 	Capabilities: [78] Express Endpoint, MSI 00
> >> 	Capabilities: [100] Virtual Channel
> >> 	Capabilities: [250] Latency Tolerance Reporting
> >> 	Capabilities: [258] L1 PM Substates
> >> 	Capabilities: [128] Power Budgeting <?>
> >> 	Capabilities: [420] Advanced Error Reporting
> >> 	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
> >> 	Capabilities: [900] #19
> >> 	Capabilities: [ac0] #23
> >> 	Kernel driver in use: vfio-pci
> >>
> >>
> >> I can only see a new capability #23 which I have no idea about what it
> >> actually does - my latest PCIe spec is
> >> PCI_Express_Base_r3.1a_December7-2015.pdf and that only knows capabilities
> >> till #21, do you have any better spec? Does not seem promising anyway...  
> > 
> > You could just look in include/uapi/linux/pci_regs.h and see that 23
> > (0x17) is a TPH Requester capability and google for that...  It's a TLP
> > processing hint related to cache processing for requests from system
> > specific interconnects.  Sounds rather promising.  Of course there's
> > also the vendor specific capability that might be probed if NVIDIA will
> > tell you what to look for and the init function you've implemented
> > looks for specific devicetree nodes, that I imagine you could test for
> > in a probe as well.  
> 
> 
> This 23 is in hex:
> 
> [aik@yc02goos ~]$ sudo lspci -vs 0035:03:00.0
> 0035:03:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2]
> (rev a1)
> 	Subsystem: NVIDIA Corporation Device 1212
> 	Flags: fast devsel, IRQ 82, NUMA node 8
> 	Memory at 620c280000000 (32-bit, non-prefetchable) [disabled] [size=16M]
> 	Memory at 6228000000000 (64-bit, prefetchable) [disabled] [size=16G]
> 	Memory at 6228400000000 (64-bit, prefetchable) [disabled] [size=32M]
> 	Capabilities: [60] Power Management version 3
> 	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
> 	Capabilities: [78] Express Endpoint, MSI 00
> 	Capabilities: [100] Virtual Channel
> 	Capabilities: [250] Latency Tolerance Reporting
> 	Capabilities: [258] L1 PM Substates
> 	Capabilities: [128] Power Budgeting <?>
> 	Capabilities: [420] Advanced Error Reporting
> 	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
> 	Capabilities: [900] #19
> 	Capabilities: [ac0] #23
> 	Kernel driver in use: vfio-pci
> 
> [aik@yc02goos ~]$ sudo lspci -vvvxxxxs 0035:03:00.0 | grep ac0
> 	Capabilities: [ac0 v1] #23
> ac0: 23 00 01 00 de 10 c1 00 01 00 10 00 00 00 00 00

Oops, I was thinking lspci printed unknown capabilities in decimal.
Strange, 0x23 is a Designated Vendor-Specific Extended Capability
(DVSEC), a shared, vendor specific capability:

https://pcisig.com/sites/default/files/specification_documents/ECN_DVSEC-2015-08-04-clean_0.pdf

We see in your dump that the vendor of this capability is 0x10de
(NVIDIA) and the ID of the capability is 0x0001.  Note that NVIDIA
sponsored this ECN.

> Talking to NVIDIA is always an option :)

There's really no other way to figure out how to decode these vendor
specific capabilities; this 0x23 capability at least seems to be meant
for sharing.

> >>> Is it worthwhile to continue with assigning the device in the !ENABLED
> >>> case?  For instance, maybe it would be better to provide a weak
> >>> definition of vfio_pci_nvlink2_init() that would cause us to fail here
> >>> if we don't have this device specific support enabled.  I realize
> >>> you're following the example set forth for IGD, but those regions are
> >>> optional, for better or worse.    
> >>
> >>
> >> The device is supposed to work even without GPU RAM passed through, this
> >> should look like NVLink v1 in this case (there used to be bugs in the
> >> driver, may be still are, have not checked for a while but there is a bug
> >> opened at NVIDIA about this and they were going to fix that), this is why I
> >> chose not to fail here.  
> > 
> > Ok.
> >   
> >>>> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> >>>> index 24ee260..2725bc8 100644
> >>>> --- a/drivers/vfio/pci/Kconfig
> >>>> +++ b/drivers/vfio/pci/Kconfig
> >>>> @@ -30,3 +30,7 @@ config VFIO_PCI_INTX
> >>>>  config VFIO_PCI_IGD
> >>>>  	depends on VFIO_PCI
> >>>>  	def_bool y if X86
> >>>> +
> >>>> +config VFIO_PCI_NVLINK2
> >>>> +	depends on VFIO_PCI
> >>>> +	def_bool y if PPC_POWERNV    
> >>>
> >>> As written, this also depends on PPC_POWERNV (or at least TCE), it's not
> >>> a portable implementation that we could re-use on X86 or ARM or any
> >>> other platform if hardware appeared for it.  Can we improve that as
> >>> well to make this less POWER specific?  Thanks,    
> >>
> >>
> >> As I said in another mail, every P9 chip in that box has some NVLink2 logic
> >> on it so it is not even common among P9's in general and I am having hard
> >> time seeing these V100s used elsewhere in such way.  
> > 
> > https://www.redhat.com/archives/vfio-users/2018-May/msg00000.html
> > 
> > Not much platform info, but based on the rpm mentioned, looks like an
> > x86_64 box.  Thanks,  
> 
> Wow. Interesting. Thanks for the pointer. No advertising material actually
> says that it is P9 only or even mention P9, wiki does not say it is P9 only
> either. Hmmm...

NVIDIA's own DGX systems are Xeon-based and seem to include NVLink.
The DGX-1 definitely makes use of the SXM2 modules, up to 8 of them.
The DGX Station might be the 4x V100 SXM2 box mentioned in the link.
Thanks,

Alex
