Re: [RFC PATCH kernel 0/5] powerpc/P9/vfio: Pass through NVIDIA Tesla V100

From: Alex Williamson <alex.williamson@redhat.com>
To: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: kvm@vger.kernel.org, Ram Pai <linuxram@us.ibm.com>,
	kvm-ppc@vger.kernel.org, Alistair Popple <alistair@popple.id.au>,
	linuxppc-dev@lists.ozlabs.org,
	David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [RFC PATCH kernel 0/5] powerpc/P9/vfio: Pass through NVIDIA Tesla V100
Date: Fri, 08 Jun 2018 03:44:55 +0000
Message-ID: <20180607214455.51ecfa1a@w520.home>
In-Reply-To: <f0561258-698c-9086-255d-ff6c1aa16cfd@ozlabs.ru>

On Fri, 8 Jun 2018 13:08:54 +1000
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> On 8/6/18 8:15 am, Alex Williamson wrote:
> > On Fri, 08 Jun 2018 07:54:02 +1000
> > Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> >   
> >> On Thu, 2018-06-07 at 11:04 -0600, Alex Williamson wrote:  
> >>>
> >>> Can we back up and discuss whether the IOMMU grouping of NVLink
> >>> connected devices makes sense?  AIUI we have a PCI view of these
> >>> devices and from that perspective they're isolated.  That's the view of
> >>> the device used to generate the grouping.  However, not visible to us,
> >>> these devices are interconnected via NVLink.  What isolation properties
> >>> does NVLink provide given that its entire purpose for existing seems to
> >>> be to provide a high performance link for p2p between devices?    
> >>
> >> Not entire. On POWER chips, we also have an nvlink between the device
> >> and the CPU which is running significantly faster than PCIe.
> >>
> >> But yes, there are cross-links and those should probably be accounted
> >> for in the grouping.  
> > 
> > Then after we fix the grouping, can we just let the host driver manage
> > this coherent memory range and expose vGPUs to guests?  The use case of
> > assigning all 6 GPUs to one VM seems pretty limited.  (Might need to
> > convince NVIDIA to support more than a single vGPU per VM though)  
> 
> These are physical GPUs, not virtual sriov-alike things they are
> implementing as well elsewhere.

vGPUs as implemented on M- and P-series Teslas aren't SR-IOV-like
either.  That's why we have mdev devices now to implement
software-defined devices.  I don't have first-hand experience with
V-series, but I would absolutely expect a PCIe-based Tesla V100 to
support vGPU.

> My current understanding is that every P9 chip in that box has some NVLink2
> logic on it so each P9 is directly connected to 3 GPUs via PCIe and
> 2xNVLink2, and GPUs in that big group are interconnected by NVLink2 links
> as well.
> 
> From small bits of information I have it seems that a GPU can perfectly
> work alone and if the NVIDIA driver does not see these interconnects
> (because we do not pass the rest of the big 3xGPU group to this guest), it
> continues with a single GPU. There is an "nvidia-smi -r" big reset hammer
> which simply refuses to work until all 3 GPUs are passed so there is some
> distinction between passing 1 or 3 GPUs, and I am trying (as we speak) to
> get a confirmation from NVIDIA that it is ok to pass just a single GPU.
> 
> So we will either have 6 groups (one per GPU) or 2 groups (one per
> interconnected group).

I'm not gaining much confidence that we can rely on isolation between
NVLink connected GPUs.  It sounds like you're simply expecting that
proprietary code from NVIDIA on a proprietary interconnect from NVIDIA
is going to play nice and nobody will figure out how to do bad things
because... obfuscation?  Thanks,

Alex
