From: Kirti Wankhede
Date: Thu, 5 May 2016 16:14:50 +0530
Subject: Re: [Qemu-devel] [RFC PATCH v3 1/3] vGPU Core driver
To: "Tian, Kevin", Alex Williamson
Cc: "pbonzini@redhat.com", "kraxel@redhat.com", "cjia@nvidia.com",
    "qemu-devel@nongnu.org", "kvm@vger.kernel.org", "Ruan, Shuai",
    "Song, Jike", "Lv, Zhiyuan"

On 5/5/2016 2:36 PM, Tian, Kevin wrote:
>> From: Kirti Wankhede
>> Sent: Wednesday, May 04, 2016 9:32 PM
>>
>> Thanks Alex.
>>
>> >> +config VGPU_VFIO
>> >> +    tristate
>> >> +    depends on VGPU
>> >> +    default n
>> >> +
>> >
>> > This is a little bit convoluted, it seems like everything added in this
>> > patch is vfio agnostic, it doesn't necessarily care what the consumer
>> > is. That makes me think we should only be adding CONFIG_VGPU here and
>> > it should not depend on CONFIG_VFIO or be enabling CONFIG_VGPU_VFIO.
>> > The middle config entry is also redundant to the first, just move the
>> > default line up to the first and remove the rest.
>>
>> CONFIG_VGPU doesn't directly depend on VFIO. CONFIG_VGPU_VFIO is
>> directly dependent on VFIO. But devices created by the VGPU core module
>> need a driver to manage those devices. CONFIG_VGPU_VFIO is the driver
>> that will manage vgpu devices, so I think CONFIG_VGPU_VFIO should be
>> enabled by CONFIG_VGPU.
>>
>> This would look like:
>>
>> menuconfig VGPU
>>     tristate "VGPU driver framework"
>>     select VGPU_VFIO
>>     default n
>>     help
>>       VGPU provides a framework to virtualize GPU without SR-IOV cap.
>>       See Documentation/vgpu.txt for more details.
>>
>>       If you don't know what to do here, say N.
>>
>> config VGPU_VFIO
>>     tristate
>>     depends on VGPU
>>     depends on VFIO
>>     default n
>
> There could be multiple drivers operating VGPU. Why do we restrict
> it to VFIO here?

VGPU_VFIO uses the VFIO APIs, so it depends on VFIO. Since there is
currently no driver other than VGPU_VFIO for vGPU devices, I think we
should keep VGPU_VFIO selected by default from VGPU. Maybe in the
future, if another driver is added to operate vGPU devices, the default
selection can be removed.

>> >> +create_attr_error:
>> >> +    if (gpu_dev->ops->vgpu_destroy) {
>> >> +        int ret = 0;
>> >> +        ret = gpu_dev->ops->vgpu_destroy(gpu_dev->dev,
>> >> +                                         vgpu_dev->uuid,
>> >> +                                         vgpu_dev->vgpu_instance);
>> >
>> > Unnecessary initialization and we don't do anything with the result.
>> > Below indicates lack of vgpu_destroy indicates the vendor doesn't
>> > support unplug, but doesn't that break our error cleanup path here?
>>
>> The comment about vgpu_destroy reads:
>>     If VM is running and vgpu_destroy is called, that
>>     means the vGPU is being hot-unplugged. Return
>>     an error if the VM is running and the graphics
>>     driver doesn't support vGPU hotplug.
>>
>> It's the GPU driver's responsibility to check whether the VM is running
>> and return accordingly. This is the vgpu creation path; the vGPU device
>> would be hot-plugged into the VM on vgpu_start.
>
> How does GPU driver know whether VM is running? VM is managed
> by KVM here.
>
> Maybe it's clearer to say whether vGPU is busy, which means some work
> has been loaded to vGPU. That's something GPU driver can tell.

The GPU driver can detect this based on the resources allocated for the
VM in vgpu_create/vgpu_start.
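
For example, the busy check in a vendor driver's vgpu_destroy callback
could look roughly like the sketch below. struct my_vgpu, my_vgpu_find()
and my_vgpu_free() are hypothetical placeholders for vendor state, not
part of the patch; only the callback itself comes from this RFC.

struct my_vgpu {
	bool resources_committed; /* set when vgpu_start commits GPU resources */
	/* ... vendor-specific state ... */
};

static struct my_vgpu *my_vgpu_find(uuid_le uuid, uint32_t instance);
static void my_vgpu_free(struct my_vgpu *vgpu);

/* Refuse hot-unplug while resources committed in vgpu_create/vgpu_start
 * are still in use, since that implies a VM is running on this vGPU. */
static int my_gpu_vgpu_destroy(struct device *dev, uuid_le uuid,
			       uint32_t vgpu_instance)
{
	struct my_vgpu *vgpu = my_vgpu_find(uuid, vgpu_instance);

	if (!vgpu)
		return -ENODEV;

	if (vgpu->resources_committed)
		return -EBUSY;	/* vGPU busy, hot-unplug not possible now */

	my_vgpu_free(vgpu);
	return 0;
}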
>>
>> >> +     * @vgpu_bar_info:        Called to get BAR size and flags of vGPU device.
>> >> +     *                        @vdev: vgpu device structure
>> >> +     *                        @bar_index: BAR index
>> >> +     *                        @bar_info: output, returns size and flags of
>> >> +     *                                   requested BAR
>> >> +     *                        Returns integer: success (0) or error (< 0)
>> >
>> > This is called bar_info, but the bar_index is actually the vfio region
>> > index and things like the config region info is being overloaded
>> > through it. We already have a structure defined for getting a generic
>> > region index, why not use it? Maybe this should just be
>> > vgpu_vfio_get_region_info.
>> >
>>
>> Ok. Will do.
>
> As you commented earlier that GPU driver is required to provide config
> space (which I agree), then what's the point of introducing another
> bar specific structure? VFIO can use @write to get bar information
> from vgpu config space, just like how it's done on physical device today.

It is required not only for the size but also to fetch the flags. The
region flags can be a combination of:

#define VFIO_REGION_INFO_FLAG_READ  (1 << 0) /* Region supports read */
#define VFIO_REGION_INFO_FLAG_WRITE (1 << 1) /* Region supports write */
#define VFIO_REGION_INFO_FLAG_MMAP  (1 << 2) /* Region supports mmap */

Thanks,
Kirti.

>> >> +     * @validate_map_request: Validate remap pfn request
>> >> +     *                        @vdev: vgpu device structure
>> >> +     *                        @virtaddr: target user address to start at
>> >> +     *                        @pfn: physical address of kernel memory, GPU
>> >> +     *                              driver can change if required.
>> >> +     *                        @size: size of map area, GPU driver can change
>> >> +     *                               the size of map area if desired.
>> >> +     *                        @prot: page protection flags for this mapping,
>> >> +     *                               GPU driver can change, if required.
>> >> +     *                        Returns integer: success (0) or error (< 0)
>> >
>> > Was not at all clear to me what this did until I got to patch 2, this
>> > is actually providing the fault handling for mmap'ing a vGPU mmio BAR.
>> > Needs a better name or better description.
>> >
>>
>> Say the VMM mmaps the whole BAR1 of the GPU, e.g. 128MB. A fault then
>> occurs when BAR1 is accessed, and the size is calculated as:
>>
>>     req_size = vma->vm_end - virtaddr
>>
>> Since the GPU is shared by multiple vGPUs, the GPU driver might not
>> want to remap the whole BAR1 for a single vGPU device and would prefer
>> to, say, map one page at a time; in that case the GPU driver returns
>> PAGE_SIZE, which is then used by remap_pfn_range(). On the next access
>> to BAR1 outside that page, we get a fault() again.
>> As the name says, this call lets the GPU driver validate the size and
>> prot of the map area; the GPU driver can change both if required.
>
> Currently we don't require such interface for Intel vGPU. Need to think
> about its rationale carefully (still not clear to me). Jike, do you have
> any thought on this?
>
> Thanks
> Kevin
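
To make the flow concrete, the fault handling described above would look
roughly like the sketch below, using the fault-handler signature of
kernels from that era. The vgpu_device/gpu_device_ops field layout and
the base_pfn computation are assumptions for illustration; only
validate_map_request() itself comes from the RFC.

#include <linux/mm.h>
#include "vgpu.h"	/* struct vgpu_device, gpu_device_ops (from the RFC) */

static int vgpu_mmio_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
	struct vgpu_device *vdev = vma->vm_private_data;
	unsigned long virtaddr = (unsigned long)vmf->virtual_address;
	unsigned long req_size = vma->vm_end - virtaddr;
	unsigned long pfn = vdev->base_pfn +		/* assumed field */
			    ((virtaddr - vma->vm_start) >> PAGE_SHIFT);
	pgprot_t prot = vma->vm_page_prot;

	/* GPU driver may shrink req_size (e.g. to PAGE_SIZE) and adjust
	 * pfn/prot before the actual mapping is established. */
	if (vdev->gpu_dev->ops->validate_map_request(vdev, virtaddr, &pfn,
						     &req_size, &prot))
		return VM_FAULT_SIGBUS;

	if (remap_pfn_range(vma, virtaddr, pfn, req_size, prot))
		return VM_FAULT_SIGBUS;

	/* Mapping installed; an access outside req_size faults again. */
	return VM_FAULT_NOPAGE;
}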