Message-ID: <57307E9F.30203@intel.com>
Date: Mon, 09 May 2016 20:12:15 +0800
From: Jike Song
In-Reply-To: <59e8f3d0-da40-4ba1-15c5-9fbfd075232f@nvidia.com>
Subject: Re: [Qemu-devel] [RFC PATCH v3 1/3] vGPU Core driver
To: Kirti Wankhede
Cc: "Tian, Kevin", Alex Williamson, pbonzini@redhat.com, kraxel@redhat.com,
 cjia@nvidia.com, qemu-devel@nongnu.org, kvm@vger.kernel.org,
 "Ruan, Shuai", "Lv, Zhiyuan"

On 05/07/2016 12:16 AM, Kirti Wankhede wrote:
> On 5/6/2016 5:44 PM, Jike Song wrote:
>> On 05/05/2016 05:06 PM, Tian, Kevin wrote:
>>>> From: Kirti Wankhede
>>>>
>>>> >> + * @validate_map_request: Validate remap pfn request
>>>> >> + * @vdev: vgpu device structure
>>>> >> + * @virtaddr: target user address to start at
>>>> >> + * @pfn: physical address of kernel memory, GPU
>>>> >> + *       driver can change if required.
>>>> >> + * @size: size of map area, GPU driver can change
>>>> >> + *        the size of map area if desired.
>>>> >> + * @prot: page protection flags for this mapping,
>>>> >> + *        GPU driver can change, if required.
>>>> >> + * Returns integer: success (0) or error (< 0)
>>>> >
>>>> > Was not at all clear to me what this did until I got to patch 2;
>>>> > this is actually providing the fault handling for mmap'ing a vGPU
>>>> > mmio BAR. Needs a better name or better description.
>>>>
>>>> Say the VMM mmaps the whole BAR1 of the GPU, e.g. 128MB. A fault
>>>> occurs when BAR1 is first accessed, and then the size is calculated
>>>> as:
>>>>     req_size = vma->vm_end - virtaddr
>>
>> Hi Kirti,
>>
>> virtaddr is the faulted one, and vma->vm_end the end vaddr of the
>> mmap-ed 128MB BAR1?
>>
>> Would you elaborate on why (vm_end - fault_addr) results in the
>> requested size?
>
> If the first access is at the start of the mmapped region, fault_addr
> is vma->vm_start, so (vm_end - vm_start) is the size of the mmapped
> region.
>
> req_size should not exceed (vm_end - vm_start).

[Thanks for the kind explanation, I spent some time digging & recalling
the details]

So this consists of two checks:

  1) vm_end >= vm_start
  2) fault_addr >= vm_start && fault_addr < vm_end
     (vm_end being exclusive)

>>>> Since the GPU is being shared by multiple vGPUs, the GPU driver
>>>> might not remap the whole BAR1 for only one vGPU device, and would
>>>> prefer to map, say, one page at a time. The GPU driver returns
>>>> PAGE_SIZE, which is then used by remap_pfn_range(). On the next
>>>> access to BAR1 outside that page, we will again get a fault().
>>>>
>>>> As the name says, this call lets the GPU driver validate the size
>>>> and prot of the map area; the GPU driver can change both for this
>>>> map area.
>>
>> If I understand correctly, you are trying to share a physical BAR
>> among multiple vGPUs, by mapping a single pfn each time a fault
>> happens?
>
> Yes.
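To confirm my understanding, the fault path would then look roughly
like this -- a sketch only; vgpu_dev, bar1_base_pfn and the callback
wiring are placeholders, not the actual code from your patch:

#include <linux/mm.h>

/* hypothetical device structure, for illustration only */
struct vgpu_dev {
	unsigned long bar1_base_pfn;	/* first pfn of the physical BAR1 */
	int (*validate_map_request)(struct vgpu_dev *vdev,
				    unsigned long virtaddr,
				    unsigned long *pfn,
				    unsigned long *size,
				    pgprot_t *prot);
};

static int vgpu_mmio_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
	struct vgpu_dev *vdev = vma->vm_private_data;
	unsigned long fault_addr = (unsigned long)vmf->virtual_address;
	unsigned long req_size, pfn;
	pgprot_t prot = vma->vm_page_prot;

	/* check 2) above: fault_addr must lie inside the vma
	 * (vm_end is exclusive, hence the strict inequality) */
	if (fault_addr < vma->vm_start || fault_addr >= vma->vm_end)
		return VM_FAULT_SIGBUS;

	/* everything from the faulting address up to vm_end; by
	 * construction this never exceeds (vm_end - vm_start) */
	req_size = vma->vm_end - fault_addr;
	pfn = vdev->bar1_base_pfn +
	      ((fault_addr - vma->vm_start) >> PAGE_SHIFT);

	/* the GPU driver may shrink req_size (e.g. to PAGE_SIZE) and
	 * adjust pfn/prot, so only a portion of BAR1 is mapped per fault */
	if (vdev->validate_map_request(vdev, fault_addr, &pfn,
				       &req_size, &prot) < 0)
		return VM_FAULT_SIGBUS;

	if (remap_pfn_range(vma, fault_addr, pfn, req_size, prot))
		return VM_FAULT_SIGBUS;

	return VM_FAULT_NOPAGE;
}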
Thanks. For a vma with vm_ops, where only one pfn is inserted at a
time, can we replace remap_pfn_range with vm_insert_pfn? (a rough
sketch of what I mean is appended at the end of this mail)

I had a quick check of the kernel repo; it seems that remap_pfn_range
is only called from fops.mmap, not from vma->vm_ops.fault.

>>>
>>> Currently we don't require such an interface for Intel vGPU. Need to
>>> think about its rationale carefully (still not clear to me). Jike, do
>>> you have any thoughts on this?
>>
>> We need the mmap method of vgpu_device to be implemented, but I was
>> expecting something else, like calling remap_pfn_range() directly
>> from the mmap.
>
> Calling remap_pfn_range directly from mmap means you would like to
> remap the pfns for the whole BAR1 during mmap, right?
>
> In that case, don't set validate_map_request() and access the start of
> the mmapped address, so that on the first access it will do
> remap_pfn_range() for (vm_end - vm_start).

No. I'd like QEMU to be aware that only a *portion* of the physical
BAR1 is available for the vGPU, like:

	pGPU : 1GB  BAR1
	vGPU : 128MB BAR1

QEMU has the information about the available size for a particular
vGPU, and calls mmap() with that.

I'd say that your implementation is nice and flexible, but in order to
ensure some level of resource QoS, you have to account for it in the
device-model (where validate_map_request is implemented), right? How
about making QEMU aware that only a portion of the MMIO space is
available?

Would appreciate hearing your opinion on this. Thanks!

> Thanks,
> Kirti

--
Thanks, Jike
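[Appended, as mentioned above: the vm_insert_pfn variant I have in
mind, again only a sketch, reusing the hypothetical vgpu_dev from the
earlier snippet. vm_insert_pfn() expects VM_PFNMAP to be set on the vma
at mmap time and inserts exactly one page per fault, so no size
negotiation is needed:]

static int vgpu_mmio_fault_single(struct vm_area_struct *vma,
				  struct vm_fault *vmf)
{
	struct vgpu_dev *vdev = vma->vm_private_data;
	unsigned long fault_addr = (unsigned long)vmf->virtual_address;
	unsigned long pfn;
	int ret;

	if (fault_addr < vma->vm_start || fault_addr >= vma->vm_end)
		return VM_FAULT_SIGBUS;

	/* the pfn backing exactly the faulting page */
	pfn = vdev->bar1_base_pfn +
	      ((fault_addr - vma->vm_start) >> PAGE_SHIFT);

	ret = vm_insert_pfn(vma, fault_addr & PAGE_MASK, pfn);
	if (ret && ret != -EBUSY)	/* -EBUSY: raced, already mapped */
		return VM_FAULT_SIGBUS;

	return VM_FAULT_NOPAGE;
}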