Subject: Re: [PATCH 0/2] KVM: MMU: support VMAs that got remap_pfn_range-ed
From: Xiao Guangrong
To: Neo Jia
Cc: Paolo Bonzini, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
    Kirti Wankhede, Andrea Arcangeli, Radim Krčmář
Date: Tue, 5 Jul 2016 14:26:46 +0800
Message-ID: <577B5326.7050209@linux.intel.com>
In-Reply-To: <20160705051627.GA26394@nvidia.com>

On 07/05/2016 01:16 PM, Neo Jia wrote:
> On Tue, Jul 05, 2016 at 12:02:42PM +0800, Xiao Guangrong wrote:
>>
>> On 07/05/2016 09:35 AM, Neo Jia wrote:
>>> On Tue, Jul 05, 2016 at 09:19:40AM +0800, Xiao Guangrong wrote:
>>>>
>>>> On 07/04/2016 11:33 PM, Neo Jia wrote:
>>>>
>>>>>>> Sorry, I think I misread the "allocation" as "mapping". We only delay the
>>>>>>> CPU mapping, not the allocation.
>>>>>>
>>>>>> So how should I understand your statement:
>>>>>> "at that moment nobody has any knowledge about how the physical mmio gets virtualized"?
>>>>>>
>>>>>> The resource, the physical MMIO region, has already been allocated, so why do we
>>>>>> not know the physical address mapped to the VM?
>>>>>>
>>>>> From a device driver point of view, the physical mmio region never gets allocated until
>>>>> the corresponding resource is requested by clients and granted by the mediated device driver.
>>>>
>>>> Hmm... but you told me that you did not delay the allocation. :(
>>>
>>> Hi Guangrong,
>>>
>>> The allocation here is the allocation of a device resource, and the only way to
>>> access that kind of device resource is via an mmio region of some pages there.
>>>
>>> For example, if the VM needs resource A, the only way to access resource A is
>>> via some kind of device memory at mmio address X.
>>>
>>> So, we never defer the allocation request during runtime, we just set up the
>>> CPU mapping later, when it actually gets accessed.
>>>
>>>> So it comes back to my original question: why not allocate the physical mmio region in mmap()?
>>>>
>>> Without running anything inside the VM, how do you know how the hw resource gets
>>> allocated, and therefore how the mmio region will be used?
>>
>> The allocation and mapping can be two independent processes:
>> - The first process is just allocation. The MMIO region is allocated from the physical
>>   hardware, and this region is mapped into _QEMU's_ arbitrary virtual address by mmap().
>>   At this time, the VM cannot actually use this resource.
>>
>> - The second process is mapping. When the VM enables this region, e.g. it enables the
>>   PCI BAR, QEMU maps the virtual address returned by mmap() into the VM's physical
>>   memory. After that, the VM can access this region.
>>
>> The second process is completely handled in userspace; that means the mediated
>> device driver needn't care how the resource is mapped into the VM.
>
> In your example, you are still picturing it as VFIO direct assignment, but the solution we
> are talking about here is mediated passthru via the VFIO framework to virtualize DMA devices
> without SR-IOV. Please see my comments below.
>
> (Just for completeness, if you really want to use a device in the above example as VFIO
> passthru, the second step is not completely handled in userspace; it is actually the guest
> driver that will allocate and set up the proper hw resource, which will later be ready
> for you to access via some mmio pages.)

Hmm... I always treat the VM as userspace.

>
>> This is how QEMU/VFIO currently works. Could you please tell me the special points
>> of your solution compared with the current QEMU/VFIO, and why the current model cannot
>> meet your requirement? So that we can better understand your scenario.
>
> The scenario I am describing here is the mediated passthru case, but what you are
> describing (more or less) is the VFIO direct assigned case. They differ in several
> areas, but the major difference related to this topic is:
>
> 1) In the VFIO direct assigned case, the device (and its resources) is completely owned by
> the VM, therefore its mmio region can be mapped directly into the VM during the VFIO mmap()
> call, as there is no resource sharing among VMs and there is no mediated device driver on
> the host to manage such resources, so it is completely owned by the guest.

I understand this difference. However, you told me that the MMIO region allocated for the
VM is contiguous, so I assume that portion of the physical MMIO region is completely owned
by the guest. The only difference I can see is that the mediated device driver needs to
allocate that region.

>
> 2) In the mediated passthru case, multiple VMs are sharing the same physical device, so how
> the HW resources get allocated is completely decided by the guest and host device drivers of
> the virtualized DMA device, here the GPU, and the same goes for the MMIO pages used to access
> those HW resources.

I cannot see what the guest's business is here. Look at your code, you cooked the fault
handler like this:

+		ret = parent->ops->validate_map_request(mdev, virtaddr,
+							&pgoff, &req_size,
+							&pg_prot);

Please tell me what information is obtained from the guest here? All of this info can be
found at the time of mmap().
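
[Editor's note: to make the deferred-mapping pattern being debated concrete, here is a rough
sketch of a fault handler built around the quoted validate_map_request() call: nothing is
CPU-mapped at mmap() time, and the first access faults into the handler, which asks the
parent (vendor) driver what backs the access and only then installs the mapping with
remap_pfn_range(). Only the validate_map_request() invocation itself is taken from the quoted
patch; the stand-in struct layouts, the function name, the assumption that vm_private_data
carries the mdev, the single-page req_size, the "pgoff returns the host pfn" convention, and
the 4.7-era fault-handler signature are illustrative guesses, not the actual series.]

#include <linux/mm.h>
#include <linux/kernel.h>

/*
 * Stand-in types: the real mdev patch defines these elsewhere, so the
 * layout and prototype here are assumed purely for illustration.
 */
struct mdev_device;

struct parent_ops {
	/* Vendor callback quoted in the mail; exact prototype is assumed. */
	int (*validate_map_request)(struct mdev_device *mdev,
				    unsigned long virtaddr,
				    unsigned long *pgoff,
				    unsigned long *req_size,
				    pgprot_t *pg_prot);
};

struct parent_device {
	const struct parent_ops *ops;
};

struct mdev_device {
	struct parent_device *parent;
};

/* 4.7-era signature: int (*fault)(struct vm_area_struct *, struct vm_fault *) */
static int mdev_mmio_fault_sketch(struct vm_area_struct *vma, struct vm_fault *vmf)
{
	/* Assumes mmap() stashed the mdev in vm_private_data. */
	struct mdev_device *mdev = vma->vm_private_data;
	struct parent_device *parent = mdev->parent;
	unsigned long virtaddr = (unsigned long)vmf->virtual_address & PAGE_MASK;
	unsigned long pgoff = vmf->pgoff;	/* page offset of the faulting access */
	unsigned long req_size = PAGE_SIZE;	/* map a single page in this sketch */
	pgprot_t pg_prot = vma->vm_page_prot;
	int ret;

	/*
	 * Ask the vendor driver how this offset is backed right now; on
	 * success, pgoff is assumed to hold the host pfn to map.
	 */
	ret = parent->ops->validate_map_request(mdev, virtaddr, &pgoff,
						&req_size, &pg_prot);
	if (ret)
		return VM_FAULT_SIGBUS;

	/* Install the CPU mapping only now, on first access. */
	if (remap_pfn_range(vma, virtaddr, pgoff, req_size, pg_prot))
		return VM_FAULT_SIGBUS;

	return VM_FAULT_NOPAGE;	/* PTEs are already in place */
}

[The disagreement in the thread maps directly onto this sketch: Xiao argues that the pfn fed
to remap_pfn_range() is already determined when QEMU calls mmap(), so the mapping could be
established there; Neo argues that the vendor driver only knows it after the guest has driven
the device, hence the fault-time lookup.]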