Message-ID: <57392FF4.3060501@intel.com>
Date: Mon, 16 May 2016 10:27:00 +0800
From: Jike Song
Subject: Re: [Qemu-devel] [RFC PATCH v3 3/3] VFIO Type1 IOMMU change: to support with iommu and without iommu
References: <572AEE72.90008@intel.com> <5731933B.90508@intel.com>
 <20160510160257.GA4125@nvidia.com> <5732F823.3090409@intel.com>
 <20160511160628.690876f9@t450s.home> <20160512130552.08974076@t450s.home>
 <20160512201258.GB24334@nvidia.com> <5735A269.5080909@intel.com>
 <20160513154853.GA11236@nvidia.com>
In-Reply-To: <20160513154853.GA11236@nvidia.com>
To: Neo Jia
Cc: Alex Williamson, "Tian, Kevin", Kirti Wankhede, "pbonzini@redhat.com",
 "kraxel@redhat.com", "qemu-devel@nongnu.org", "kvm@vger.kernel.org",
 "Ruan, Shuai", "Lv, Zhiyuan"

On 05/13/2016 11:48 PM, Neo Jia wrote:
> On Fri, May 13, 2016 at 05:46:17PM +0800, Jike Song wrote:
>> On 05/13/2016 04:12 AM, Neo Jia wrote:
>>> On Thu, May 12, 2016 at 01:05:52PM -0600, Alex Williamson wrote:
>>>>
>>>> If you're trying to equate the scale of what we need to track vs what
>>>> type1 currently tracks, they're significantly different.  Possible
>>>> things we need to track include the pfn, the iova, and possibly a
>>>> reference count or some sort of pinned page map.  In the pin-all model
>>>> we can assume that every page is pinned on map and unpinned on unmap,
>>>> so a reference count or map is unnecessary.  We can also assume that we
>>>> can always regenerate the pfn with get_user_pages() from the vaddr, so
>>>> we don't need to track that.
>>>
>>> Hi Alex,
>>>
>>> Thanks for pointing this out; we will not track those in our next rev, and
>>> get_user_pages() will be used from the vaddr as you suggested to handle the
>>> case of a single VM with both passthrough and mediated devices.
>>>
>>
>> Just a gut feeling:
>>
>> Calling GUP every time for a particular vaddr means taking mm->mmap_sem
>> every time for that process. If the VM has dozens of vCPUs, which is not
>> rare, the semaphore is likely to become a bottleneck.
>
> Hi Jike,
>
> We do need to hold mm->mmap_sem for the VMM/QEMU process, but I don't
> quite follow the reasoning about "dozens of vCPUs". One situation I can
> think of is another thread competing for the mmap_sem of the VMM/QEMU
> process within the KVM kernel, such as hva_to_pfn; after a quick search,
> it seems that is mostly used by the ioctl KVM_ASSIGN_PCI_DEVICE.
>

I meant that a guest write of a gfn into the GPU MMU can happen on any
vCPU, so a vmexit occurs and mmap_sem is required. But I now realize that
the same situation holds even if we store the pfn in the rbtree ..

> We will definitely conduct performance analysis with a large configuration
> on servers with E5-2697 v4. :-)

My homage :)

--
Thanks,
Jike
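
P.S. To make the mmap_sem concern concrete, here is a minimal sketch of
the kind of per-access translation path I have in mind, written against
the 4.6-era GUP API. vaddr_to_pfn() and the flag choices are my own
illustration, not code from Kirti's patch:

    /*
     * Hypothetical per-access path: every gfn -> pfn translation takes
     * mm->mmap_sem for read, so vCPUs that vmexit concurrently all
     * contend on the same semaphore of the QEMU process.
     */
    static int vaddr_to_pfn(struct mm_struct *mm, unsigned long vaddr,
                            unsigned long *pfn)
    {
            struct page *page;
            long ret;

            down_read(&mm->mmap_sem);
            /* tsk == NULL: no fault accounting needed */
            ret = get_user_pages_remote(NULL, mm, vaddr, 1,
                                        1 /* write */, 0 /* force */,
                                        &page, NULL);
            up_read(&mm->mmap_sem);

            if (ret != 1)
                    return ret < 0 ? (int)ret : -EFAULT;

            *pfn = page_to_pfn(page);
            /* page reference is held; caller must put_page() on unpin */
            return 0;
    }

With dozens of vCPUs hitting this path concurrently, even the read side
of the rwsem bounces its cacheline between CPUs, which is where I would
expect the bottleneck to show up.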