From mboxrd@z Thu Jan  1 00:00:00 1970
From: Anthony Liguori <anthony@codemonkey.ws>
Subject: Re: [PATCH] KVM: PCIPT: VT-d: fix guest unmap
Date: Wed, 18 Jun 2008 15:48:33 -0500
Message-ID: <485974A1.60007@codemonkey.ws>
References: <1213729526-10410-1-git-send-email-benami@il.ibm.com>	 <1213729526-10410-2-git-send-email-benami@il.ibm.com>	 <1213729526-10410-3-git-send-email-benami@il.ibm.com>	 <48582CD0.5060109@codemonkey.ws> <1213790811.9177.19.camel@lnx-benami>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: amit.shah@qumranet.com, weidong.han@intel.com,
	Muli Ben-Yehuda <MULI@il.ibm.com>, raharper@us.ibm.com,
	kvm@vger.kernel.org
To: Ben-Ami Yassour <benami@il.ibm.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from wr-out-0506.google.com ([64.233.184.238]:60799 "EHLO
	wr-out-0506.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750853AbYFRUs6 (ORCPT <rfc822;kvm@vger.kernel.org>);
	Wed, 18 Jun 2008 16:48:58 -0400
Received: by wr-out-0506.google.com with SMTP id 69so332267wri.5
        for <kvm@vger.kernel.org>; Wed, 18 Jun 2008 13:48:53 -0700 (PDT)
In-Reply-To: <1213790811.9177.19.camel@lnx-benami>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

Ben-Ami Yassour wrote:
> On Tue, 2008-06-17 at 16:29 -0500, Anthony Liguori wrote:
>   
>> I think the current VT-d code needs some reworking.
>>
>> We should build the table as the shadow page table gets built.  We 
>> should suppress iotlb flushes unless the table is actually being updated.
>>
>>     
>
> I'm not sure what you mean.
> The current implementation of vtd for passthrough is a direct map, which
> means that we map the entire guest memory (and pin it).
> In this case there are no iotlb flushes after the first initialization.
>   

Right.  But this is not ideal.  Instead of pinning up-front, it would 
make more sense IMHO to build the VT-d table as the shadow page table 
gets faulted in.  In certain circumstances, this will result in 
extraneous updates (because a GPA=>HPA mapping is already present) and 
that's where we should eliminate iotlb flushes.
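
A minimal sketch of the flush-suppression idea, written as a userspace toy 
model and not as KVM or VT-d code.  Every name in it (toy_iommu, 
toy_iommu_map, the 16-entry table) is hypothetical and only illustrates 
"update the table from the fault path, flush only when an entry actually 
changes":

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_GFNS 16
#define TOY_NO_PFN  UINT64_MAX

/* Hypothetical stand-in for an IOMMU table: one host frame per guest frame. */
struct toy_iommu {
    uint64_t pfn[TOY_NR_GFNS];
    unsigned long flushes;      /* counts simulated iotlb flushes */
};

static void toy_iommu_init(struct toy_iommu *io)
{
    assert(io != NULL);
    for (size_t i = 0; i < TOY_NR_GFNS; i++)
        io->pfn[i] = TOY_NO_PFN;
    io->flushes = 0;
}

/*
 * Imagined to be called whenever a shadow page table entry is faulted in.
 * Returns true if the table was updated, false if nothing had to change.
 */
static bool toy_iommu_map(struct toy_iommu *io, uint64_t gfn, uint64_t pfn)
{
    assert(io != NULL);
    if (gfn >= TOY_NR_GFNS)
        return false;           /* outside the toy guest's memory */
    if (io->pfn[gfn] == pfn)
        return false;           /* GPA=>HPA mapping already present: no flush */
    io->pfn[gfn] = pfn;
    io->flushes++;              /* table really changed, so flush */
    return true;
}
```

Faulting the same gfn twice with the same pfn updates the table once and 
leaves the flush counter untouched on the second call.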

For now, we should basically do this for all of physical memory but we 
should have the right infrastructure such that we can be more clever 
once we have a PVDMA API.

> Obviously, pinning the entire guest is not desirable since we waste a
> lot of memory resources, but this is the approach that we currently
> have. Do you find it good enough for a merge with the main KVM tree, and
> optimize later?
>   

No, it's not safe.  What happens if userspace does an mmap(MAP_FIXED) into 
phys_ram_base?  We need to use MMU notifiers to handle such events and 
appropriately flush the iotlb.
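
As a rough illustration of what such a notifier would have to do, here is a 
userspace toy model.  It does not use the kernel's MMU notifier interface; 
toy_invalidate_range and the table layout are assumptions made up for this 
sketch.  The point is only that when the host mapping behind a range of 
guest frames goes away, the matching IOMMU entries are dropped and the 
iotlb is flushed once for the batch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_GFNS 16
#define TOY_NO_PFN  UINT64_MAX

/* Hypothetical stand-in for an IOMMU table: one host frame per guest frame. */
struct toy_iommu {
    uint64_t pfn[TOY_NR_GFNS];
    unsigned long flushes;      /* counts simulated iotlb flushes */
};

/*
 * Hypothetical invalidation callback for guest frames [start, end).
 * Returns how many live entries were dropped.
 */
static size_t toy_invalidate_range(struct toy_iommu *io,
                                   uint64_t start, uint64_t end)
{
    size_t dropped = 0;

    assert(io != NULL);
    if (end > TOY_NR_GFNS)
        end = TOY_NR_GFNS;      /* clamp to the toy guest's memory */

    for (uint64_t gfn = start; gfn < end; gfn++) {
        if (io->pfn[gfn] != TOY_NO_PFN) {
            io->pfn[gfn] = TOY_NO_PFN;
            dropped++;
        }
    }

    if (dropped)
        io->flushes++;          /* one flush covers the whole batch */
    return dropped;
}
```

A range that had no live entries costs no flush, which matches the rule of 
flushing only when the table is actually updated.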

> When you mentioned building a table as the shadow page table, did you
> mean that we should map the IOMMU on demand?
>   

Yes, but in the absence of a PV guest, there's a very special case where 
we pre-fault the entire table.

> I'm not sure how we can do that... the guest can send a guest physical
> address to the device for DMA, even without generating a page-fault on
> the host for that address... which implies that the host must pin the
> entire guest memory in advance. agree?
>   

See above.  Ideally we would wait until the first PCI config space 
access for a device before special casing the guest.  Otherwise, there's 
no way to allow a DMA-aware guest to avoid pinning up front.

> The only way I can think of avoiding that is PVDMA with VT-d, which
> means that there is a hyper call for each DMA request, but this is a
> different solution, cause it only applies to PV guests.
>   

It doesn't strictly require a hypercall, but yes, that's the general 
solution.

> Do you see a way to avoid mapping (and pinning) the entire guest memory
> for fully virtual guests (and without parsing every transaction between
> the guest and the device to figure out the DMA addresses)?
>   

The key is to support both cases with the same infrastructure.  The 
unmodified guest should just be a special case.
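
One way to picture "same infrastructure, unmodified guest as a special 
case" is the toy model below.  It is a userspace sketch, not KVM code: the 
guest kinds, toy_map_one and toy_attach_guest are hypothetical names, and 
the linear host_base + gfn layout is an assumption chosen to keep the 
example short.  Both guest types go through the same per-page mapping 
function; the unmodified guest simply has it called for every page up 
front:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_GFNS 16
#define TOY_NO_PFN  UINT64_MAX

enum toy_guest_kind { TOY_GUEST_UNMODIFIED, TOY_GUEST_PVDMA };

struct toy_iommu {
    uint64_t pfn[TOY_NR_GFNS];
    size_t mapped;              /* number of live entries */
};

static void toy_iommu_reset(struct toy_iommu *io)
{
    assert(io != NULL);
    for (size_t i = 0; i < TOY_NR_GFNS; i++)
        io->pfn[i] = TOY_NO_PFN;
    io->mapped = 0;
}

/* The single mapping path shared by both guest kinds. */
static bool toy_map_one(struct toy_iommu *io, uint64_t gfn, uint64_t pfn)
{
    assert(io != NULL && gfn < TOY_NR_GFNS);
    if (io->pfn[gfn] == pfn)
        return false;           /* already mapped, nothing to do */
    if (io->pfn[gfn] == TOY_NO_PFN)
        io->mapped++;
    io->pfn[gfn] = pfn;
    return true;
}

/*
 * A DMA-aware (PV) guest maps pages on demand via toy_map_one(), so
 * nothing is mapped here.  An unmodified guest is the special case:
 * pre-fault every page through the same function.
 * Returns the number of entries that changed.
 */
static size_t toy_attach_guest(struct toy_iommu *io,
                               enum toy_guest_kind kind, uint64_t host_base)
{
    size_t changed = 0;

    if (kind == TOY_GUEST_PVDMA)
        return 0;
    for (uint64_t gfn = 0; gfn < TOY_NR_GFNS; gfn++)
        if (toy_map_one(io, gfn, host_base + gfn))
            changed++;
    return changed;
}
```

In a real implementation the decision of which case applies would be 
deferred as discussed above, for example until the first PCI config space 
access; the sketch takes the guest kind as a plain argument instead.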

Regards,

Anthony Liguori

> Regards,
> Ben
>
>   
>> Regards,
>>
>> Anthony Liguori
>>
>>     
>
>
>