From mboxrd@z Thu Jan  1 00:00:00 1970
From: Anthony Liguori <anthony@codemonkey.ws>
Subject: Re: [PATCH] KVM: PCIPT: VT-d: fix guest unmap
Date: Wed, 18 Jun 2008 16:41:23 -0500
Message-ID: <48598103.8060504@codemonkey.ws>
References: <1213729526-10410-1-git-send-email-benami@il.ibm.com> <1213729526-10410-2-git-send-email-benami@il.ibm.com> <1213729526-10410-3-git-send-email-benami@il.ibm.com> <48582CD0.5060109@codemonkey.ws> <1213790811.9177.19.camel@lnx-benami> <485974A1.60007@codemonkey.ws> <20080618212346.GL7186@il.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Ben-Ami Yassour1 <BENAMI@il.ibm.com>, amit.shah@qumranet.com,
	weidong.han@intel.com, raharper@us.ibm.com, kvm@vger.kernel.org
To: Muli Ben-Yehuda <muli@il.ibm.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from yw-out-2324.google.com ([74.125.46.31]:54571 "EHLO
	yw-out-2324.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1755602AbYFRVmE (ORCPT <rfc822;kvm@vger.kernel.org>);
	Wed, 18 Jun 2008 17:42:04 -0400
Received: by yw-out-2324.google.com with SMTP id 9so258517ywe.1
        for <kvm@vger.kernel.org>; Wed, 18 Jun 2008 14:42:03 -0700 (PDT)
In-Reply-To: <20080618212346.GL7186@il.ibm.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

Muli Ben-Yehuda wrote:
> On Wed, Jun 18, 2008 at 03:48:33PM -0500, Anthony Liguori wrote:
>
>   
>> Right.  But this is not ideal.  Instead of pinning up-front, it
>> would make more sense IMHO to build the VT-d table as the shadow
>> page table gets faulted in.  In certain circumstances, this will
>> result in extraneous updates (because a GPA=>HPA mapping is already
>> present) and that's where we should eliminate iotlb flushes.
>>     
>
> As Ben wrote, we can't do this and must fault everything in up-front
> (assuming no PVDMA API). Assume we don't do this: it is valid for the
> guest to program the device with a GPA that does not yet have a
> corresponding HPA (because the guest did not write or read to/from it
> and thus we haven't yet faulted in a frame for it). Then, once the
> device DMA's to it, the DMA will be stopped incorrectly.
>   

As I've said, the lack of a PVDMA API is a special case.  The key is to 
use the same internal infrastructure for both cases.
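
To make the fault-driven idea concrete, here is a minimal user-space
sketch, not KVM code.  Everything in it (toy_iommu, toy_iommu_map_on_fault,
the fixed-size table) is hypothetical and only models the bookkeeping: the
table is filled in when a fault resolves a GPA to an HPA, an update that
finds the same mapping already present is dropped, and an iotlb flush is
counted only when a live entry is replaced.  Real VT-d flush rules are
more involved than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_GFNS 16          /* toy guest size, in pages */
#define NO_PFN  UINT64_MAX  /* marks "no mapping present" */

/* Toy stand-in for a VT-d page table: one host frame per guest frame. */
struct toy_iommu {
    uint64_t pfn[NR_GFNS];
    unsigned int iotlb_flushes;  /* how many flushes we would have issued */
};

static void toy_iommu_init(struct toy_iommu *io)
{
    for (int i = 0; i < NR_GFNS; i++)
        io->pfn[i] = NO_PFN;
    io->iotlb_flushes = 0;
}

/*
 * Meant to be called from a (hypothetical) shadow page fault path once
 * gfn has been resolved to pfn.  Returns true if the table changed.
 */
static bool toy_iommu_map_on_fault(struct toy_iommu *io,
                                   uint64_t gfn, uint64_t pfn)
{
    assert(gfn < NR_GFNS);

    /* Extraneous update: the same GPA=>HPA mapping is already present. */
    if (io->pfn[gfn] == pfn)
        return false;

    /* Replacing a live entry may leave a stale translation cached. */
    if (io->pfn[gfn] != NO_PFN)
        io->iotlb_flushes++;

    io->pfn[gfn] = pfn;
    return true;
}
```

Pinning everything up front then becomes a loop over all gfns calling the
same function, which is the sense in which the no-PVDMA case is a special
case of the same infrastructure.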

>>> Obviously, pinning the entire guest is not desirable since we waste
>>> a lot of memory resources, but this is the approach that we
>>> currently have. Do you find it good enough for a merge with the
>>> main KVM tree, and optimize later?
>>>       
>> No, it's not safe.  What happens if you mmap(MAP_FIXED) into
>> phys_ram_base?  We need to use MMU notifiers to handle such events
>> and appropriately flush the iotlb.
>>     
>
> Could you elaborate on what you mean here and what is not safe? Our
> current approach is to just fault in all of guest memory---are you
> concerned about a case where some of the guest frames get replaced by
> other frames because of the mmap()? 
>   

Because the guest, through the device's DMA, is now accessing memory 
that is no longer guest memory.  When a mapping is forcefully changed, 
the MMU notifiers tell us about it, and we need to react to it.
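
As a rough illustration of "reacting", the sketch below models an
invalidation handler in plain user-space C.  It is not the kernel's
mmu_notifier interface: toy_dma_map, toy_dma_map_set and
toy_invalidate_range are made-up names, and the range is expressed in
guest frame numbers for simplicity, whereas a real notifier reports host
virtual address ranges.  The point is only the order of operations: drop
the stale entries, release the pins, and flush the iotlb once per range
rather than once per page.

```c
#include <assert.h>
#include <stdint.h>

#define NR_GFNS 16          /* toy guest size, in pages */
#define NO_PFN  UINT64_MAX  /* marks "no mapping present" */

struct toy_dma_map {
    uint64_t pfn[NR_GFNS];       /* gfn -> pinned host frame, or NO_PFN */
    unsigned int pinned;         /* frames currently pinned */
    unsigned int iotlb_flushes;  /* flushes we would have issued */
};

static void toy_dma_map_init(struct toy_dma_map *m)
{
    for (int i = 0; i < NR_GFNS; i++)
        m->pfn[i] = NO_PFN;
    m->pinned = 0;
    m->iotlb_flushes = 0;
}

/* Install a mapping for a previously unmapped gfn and account the pin. */
static void toy_dma_map_set(struct toy_dma_map *m, uint64_t gfn, uint64_t pfn)
{
    assert(gfn < NR_GFNS && m->pfn[gfn] == NO_PFN);
    m->pfn[gfn] = pfn;
    m->pinned++;
}

/*
 * Hypothetical handler for "the host changed the backing of gfns
 * [start, end)".  Returns the number of entries that were dropped.
 */
static unsigned int toy_invalidate_range(struct toy_dma_map *m,
                                         uint64_t start, uint64_t end)
{
    unsigned int dropped = 0;

    assert(start <= end && end <= NR_GFNS);

    for (uint64_t gfn = start; gfn < end; gfn++) {
        if (m->pfn[gfn] == NO_PFN)
            continue;
        m->pfn[gfn] = NO_PFN;   /* device may no longer DMA to this frame */
        m->pinned--;            /* the old frame can be released */
        dropped++;
    }

    /* One flush covers the whole range; skip it if nothing changed. */
    if (dropped)
        m->iotlb_flushes++;

    return dropped;
}
```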

> I'd like to stress that we are shooting at the moment for the simplest
> possible solution that is good enough, so that we'll be able to
> finally merge this into the tree...
>   

I don't think what I'm suggesting is more code than the current 
implementation and it fits more cleanly into KVM.

>>> I'm not sure how we can do that... the guest can send a guest
>>> physical address to the device for DMA, even without generating a
>>> page-fault on the host for that address... which implies that the
>>> host must pin the entire guest memory in advance. agree?
>>>       
>> See above.  Ideally we would wait until the first PCI config space
>> access for a device before special casing the guest.  Otherwise,
>> there's no way to allow a DMA-aware guest to avoid pinning up front.
>>     
>
> Err, if the user gave the guest pass-through access to a PCI device,
> presumably it is because the guest will use it... What do we win by
> delaying the inevitable?
>   

s/DMA-aware/PVDMA-aware/

You do not know whether a guest is PVDMA-aware until the guest tells you 
so.  If you pin all of memory before the guest starts running, you may 
end up allocating memory that the guest never needed.  As we move to 
cooperative memory management between the host and guest, I expect the 
normal circumstance will be to launch a guest with far more memory than 
it needs, relying on the fact that the guest will not touch that memory.  
Pinning memory unconditionally defeats this.
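
A small sketch of the deferred decision, again as a user-space model with
hypothetical names (toy_guest, toy_guest_enable_pvdma and friends are not
existing KVM interfaces).  Nothing is pinned at launch.  A guest that
announces PVDMA support before it touches the device's config space gets
pages pinned one mapping at a time; a guest that touches config space
first is treated as unaware and has all of its memory pinned at that
point.  Refusing a late PVDMA announcement is an assumption made here to
keep the model short.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_dma_mode {
    TOY_DMA_UNDECIDED,  /* guest has not shown which kind it is yet */
    TOY_DMA_PIN_ALL,    /* unaware guest: all of memory is pinned */
    TOY_DMA_PVDMA       /* aware guest: pages pinned per DMA mapping */
};

struct toy_guest {
    enum toy_dma_mode mode;
    unsigned int nr_pages;      /* guest memory size, in pages */
    unsigned int pinned_pages;  /* pages currently pinned */
};

static void toy_guest_init(struct toy_guest *g, unsigned int nr_pages)
{
    g->mode = TOY_DMA_UNDECIDED;
    g->nr_pages = nr_pages;
    g->pinned_pages = 0;        /* nothing is pinned at launch */
}

/* Guest announces PVDMA support; only honored before the mode is fixed. */
static bool toy_guest_enable_pvdma(struct toy_guest *g)
{
    if (g->mode != TOY_DMA_UNDECIDED)
        return false;
    g->mode = TOY_DMA_PVDMA;
    return true;
}

/* First PCI config space access to the assigned device. */
static void toy_guest_config_access(struct toy_guest *g)
{
    if (g->mode != TOY_DMA_UNDECIDED)
        return;
    /* The guest never announced PVDMA, so fall back to pinning it all. */
    g->mode = TOY_DMA_PIN_ALL;
    g->pinned_pages = g->nr_pages;
}

/* A PVDMA-aware guest asks for one page to be mapped for DMA. */
static bool toy_guest_pvdma_map(struct toy_guest *g)
{
    if (g->mode != TOY_DMA_PVDMA || g->pinned_pages == g->nr_pages)
        return false;
    g->pinned_pages++;
    return true;
}
```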

As for merging, I don't think it's going to be reasonable to merge 
for 2.6.27, so there's not much of an argument for not doing it correctly.

Regards,

Anthony Liguori

> Cheers,
> Muli
>