From mboxrd@z Thu Jan 1 00:00:00 1970
From: Anthony Liguori
Subject: Re: [PATCH] KVM: PCIPT: VT-d: fix guest unmap
Date: Wed, 18 Jun 2008 15:48:33 -0500
Message-ID: <485974A1.60007@codemonkey.ws>
References: <1213729526-10410-1-git-send-email-benami@il.ibm.com> <1213729526-10410-2-git-send-email-benami@il.ibm.com> <1213729526-10410-3-git-send-email-benami@il.ibm.com> <48582CD0.5060109@codemonkey.ws> <1213790811.9177.19.camel@lnx-benami>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: amit.shah@qumranet.com, weidong.han@intel.com, Muli Ben-Yehuda , raharper@us.ibm.com, kvm@vger.kernel.org
To: Ben-Ami Yassour
Return-path:
Received: from wr-out-0506.google.com ([64.233.184.238]:60799 "EHLO wr-out-0506.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750853AbYFRUs6 (ORCPT ); Wed, 18 Jun 2008 16:48:58 -0400
Received: by wr-out-0506.google.com with SMTP id 69so332267wri.5 for ; Wed, 18 Jun 2008 13:48:53 -0700 (PDT)
In-Reply-To: <1213790811.9177.19.camel@lnx-benami>
Sender: kvm-owner@vger.kernel.org
List-ID:

Ben-Ami Yassour wrote:
> On Tue, 2008-06-17 at 16:29 -0500, Anthony Liguori wrote:
>
>> I think the current VT-d code needs some reworking.
>>
>> We should build the table as the shadow page table gets built. We
>> should suppress iotlb flushes unless the table is actually being updated.
>>
>>
>
> I'm not sure what you mean.
> The current implementation of vtd for passthrough is a direct map, which
> means that we map the entire guest memory (and pin it).
> In this case there are no iotlb flushes after the first initialization.
>

Right. But this is not ideal. Instead of pinning up-front, it would make
more sense IMHO to build the VT-d table as the shadow page table gets
faulted in. In certain circumstances, this will result in extraneous
updates (because a GPA=>HPA mapping is already present) and that's where
we should eliminate iotlb flushes.
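
To make the flush-suppression idea concrete, here is a minimal toy model
in Python. It is not KVM or VT-d driver code: the class ToyIommuTable,
its methods, and the 4 KiB page size are all hypothetical names and
assumptions chosen for illustration. It only shows the rule "update the
table on a shadow fault, and flush only when the table actually changed".

```python
from dataclasses import dataclass, field

PAGE_SHIFT = 12  # assumes 4 KiB pages, for illustration only


@dataclass
class ToyIommuTable:
    """Toy stand-in for a VT-d table: guest frame -> host frame."""

    entries: dict = field(default_factory=dict)
    iotlb_flushes: int = 0  # number of (simulated) IOTLB flushes issued

    def flush_iotlb(self, gfn):
        # Placeholder for a real IOTLB invalidation of this frame.
        self.iotlb_flushes += 1

    def map_on_fault(self, gpa, hpa):
        """Called when the shadow page table faults in gpa => hpa.

        Returns True if the table was updated, False if the update was
        extraneous because the same mapping was already present.
        """
        gfn, hfn = gpa >> PAGE_SHIFT, hpa >> PAGE_SHIFT
        if self.entries.get(gfn) == hfn:
            return False  # GPA=>HPA already present: skip the flush
        self.entries[gfn] = hfn
        self.flush_iotlb(gfn)  # the table really changed, so flush
        return True
```

In this model a second fault on an already-mapped page costs a dictionary
lookup and nothing else, which is the case where the flush is skipped.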
For now, we should basically do this for all of physical memory but we
should have the right infrastructure such that we can be more clever
once we have a PVDMA API.

> Obviously, pinning the entire guest is not desirable since we waste a
> lot of memory resources, but this is the approach that we currently
> have. Do you find it good enough for a merge with the main KVM tree, and
> optimize later?
>

No, it's not safe. What happens if something does mmap(MAP_FIXED) into
phys_ram_base? We need to use MMU notifiers to handle such events and
appropriately flush the iotlb.

> When you mentioned building a table as the shadow page table, did you
> mean that we should map the IOMMU on demand?
>

Yes, but in the absence of a PV guest, there's a very special case where
we pre-fault the entire table.

> I'm not sure how we can do that... the guest can send a guest physical
> address to the device for DMA, even without generating a page-fault on
> the host for that address... which implies that the host must pin the
> entire guest memory in advance. agree?
>

See above. Ideally we would wait until the first PCI config space access
for a device before special casing the guest. Otherwise, there's no way
to allow a DMA-aware guest to avoid pinning up front.

> The only way I can think of avoiding that is PVDMA with VT-d, which
> means that there is a hyper call for each DMA request, but this is a
> different solution, cause it only applies to PV guests.
>

It doesn't strictly require a hypercall, but yes, that's the general
solution.

> Do you see a way to avoid mapping (and pinning) the entire guest memory
> for fully virtual guests (and without parsing every transaction between
> the guest and the device to figure out the DMA addresses)?
>

The key is to support both cases with the same infrastructure. The
unmodified guest should just be a special case.

Regards,

Anthony Liguori

> Regards,
> Ben
>
>
>> Regards,
>>
>> Anthony Liguori
>>
>>
>
>
>
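
The "same infrastructure, unmodified guest as a special case" point in
the reply above can be sketched as a toy model in Python. Everything
here is an assumption made for illustration: ToyDmaMap, host_lookup,
prefault_all and invalidate_range are hypothetical names, not KVM, VT-d
or MMU-notifier APIs. The sketch shows one mapping path used both on
demand and for pre-faulting, plus a notifier-style invalidation for the
case where the host mapping behind a guest page changes.

```python
class ToyDmaMap:
    """Toy model: one mapping path shared by PV and unmodified guests."""

    def __init__(self, host_lookup):
        # host_lookup(gfn) -> hfn is a hypothetical stand-in for
        # resolving (and pinning) the host page behind a guest frame.
        self.host_lookup = host_lookup
        self.table = {}   # guest frame number -> host frame number
        self.flushes = 0  # simulated IOTLB flushes

    def map_page(self, gfn):
        # Common path: install the mapping, flush only on a real change.
        hfn = self.host_lookup(gfn)
        if self.table.get(gfn) != hfn:
            self.table[gfn] = hfn
            self.flushes += 1

    def prefault_all(self, guest_pages):
        # Special case for an unmodified guest: it cannot announce its
        # DMA targets, so walk all guest frames through the common path.
        for gfn in range(guest_pages):
            self.map_page(gfn)

    def invalidate_range(self, start_gfn, end_gfn):
        # What a notifier-style callback would do when the host mapping
        # for [start_gfn, end_gfn) changes, e.g. after a MAP_FIXED mmap.
        stale = [g for g in self.table if start_gfn <= g < end_gfn]
        for gfn in stale:
            del self.table[gfn]
        if stale:
            self.flushes += 1  # one flush covers the removed range
```

A DMA-aware guest would call only map_page for the pages it hands to the
device; the unmodified guest differs only in that prefault_all is run
for it once.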