From mboxrd@z Thu Jan 1 00:00:00 1970
From: Weidong Han
Subject: Re: [PATCH] VT-d: fix PCI device detach from virtual machine
Date: Thu, 17 Jun 2010 17:15:58 +0800
Message-ID: <4C19E7CE.1090309@intel.com>
References: <715D42877B251141A38726ABF5CABF2C0195A7B959@pdsmsx503.ccr.corp.intel.com> <1276557557.2063.43.camel@macbook.infradead.org> <4C199812.4080703@intel.com> <1276764596.5664.17.camel@macbook.infradead.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: 'Avi Kivity', "'iommu@lists.linux-foundation.org'", 'kvm', "Kay, Allen M"
To: David Woodhouse
Return-path:
Received: from mga03.intel.com ([143.182.124.21]:44011 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750975Ab0FQJQC (ORCPT); Thu, 17 Jun 2010 05:16:02 -0400
In-Reply-To: <1276764596.5664.17.camel@macbook.infradead.org>
Sender: kvm-owner@vger.kernel.org
List-ID:

David Woodhouse wrote:
> On Thu, 2010-06-17 at 11:35 +0800, Weidong Han wrote:
>> David Woodhouse wrote:
>>> So why do we bother setting up a context in the IOMMU for the device
>>> itself, when no DMA will ever appear to come from this device? And
>>
>> if the device is behind PCI Express-to-PCI/PCI-X bridge, the source-id
>> may be the device bdf or the source-id provided by the bridge. so it
>> needs to map the device itself.
>
> Ah, that makes some sense, and matches what Jörg said about the AMD
> IOMMU spec. Thanks.
>
>>> likewise why do we bother setting up a context for intermediate PCI
>>> bridges?
>>
>> I'm not sure if the intermediate PCI bridges are necessary. need to
>> check PCI spec.
>
> FWIW, the AMD IOMMU doesn't do this; it only sets up the mapping for the
> original device and for its 'proxy'.
>
>>> Why not just jump straight to the 'DMA proxy' device, and use that
>>> _only_?
>>
>> What's the 'DMA proxy' device?
>> is it the upstream pcie-to-pci bridge?
>
> Yes. Or, in the case of a certain buggy Ricoh multi-function device, it
> is function zero -- all other functions do their DMA as if it came from
> function zero.
>
> This is why I'm looking at whether we need the whole tree-walking thing,
> or whether we can just have a single 'proxy' device (or 'alias' as it's
> called in the AMD code).
>
>> all the device behind a pcie-to-pci bridge must be co-assigned to a
>> single domain. So it also require users to detach them together.
>
> Do we even _have_ an API for KVM to assign multiple devices at the same
> time? Or an API for KVM to _determine_ which devices are behind the same
> 'proxy'?

No. I think it's better to add some checks in the management tool (e.g.
libvirt) to determine whether the devices can be assigned before device
assignment, such as checking that all devices behind the same 'proxy'
are co-assigned (by checking the command line) and that the devices can
be reset (FLR, secondary bus reset, D0->D3, etc.). I think libvirt
already has these checks for device assignment.

> We could even have a new device hotplugged after the assignment has been
> done.

Yes, but these legacy devices can only be assigned and de-assigned at
the same time.

Regards,
Weidong
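
To make the co-assignment check above concrete, here is a rough sketch in Python of what a management tool might do before assignment. It is not libvirt or kernel code. The `PciNode` type, the `dma_proxy` and `missing_co_assignments` helpers, and the example topology are all hypothetical names invented for illustration. The model assumes the 'proxy' is simply the nearest upstream PCIe-to-PCI bridge, or the device itself when there is none; it does not cover the Ricoh function-zero case.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class PciNode:
    """One PCI function in a toy topology (hypothetical model, not a libvirt type)."""
    kind: str               # "endpoint", "root_port" or "pcie_to_pci_bridge"
    parent: Optional[str]   # BDF of the upstream bridge, None at the root


def dma_proxy(bdf: str, topo: Dict[str, PciNode]) -> str:
    """Return the BDF assumed to act as the DMA 'proxy' for a device.

    Assumption: the proxy is the nearest upstream PCIe-to-PCI bridge.
    A device with no such bridge above it is its own proxy.
    """
    cur = topo[bdf].parent
    while cur is not None:
        if topo[cur].kind == "pcie_to_pci_bridge":
            return cur
        cur = topo[cur].parent
    return bdf


def missing_co_assignments(requested: Set[str],
                           topo: Dict[str, PciNode]) -> Set[str]:
    """Return endpoints that share a proxy with a requested device
    but were not requested themselves. An empty set means the request
    satisfies the co-assignment rule in this simplified model."""
    proxies = {dma_proxy(bdf, topo) for bdf in requested}
    return {
        bdf
        for bdf, node in topo.items()
        if node.kind == "endpoint"
        and bdf not in requested
        and dma_proxy(bdf, topo) in proxies
    }


# Made-up topology: two legacy PCI endpoints behind one PCIe-to-PCI
# bridge, plus one PCIe endpoint directly under its own root port.
EXAMPLE_TOPOLOGY: Dict[str, PciNode] = {
    "00:1c.0": PciNode("root_port", None),
    "01:00.0": PciNode("pcie_to_pci_bridge", "00:1c.0"),
    "02:00.0": PciNode("endpoint", "01:00.0"),
    "02:01.0": PciNode("endpoint", "01:00.0"),
    "00:1c.1": PciNode("root_port", None),
    "03:00.0": PciNode("endpoint", "00:1c.1"),
}

if __name__ == "__main__":
    # Requesting only one of the two legacy devices should be refused.
    print(sorted(missing_co_assignments({"02:00.0"}, EXAMPLE_TOPOLOGY)))
```

A tool following this approach would refuse the assignment whenever the returned set is non-empty, and would apply the same test on detach so that devices behind one bridge leave the domain together.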