From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:50780)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1UN9wA-0004zW-8B for qemu-devel@nongnu.org;
	Tue, 02 Apr 2013 18:44:25 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1UN9w7-0001J4-9s for qemu-devel@nongnu.org;
	Tue, 02 Apr 2013 18:44:18 -0400
Received: from mail-db8lp0184.outbound.messaging.microsoft.com
	([213.199.154.184]:30363 helo=db8outboundpool.messaging.microsoft.com)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1UN9w7-0001Iq-1Q for qemu-devel@nongnu.org;
	Tue, 02 Apr 2013 18:44:15 -0400
Date: Tue, 2 Apr 2013 17:44:06 -0500
From: Scott Wood
In-Reply-To: <1364938324.2882.179.camel@bling.home>
	(from alex.williamson@redhat.com on Tue Apr 2 16:32:04 2013)
Message-ID: <1364942646.24520.27@snotra>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; delsp=Yes; format=Flowed
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] RFC: vfio API changes needed for powerpc
To: Alex Williamson
Cc: Wood Scott-B07421, "kvm@vger.kernel.org", "agraf@suse.de",
	"qemu-devel@nongnu.org", Yoder Stuart-B08248,
	"iommu@lists.linux-foundation.org", Bhushan Bharat-R65777

On 04/02/2013 04:32:04 PM, Alex Williamson wrote:
> On Tue, 2013-04-02 at 15:57 -0500, Scott Wood wrote:
> > On 04/02/2013 03:32:17 PM, Alex Williamson wrote:
> > > On x86 the interrupt remapper handles this transparently when MSI
> > > is enabled and userspace never gets direct access to the device MSI
> > > address/data registers.
> >
> > x86 has a totally different mechanism here, as far as I understand --
> > even before you get into restrictions on mappings.
>
> So what control will userspace have over programming the actually MSI
> vectors on PAMU?

Not sure what you mean -- PAMU doesn't get explicitly involved in
MSIs. It's just another 4K page mapping (per relevant MSI bank). If
you want isolation, you need to make sure that an MSI group is only
used by one VFIO group, and that you're on a chip that has alias pages
with just one MSI bank register each (newer chips do, but the first
chip to have a PAMU didn't).

> > > This could also be done as another "type2" ioctl extension.
> >
> > Again, what is "type2", specifically? If someone else is adding their
> > own IOMMU that is kind of, sort of like PAMU, how would they know if
> > it's close enough? What assumptions can a user make when they see that
> > they're dealing with "type2"?
>
> Naming always has and always will be a problem. I assume this is named
> type2 rather than PAMU because it's trying to expose a generic windowed
> IOMMU fitting the IOMMU API.

But how closely is the MSI situation related to a generic windowed
IOMMU, then? We could just as well have a highly flexible IOMMU in
terms of arbitrary 4K page mappings, but still handle MSIs as pages to
be mapped rather than a translation table. Or we could have a windowed
IOMMU that has an MSI translation table.

> Like type1, it doesn't really make sense
> to name it "IOMMU API" because that's a kernel internal interface and
> we're designing a userspace interface that just happens to use that.
> Tagging it to a piece of hardware makes it less reusable.

Well, that's my point. Is it reusable at all, anyway? If not, then
giving it a more obscure name won't change that. If it is reusable,
then where is the line drawn between things that are PAMU-specific or
MPIC-specific and things that are part of the "generic windowed IOMMU"
abstraction?

> Type1 is arbitrary. It might as well be named "brown" and this one can be
> "blue".

The difference is that "type1" seems to refer to hardware that can do
arbitrary 4K page mappings, possibly constrained by an aperture but
nothing else. More than one IOMMU can reasonably fit that. The odds
that another IOMMU would have exactly the same restrictions as PAMU
seem smaller in comparison.

In any case, if you had to deal with some Intel-only quirk, would it
make sense to call it a "type1 attribute"? I'm not advocating one way
or the other on whether an abstraction is viable here (though Stuart
seems to think it's "highly unlikely anything but a PAMU will comply"),
just that if it is to be abstracted rather than a hardware-specific
interface, we need to document what is and is not part of the
abstraction. Otherwise a non-PAMU-specific user won't know what they
can rely on, and someone adding support for a new windowed IOMMU won't
know if theirs is close enough, or they need to introduce a "type3".

> > > What's the value to userspace in determining which windows are used
> > > by which banks?
> >
> > That depends on who programs the MSI config space address. What is
> > important is userspace controlling which iovas will be dedicated to
> > this, in case it wants to put something else there.
>
> So userspace is programming the MSI vectors, targeting a user programmed
> iova? But an iova selects a window and I thought there were some number
> of MSI banks and we don't really know which ones we'll need... still
> confused.

Userspace would also need a way to find out the page offset and data
value. That may be an argument in favor of having the two ioctls
Stuart later suggested (get MSI count, and map MSI). Would there be
any complication in the VFIO code from tracking a mapping that doesn't
have a userspace virtual address associated with it?

> > There's going to be special stuff no matter what. This would keep it
> > separated from the IOMMU map code.
> >
> > I'm not sure what you mean by "overhead" here... the runtime overhead
> > of setting things up is not particularly relevant as long as it's
> > reasonable. If you mean development and maintenance effort, keeping
> > things well separated should help.
>
> Overhead in terms of code required and complexity. More things to
> reference count and shut down in the proper order on userspace exit.
> Thanks,

That didn't stop others from having me convert the KVM device control
API to use file descriptors instead of something more ad-hoc with a
better-defined destruction order. :-)

I don't know if it necessarily needs to be a separate fd -- it could be
just another device resource like BARs, with some way for userspace to
tell if the page is shared by multiple devices in the group (e.g. make
the physical address visible).

-Scott
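
Illustrative note (not part of the original message): the two-operation
idea mentioned in the thread (get MSI count, and map MSI) could be
pictured roughly as below. This is a plain userspace simulation, not VFIO
code. Every name here (msi_state_sketch, msi_map_sketch, msi_get_count,
msi_map, msi_target_address) and the register offset and data values are
hypothetical placeholders. The only things taken from the discussion are
that userspace picks the iova for each MSI bank page and needs to learn
the page offset and data value back.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE      4096u
#define SKETCH_MAX_BANKS      4u
#define SKETCH_MSI_REG_OFFSET 0x140u  /* placeholder, not a real offset */

/* Hypothetical argument for a "map MSI" request. */
struct msi_map_sketch {
    uint32_t bank;         /* in:  which MSI bank to map */
    uint64_t iova;         /* in:  user-chosen, page-aligned iova */
    uint32_t page_offset;  /* out: offset of the bank register in the page */
    uint32_t data;         /* out: data value the device should write */
};

/* Hypothetical per-container bookkeeping. Note that no userspace
 * virtual address is tracked for these mappings. */
struct msi_state_sketch {
    uint32_t nbanks;
    uint64_t iova[SKETCH_MAX_BANKS];
    uint8_t  mapped[SKETCH_MAX_BANKS];
};

/* "Get MSI count": how many bank pages userspace must reserve iovas for. */
int msi_get_count(const struct msi_state_sketch *s)
{
    return (int)s->nbanks;
}

/* "Map MSI": dedicate one user-chosen iova page to one MSI bank.
 * Returns 0 on success, -1 on any invalid or conflicting request. */
int msi_map(struct msi_state_sketch *s, struct msi_map_sketch *m)
{
    uint32_t i;

    assert(s->nbanks <= SKETCH_MAX_BANKS);

    if (m->bank >= s->nbanks)
        return -1;                      /* no such bank */
    if (m->iova % SKETCH_PAGE_SIZE != 0)
        return -1;                      /* iova must be page aligned */
    if (s->mapped[m->bank])
        return -1;                      /* bank already mapped */
    for (i = 0; i < s->nbanks; i++)
        if (s->mapped[i] && s->iova[i] == m->iova)
            return -1;                  /* iova already used by another bank */

    s->mapped[m->bank] = 1;
    s->iova[m->bank] = m->iova;
    m->page_offset = SKETCH_MSI_REG_OFFSET;
    m->data = m->bank;                  /* placeholder data value */
    return 0;
}

/* Address userspace would program into the device's MSI address register. */
uint64_t msi_target_address(const struct msi_map_sketch *m)
{
    return m->iova + m->page_offset;
}
```

In this picture a caller would first ask for the count, choose that many
free iova pages, call msi_map() once per bank, and then program each
device with msi_target_address() and the returned data value.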