From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([140.186.70.92]:39206)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <benh@kernel.crashing.org>) id 1QvcNp-0004VX-8g
	for qemu-devel@nongnu.org; Mon, 22 Aug 2011 17:50:14 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <benh@kernel.crashing.org>) id 1QvcNo-0003d7-4U
	for qemu-devel@nongnu.org; Mon, 22 Aug 2011 17:50:13 -0400
Received: from gate.crashing.org ([63.228.1.57]:54456)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <benh@kernel.crashing.org>) id 1QvcNn-0003ci-RV
	for qemu-devel@nongnu.org; Mon, 22 Aug 2011 17:50:12 -0400
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
In-Reply-To: <CA781A61.FB15%aafabbri@cisco.com>
References: <CA781A61.FB15%aafabbri@cisco.com>
Content-Type: text/plain; charset="UTF-8"
Date: Tue, 23 Aug 2011 07:49:45 +1000
Message-ID: <1314049785.7662.44.camel@pasglop>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] kvm PCI assignment & VFIO ramblings
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: aafabbri <aafabbri@cisco.com>
Cc: Alexey Kardashevskiy <aik@au1.ibm.com>, kvm@vger.kernel.org, Paul Mackerras <pmac@au1.ibm.com>, "linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>, qemu-devel <qemu-devel@nongnu.org>, iommu <iommu@lists.linux-foundation.org>, chrisw <chrisw@sous-sol.org>, Alex Williamson <alex.williamson@redhat.com>, Avi Kivity <avi@redhat.com>, linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, benve@cisco.com


> > I wouldn't use uiommu for that.
> 
> Any particular reason besides saving a file descriptor?
> 
> We use it today, and it seems like a cleaner API than what you propose
> changing it to.

Well for one, we are back to square one vs. grouping constraints.

 .../...

> If we in singleton-group land were building our own "groups" which were sets
> of devices sharing the IOMMU domains we wanted, I suppose we could do away
> with uiommu fds, but it sounds like the current proposal would create 20
> singleton groups (x86 iommu w/o PCI bridges => all devices are partitionable
> endpoints).  Asking me to ioctl(inherit) them together into a blob sounds
> worse than the current explicit uiommu API.

I'd rather have an API to create super-groups (groups of groups)
statically and then you can use such groups as normal groups using the
same interface. That create/management process could be done via a
simple command line utility or via sysfs banging, whatever...

Cheers,
Ben.

> Thanks,
> Aaron
> 
> > 
> > Another option is to make that static configuration APIs via special
> > ioctls (or even netlink if you really like it), to change the grouping
> > on architectures that allow it.
> > 
> > Cheers.
> > Ben.
> > 
> >> 
> >> -Aaron
> >> 
> >>> As necessary in the future, we can
> >>> define a more high performance dma mapping interface for streaming dma
> >>> via the group fd.  I expect we'll also include architecture specific
> >>> group ioctls to describe features and capabilities of the iommu.  The
> >>> group fd will need to prevent concurrent open()s to maintain a 1:1 group
> >>> to userspace process ownership model.
> >>> 
> >>> Also on the table is supporting non-PCI devices with vfio.  To do this,
> >>> we need to generalize the read/write/mmap and irq eventfd interfaces.
> >>> We could keep the same model of segmenting the device fd address space,
> >>> perhaps adding ioctls to define the segment offset bit position or we
> >>> could split each region into it's own fd (VFIO_GET_PCI_BAR_FD(0),
> >>> VFIO_GET_PCI_CONFIG_FD(), VFIO_GET_MMIO_FD(3)), though we're already
> >>> suffering some degree of fd bloat (group fd, device fd(s), interrupt
> >>> event fd(s), per resource fd, etc).  For interrupts we can overload
> >>> VFIO_SET_IRQ_EVENTFD to be either PCI INTx or non-PCI irq (do non-PCI
> >>> devices support MSI?).
> >>> 
> >>> For qemu, these changes imply we'd only support a model where we have a
> >>> 1:1 group to iommu domain.  The current vfio driver could probably
> >>> become vfio-pci as we might end up with more target specific vfio
> >>> drivers for non-pci.  PCI should be able to maintain a simple -device
> >>> vfio-pci,host=bb:dd.f to enable hotplug of individual devices.  We'll
> >>> need to come up with extra options when we need to expose groups to
> >>> guest for pvdma.
> >>> 
> >>> Hope that captures it, feel free to jump in with corrections and
> >>> suggestions.  Thanks,
> >>> 
> >>> Alex
> >>> 
> > 
> >