From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:50477) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1UN878-0002YB-Qa for qemu-devel@nongnu.org; Tue, 02 Apr 2013 16:47:33 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1UN875-0004x8-VH for qemu-devel@nongnu.org; Tue, 02 Apr 2013 16:47:30 -0400
Received: from ch1ehsobe003.messaging.microsoft.com ([216.32.181.183]:14057 helo=ch1outboundpool.messaging.microsoft.com) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1UN875-0004wu-Q7 for qemu-devel@nongnu.org; Tue, 02 Apr 2013 16:47:27 -0400
Date: Tue, 2 Apr 2013 15:47:14 -0500
From: Scott Wood
References: <9F6FE96B71CF29479FF1CDC8046E15035A0F13@039-SN1MPN1-003.039d.mgd.msft.net> <1364931564.24520.16@snotra>
In-Reply-To: (from b08248@gmail.com on Tue Apr 2 15:38:42 2013)
Message-ID: <1364935634.24520.22@snotra>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; delsp=Yes; format=Flowed
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] RFC: vfio API changes needed for powerpc
To: Stuart Yoder
Cc: Wood Scott-B07421, "kvm@vger.kernel.org", "qemu-devel@nongnu.org", "agraf@suse.de", Yoder Stuart-B08248, "iommu@lists.linux-foundation.org", Bhushan Bharat-R65777

On 04/02/2013 03:38:42 PM, Stuart Yoder wrote:
> On Tue, Apr 2, 2013 at 2:39 PM, Scott Wood wrote:
> > On 04/02/2013 12:32:00 PM, Yoder Stuart-B08248 wrote:
> >>
> >> Alex,
> >>
> >> We are in the process of implementing vfio-pci support for the Freescale
> >> IOMMU (PAMU).  It is an aperture/window-based IOMMU and is quite different
> >> than x86, and will involve creating a 'type 2' vfio implementation.
> >>
> >> For each device's DMA mappings, PAMU has an overall aperture and a number
> >> of windows.
> >> All sizes and window counts must be powers of 2.  To illustrate,
> >> below is a mapping for a 256MB guest, including guest memory (backed by
> >> 64MB huge pages) and some windows for MSIs:
> >>
> >>    Total aperture: 512MB
> >>    # of windows: 8
> >>
> >>    win                gphys/
> >>     #    iova         phys           size
> >>    ---   ----         ----           ----
> >>     0    0x00000000   0xX_XX000000   64MB
> >>     1    0x04000000   0xX_XX000000   64MB
> >>     2    0x08000000   0xX_XX000000   64MB
> >>     3    0x0C000000   0xX_XX000000   64MB
> >>     4    0x10000000   0xf_fe044000   4KB    // msi bank 1
> >>     5    0x14000000   0xf_fe045000   4KB    // msi bank 2
> >>     6    0x18000000   0xf_fe046000   4KB    // msi bank 3
> >>     7    -            -              disabled
> >>
> >> There are a couple of updates needed to the vfio user->kernel interface
> >> that we would like your feedback on.
> >>
> >> 1.  IOMMU geometry
> >>
> >>    The kernel IOMMU driver now has an interface (see domain_set_attr,
> >>    domain_get_attr) that lets us set the domain geometry using
> >>    "attributes".
> >>
> >>    We want to expose that to user space, so we envision needing a couple
> >>    of new ioctls to do this:
> >>         VFIO_IOMMU_SET_ATTR
> >>         VFIO_IOMMU_GET_ATTR
> >
> >
> > Note that this means attributes need to be updated for user-API
> > appropriateness, such as using fixed-size types.
> >
> >
> >> 2.  MSI window mappings
> >>
> >>    The more problematic question is how to deal with MSIs.  We need to
> >>    create mappings for up to 3 MSI banks that a device may need to target
> >>    to generate interrupts.  The Linux MSI driver can allocate MSIs from
> >>    the 3 banks any way it wants, and currently user space has no way of
> >>    knowing which bank may be used for a given device.
> >>
> >>    There are 3 options we have discussed and would like your direction:
> >>
> >>    A.  Implicit mappings -- with this approach user space would not
> >>        explicitly map MSIs.
> >>        User space would be required to set the
> >>        geometry so that there are 3 unused windows (the last 3 windows)
> >
> >
> > Where does userspace get the number "3" from?  E.g. on newer chips there are
> > 4 MSI banks.  Maybe future chips have even more.
>
> Ok, then make the number 4.  The chance of more MSI banks in future chips
> is nil,

What makes you so sure?  Especially since you seem to be presenting
this as not specifically an MPIC API.

> and if it ever happened user space could adjust.

What bit of API is going to tell it that it needs to adjust?

> Also, practically speaking since memory is typically allocated in powers of
> 2, you need to approximately double the window geometry anyway.

Only if your existing mapping needs fit exactly in a power of two.

> >>    B.  Explicit mapping using DMA map flags.  The idea is that a new
> >>        flag to DMA map (VFIO_DMA_MAP_FLAG_MSI) would mean that
> >>        a mapping is to be created for the supplied iova.  No vaddr
> >>        is given though.  So in the above example there would be
> >>        a dma map at 0x10000000 for 24KB (and no vaddr).
> >
> >
> > A single 24 KiB mapping wouldn't work (and why 24KB?  What if only one MSI
> > group is involved in this VFIO group?  What if four MSI groups are
> > involved?).  You'd need to either have a naturally aligned, power-of-two
> > sized mapping that covers exactly the pages you want to map and no more, or
> > you'd need to create a separate mapping for each MSI bank, and due to PAMU
> > subwindow alignment restrictions these mappings could not be contiguous in
> > iova-space.
>
> You're right, a single 24KB mapping wouldn't work-- in the case of 3 MSI banks
> perhaps we could just do one 64MB*3 mapping to identify which windows
> are used for MSIs.

Where did the assumption of a 64MiB subwindow size come from?

> If only one MSI bank was involved the kernel could get clever and only enable
> the banks actually needed.

I'd rather see cleverness kept in userspace.

-Scott
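
For reference, the geometry arithmetic argued over in this thread can be sketched as below. This is only an illustration under stated assumptions: it assumes the aperture is split into equal-sized subwindows (as in the 512MB / 8-window example), and every function name here is hypothetical, not part of VFIO, the kernel IOMMU API, or the PAMU driver.

```python
# Hypothetical helpers illustrating the power-of-two constraints discussed
# in the thread. None of these names exist in VFIO or the PAMU driver.

KB = 1024
MB = 1024 * KB


def is_power_of_two(n: int) -> bool:
    """True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (n must be >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1 << (n - 1).bit_length()


def subwindow_size(aperture: int, windows: int) -> int:
    """Size of each subwindow, assuming the aperture is split evenly."""
    if not (is_power_of_two(aperture) and is_power_of_two(windows)):
        raise ValueError("aperture and window count must be powers of two")
    return aperture // windows


def is_valid_mapping(iova: int, size: int) -> bool:
    """A mapping must be power-of-two sized and naturally aligned."""
    return is_power_of_two(size) and iova % size == 0


def windows_needed(guest_windows: int, msi_banks: int) -> int:
    """Window count after reserving one window per MSI bank."""
    return next_power_of_two(guest_windows + msi_banks)


if __name__ == "__main__":
    # 512MB aperture with 8 windows gives 64MB subwindows.
    print(subwindow_size(512 * MB, 8) // MB, "MB per subwindow")
    # A single 24KB mapping is not power-of-two sized, so it is rejected.
    print(is_valid_mapping(0x10000000, 24 * KB))
    # 4 guest windows + 3 MSI banks doubles the geometry (4 -> 8) ...
    print(windows_needed(4, 3))
    # ... but 5 guest windows already round up to 8, so nothing doubles.
    print(windows_needed(5, 3), next_power_of_two(5))
```

The last two calls show why the geometry only doubles when the existing mappings already fill a power of two exactly.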