From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:50477)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <B07421@freescale.com>) id 1UN878-0002YB-Qa
	for qemu-devel@nongnu.org; Tue, 02 Apr 2013 16:47:33 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <B07421@freescale.com>) id 1UN875-0004x8-VH
	for qemu-devel@nongnu.org; Tue, 02 Apr 2013 16:47:30 -0400
Received: from ch1ehsobe003.messaging.microsoft.com ([216.32.181.183]:14057
	helo=ch1outboundpool.messaging.microsoft.com)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <B07421@freescale.com>) id 1UN875-0004wu-Q7
	for qemu-devel@nongnu.org; Tue, 02 Apr 2013 16:47:27 -0400
Date: Tue, 2 Apr 2013 15:47:14 -0500
From: Scott Wood <scottwood@freescale.com>
References: <9F6FE96B71CF29479FF1CDC8046E15035A0F13@039-SN1MPN1-003.039d.mgd.msft.net>
	<1364931564.24520.16@snotra>
	<CALRxmdDdZ49A4jJinqu9FKwtF3uRMrGS-hWotZ5jaghvAWnTDQ@mail.gmail.com>
In-Reply-To: <CALRxmdDdZ49A4jJinqu9FKwtF3uRMrGS-hWotZ5jaghvAWnTDQ@mail.gmail.com>
	(from b08248@gmail.com on Tue Apr  2 15:38:42 2013)
Message-ID: <1364935634.24520.22@snotra>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; delsp=Yes; format=Flowed
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] RFC: vfio API changes needed for powerpc
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Stuart Yoder <b08248@gmail.com>
Cc: Wood Scott-B07421 <B07421@freescale.com>, "kvm@vger.kernel.org" <kvm@vger.kernel.org>, "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>, "agraf@suse.de" <agraf@suse.de>, Yoder Stuart-B08248 <B08248@freescale.com>, "iommu@lists.linux-foundation.org" <iommu@lists.linux-foundation.org>, Bhushan Bharat-R65777 <R65777@freescale.com>

On 04/02/2013 03:38:42 PM, Stuart Yoder wrote:
> On Tue, Apr 2, 2013 at 2:39 PM, Scott Wood <scottwood@freescale.com> wrote:
> > On 04/02/2013 12:32:00 PM, Yoder Stuart-B08248 wrote:
> >>
> >> Alex,
> >>
> >> We are in the process of implementing vfio-pci support for the Freescale
> >> IOMMU (PAMU).  It is an aperture/window-based IOMMU and is quite different
> >> than x86, and will involve creating a 'type 2' vfio implementation.
> >>
> >> For each device's DMA mappings, PAMU has an overall aperture and a number
> >> of windows.  All sizes and window counts must be power of 2.  To illustrate,
> >> below is a mapping for a 256MB guest, including guest memory (backed by
> >> 64MB huge pages) and some windows for MSIs:
> >>
> >>     Total aperture: 512MB
> >>     # of windows: 8
> >>
> >>     win gphys/
> >>     #   iova        phys          size
> >>     --- ----        ----          ----
> >>     0   0x00000000  0xX_XX000000  64MB
> >>     1   0x04000000  0xX_XX000000  64MB
> >>     2   0x08000000  0xX_XX000000  64MB
> >>     3   0x0C000000  0xX_XX000000  64MB
> >>     4   0x10000000  0xf_fe044000  4KB    // msi bank 1
> >>     5   0x14000000  0xf_fe045000  4KB    // msi bank 2
> >>     6   0x18000000  0xf_fe046000  4KB    // msi bank 3
> >>     7            -             -  disabled
> >>
> >> There are a couple of updates needed to the vfio user->kernel interface
> >> that we would like your feedback on.
> >>
> >> 1.  IOMMU geometry
> >>
> >>    The kernel IOMMU driver now has an interface (see domain_set_attr,
> >>    domain_get_attr) that lets us set the domain geometry using
> >>    "attributes".
> >>
> >>    We want to expose that to user space, so envision needing a couple
> >>    of new ioctls to do this:
> >>         VFIO_IOMMU_SET_ATTR
> >>         VFIO_IOMMU_GET_ATTR
> >
> >
> > Note that this means attributes need to be updated for user-API
> > appropriateness, such as using fixed-size types.
> >
> >
> >> 2.   MSI window mappings
> >>
> >>    The more problematic question is how to deal with MSIs.  We need to
> >>    create mappings for up to 3 MSI banks that a device may need to target
> >>    to generate interrupts.  The Linux MSI driver can allocate MSIs from
> >>    the 3 banks any way it wants, and currently user space has no way of
> >>    knowing which bank may be used for a given device.
> >>
> >>    There are 3 options we have discussed and would like your direction:
> >>
> >>    A.  Implicit mappings -- with this approach user space would not
> >>        explicitly map MSIs.  User space would be required to set the
> >>        geometry so that there are 3 unused windows (the last 3 windows)
> >
> >
> > Where does userspace get the number "3" from?  E.g. on newer chips there are
> > 4 MSI banks.  Maybe future chips have even more.
>
> Ok, then make the number 4.   The chance of more MSI banks in future chips
> is nil,

What makes you so sure?  Especially since you seem to be presenting
this as not specifically an MPIC API.

> and if it ever happened user space could adjust.

What bit of API is going to tell it that it needs to adjust?

> Also, practically speaking, since memory is typically allocated in powers of
> 2, you need to approximately double the window geometry anyway.

Only if your existing mapping needs fit exactly in a power of two.
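
As a rough illustration of the window-count arithmetic (a sketch only:
roundup_pow2 and windows_needed are hypothetical helper names, not part of
any existing VFIO or PAMU interface, and it assumes each MSI bank takes one
full window, as in the table quoted above):

```c
#include <stdint.h>

/* Round n up to the next power of two; returns 1 for n == 0 or n == 1. */
static uint64_t roundup_pow2(uint32_t n)
{
    uint64_t p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/*
 * Hypothetical helper: total window count for a geometry that holds
 * mem_windows guest-memory windows plus one window per MSI bank.
 * The window count must be a power of two, so round the sum up.
 */
static uint64_t windows_needed(uint32_t mem_windows, uint32_t msi_banks)
{
    return roundup_pow2(mem_windows + msi_banks);
}
```

With 4 guest-memory windows and 3 MSI banks this gives 8, so the geometry
doubles.  With 5 guest-memory windows, which already round up to 8 on their
own, adding 3 MSI banks still gives 8.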

> >>    B.  Explicit mapping using DMA map flags.  The idea is that a new
> >>        flag to DMA map (VFIO_DMA_MAP_FLAG_MSI) would mean that
> >>        a mapping is to be created for the supplied iova.  No vaddr
> >>        is given though.  So in the above example there would be
> >>        a dma map at 0x10000000 for 24KB (and no vaddr).
> >
> >
> > A single 24 KiB mapping wouldn't work (and why 24KB? What if only one MSI
> > group is involved in this VFIO group?  What if four MSI groups are
> > involved?).  You'd need to either have a naturally aligned, power-of-two
> > sized mapping that covers exactly the pages you want to map and no more, or
> > you'd need to create a separate mapping for each MSI bank, and due to PAMU
> > subwindow alignment restrictions these mappings could not be contiguous in
> > iova-space.
>
> You're right, a single 24KB mapping wouldn't work -- in the case of 3 MSI banks
> perhaps we could just do one 64MB*3 mapping to identify which windows
> are used for MSIs.

Where did the assumption of a 64MiB subwindow size come from?
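
For context, the only place 64MB shows up in the example quoted above is as
the total aperture divided by the window count (512MB / 8).  A minimal sketch
of that arithmetic and of the natural-alignment check, assuming equal-sized
subwindows; the helper names are made up for illustration and are not an
existing kernel or VFIO API:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if x is a nonzero power of two. */
static bool is_pow2(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Hypothetical helper: size of each subwindow when an aperture is split
 * into `count` equal parts.  Both values must be powers of two; returns 0
 * if the geometry is invalid.
 */
static uint64_t subwindow_size(uint64_t aperture, uint64_t count)
{
    if (!is_pow2(aperture) || !is_pow2(count) || count > aperture)
        return 0;
    return aperture / count;
}

/* True if [iova, iova + size) is a naturally aligned power-of-two region. */
static bool naturally_aligned(uint64_t iova, uint64_t size)
{
    return is_pow2(size) && (iova & (size - 1)) == 0;
}
```

A different geometry changes the result: the same 512MB aperture with 16
windows gives 32MB subwindows.  The 24KB mapping at 0x10000000 fails the
alignment check because 24KB is not a power of two, while a 4KB mapping at
the same iova passes.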

> If only one MSI bank was involved the kernel could get clever and only enable
> the banks actually needed.

I'd rather see cleverness kept in userspace.

-Scott