From: Alex Williamson
Date: Mon, 03 Jun 2013 12:02:09 -0600
To: Benoît Canet
Cc: iommu@lists.linux-foundation.org, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] VFIO and scheduled SR-IOV cards

On Mon, 2013-06-03 at 18:33 +0200, Benoît Canet wrote:
> Hello,
>
> I plan to write a PF driver for an SR-IOV card and make the VFs work
> with QEMU's VFIO passthrough, so I am asking the following design
> question before trying to write and push code.
>
> After SR-IOV is enabled on this hardware, only one VF can be active
> at a given time.

Is this actually an SR-IOV device or are you trying to write a driver
that emulates SR-IOV for a PF?

> The PF host kernel driver is acting as a scheduler.  Every few
> milliseconds it switches which VF is the currently active function
> while disabling the other VFs.
>
> One consequence of how the hardware works is that the MMR regions of
> the switched-off VFs must be unmapped, and their I/O accesses should
> block until the VF is switched on again.

MMR = Memory Mapped Register?  This seems contradictory to the SR-IOV
spec, which states:

        Each VF contains a non-shared set of physical resources
        required to deliver Function-specific services, e.g., resources
        such as work queues, data buffers, etc.  These resources can be
        directly accessed by an SI without requiring VI or SR-PCIM
        intervention.

Furthermore, each VF should have a separate requester ID.  What's being
suggested here seems like maybe that's not the case.  If true, it would
make iommu groups challenging.  Is there any VF save/restore around the
scheduling?

> Each IOMMU map/unmap should be done in less than 100ns.

I think that may be a lot to ask if we need to unmap the regions in the
guest and in the iommu.  If the "VFs" used different requester IDs,
iommu unmapping wouldn't be necessary.

I experimented with switching between trapped (read/write) access to
memory regions and mmap'd (direct mapping) for handling legacy
interrupts.  There was a noticeable performance penalty switching per
interrupt.

> As the kernel iommu module is being called by the VFIO driver, the PF
> driver cannot interface with it.
>
> Currently the only interface of the VFIO code is for the userland
> QEMU process, and I fear that notifying QEMU that it should do the
> unmap/block would take more than 100ns.
>
> Also, blocking the I/O access in QEMU under the BQL would freeze
> QEMU.
>
> Do you have an idea on how to write this required map and
> block/unmap feature?

It seems like there are several options, but I'm doubtful that any of
them will meet 100ns.
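Just to make the cost concrete, the PF-driver-side cycle you're
describing boils down to something like the sketch below (untested;
"domain", "iova", "paddr" and "bar_size" are placeholders for whatever
your driver tracks per VF):

#include <linux/iommu.h>
#include <linux/ktime.h>
#include <linux/printk.h>

/* Time one unmap/map cycle of a VF BAR through the kernel IOMMU API. */
static int time_vf_bar_switch(struct iommu_domain *domain,
                              unsigned long iova, phys_addr_t paddr,
                              size_t bar_size)
{
        ktime_t start = ktime_get();
        int ret;

        if (iommu_unmap(domain, iova, bar_size) != bar_size)
                return -EFAULT;

        /* ...the hardware switch of the active VF would happen here... */

        ret = iommu_map(domain, iova, paddr, bar_size,
                        IOMMU_READ | IOMMU_WRITE);

        pr_info("unmap+map took %lld ns\n",
                ktime_to_ns(ktime_sub(ktime_get(), start)));
        return ret;
}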
If this is completely fake SR-IOV and there's not a different requester
ID per VF, I'd start with seeing if you can even do the
iommu_unmap/iommu_map of the MMIO BARs in under 100ns.  If that's close
to your limit, then your only real option for QEMU is to freeze it,
which still involves getting multiple (maybe many) vCPUs out of VM
mode.  That's not free either.  If by some miracle you have time to
spare, you could remap the regions to trapped mode and let the vCPUs
run while vfio blocks on read/write.

Maybe there's even a question whether mmap'd mode is worthwhile for
this device.  Trapping every read/write is orders of magnitude slower,
but allows you to handle the "wait for VF" on the kernel side.

If you can provide more info on the device design/constraints, maybe we
can come up with better options.  Thanks,

Alex
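P.S.  For reference, the QEMU side of the trapped vs. mmap'd switching
I mentioned above is roughly the sketch below (untested; it assumes
each BAR is modeled as a trapped MMIO region with an mmap'd subregion
overlapping it, and vfio_set_mmap_enabled/mmap_mrs are made-up names):

#include "exec/memory.h"

/*
 * Flip a set of mmap'd BAR subregions on or off in one memory
 * transaction.  With the mmap'd view disabled, guest accesses fall
 * through to the trapped region underneath, i.e. vfio read()/write(),
 * where the kernel is free to block until the VF is scheduled back in.
 */
static void vfio_set_mmap_enabled(MemoryRegion **mmap_mrs, int nr_bars,
                                  bool enabled)
{
    int i;

    memory_region_transaction_begin();
    for (i = 0; i < nr_bars; i++) {
        memory_region_set_enabled(mmap_mrs[i], enabled);
    }
    memory_region_transaction_commit();
}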