From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <51ACE1B5.2050102@redhat.com>
Date: Mon, 03 Jun 2013 14:34:29 -0400
From: Don Dutile
MIME-Version: 1.0
References: <20130603163305.GC4094@irqsave.net> <1370282529.30975.344.camel@ul30vt.home>
In-Reply-To: <1370282529.30975.344.camel@ul30vt.home>
Subject: Re: [Qemu-devel] VFIO and scheduled SR-IOV cards
To: Alex Williamson
Cc: Benoît Canet, iommu@lists.linux-foundation.org, qemu-devel@nongnu.org

On 06/03/2013 02:02 PM, Alex Williamson wrote:
> On Mon, 2013-06-03 at 18:33 +0200, Benoît Canet wrote:
>> Hello,
>>
>> I plan to write a PF driver for an SR-IOV card and make the VFs work
>> with QEMU's VFIO passthrough, so I am asking the following design
>> question before trying to write and push code.
>>
>> After SR-IOV is enabled on this hardware, only one VF can be active
>> at a given time.
>
> Is this actually an SR-IOV device, or are you trying to write a driver
> that emulates SR-IOV for a PF?
>
>> The PF host kernel driver acts as a scheduler.
>> It switches which VF is the currently active function every few
>> milliseconds, disabling the other VFs.

That's time-sharing of hw, which sw doesn't see ... so, ok.

>> One consequence of how the hardware works is that the MMR regions of
>> the switched-off VFs must be unmapped, and their I/O accesses should
>> block until the VF is switched on again.

This violates the spec, and does impact sw -- how can one assign such a
VF to a guest? -- it does not work independently of the other VFs.

> MMR = Memory Mapped Register?
>
> This seems contradictory to the SR-IOV spec, which states:
>
>         Each VF contains a non-shared set of physical resources
>         required to deliver Function-specific services, e.g.,
>         resources such as work queues, data buffers, etc.  These
>         resources can be directly accessed by an SI without requiring
>         VI or SR-PCIM intervention.
>
> Furthermore, each VF should have a separate requester ID.  What's being
> suggested here seems like maybe that's not the case.  If true, it would

I didn't read it that way above.  I read it as the PCIe end is timeshared
btwn VFs (& PFs?) .... with some VFs disappearing (from a driver
perspective) as if the device was hot-unplugged w/o notification.  That
will probably cause read timeouts & SMEs, bringing down most
enterprise-level systems.

> make iommu groups challenging.  Is there any VF save/restore around the
> scheduling?
>
>> Each IOMMU map/unmap should be done in less than 100ns.
>
> I think that may be a lot to ask if we need to unmap the regions in the
> guest and in the iommu.  If the "VFs" used different requester IDs,
> iommu unmapping wouldn't be necessary.  I experimented with switching
> between trapped (read/write) access to memory regions and mmap'd (direct
> mapping) for handling legacy interrupts.  There was a noticeable
> performance penalty switching per interrupt.
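
fwiw: before designing around that 100ns number, I'd measure an actual
unmap+map cycle of a VF BAR on the target iommu hw from the PF driver.
A rough sketch of what I'd time, using the kernel iommu API -- the
domain, iova, paddr and size arguments are placeholders for whatever
state the PF driver really tracks:

    /*
     * Hypothetical sketch: time one unmap+map cycle of a VF's MMIO BAR
     * against the 100ns budget.  domain/iova/paddr/size stand in for
     * the PF driver's real bookkeeping.
     */
    #include <linux/iommu.h>
    #include <linux/ktime.h>
    #include <linux/printk.h>

    static void time_vf_bar_remap(struct iommu_domain *domain,
                                  unsigned long iova, phys_addr_t paddr,
                                  size_t size)
    {
            ktime_t t0 = ktime_get();

            iommu_unmap(domain, iova, size);
            iommu_map(domain, iova, paddr, size,
                      IOMMU_READ | IOMMU_WRITE);

            pr_info("VF BAR unmap+map took %lld ns\n",
                    ktime_to_ns(ktime_sub(ktime_get(), t0)));
    }

If that alone blows the budget, the question of how fast you can notify
QEMU is moot.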
>
>> As the kernel iommu module is being called by the VFIO driver, the PF
>> driver cannot interface with it.
>>
>> Currently the only interface of the VFIO code is for the userland QEMU
>> process, and I fear that notifying QEMU that it should do the
>> unmap/block would take more than 100ns.
>>
>> Also, blocking the I/O access in QEMU under the BQL would freeze QEMU.
>>
>> Do you have an idea on how to write this required map and block/unmap
>> feature?
>
> It seems like there are several options, but I'm doubtful that any of
> them will meet 100ns.  If this is completely fake SR-IOV and there's not
> a different requester ID per VF, I'd start with seeing if you can even
> do the iommu_unmap/iommu_map of the MMIO BARs in under 100ns.  If that's
> close to your limit, then your only real option for QEMU is to freeze
> it, which still involves getting multiple (maybe many) vCPUs out of VM
> mode.  That's not free either.  If by some miracle you have time to
> spare, you could remap the regions to trapped mode and let the vCPUs run
> while vfio blocks on read/write.
>
> Maybe there's even a question whether mmap'd mode is worthwhile for this
> device.  Trapping every read/write is orders of magnitude slower, but
> allows you to handle the "wait for VF" on the kernel side.
>
> If you can provide more info on the device design/constraints, maybe we
> can come up with better options.  Thanks,
>
> Alex
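
fwiw, if the trapped-access route wins out, the "wait for VF" on the
kernel side could be as simple as a wait queue that the PF scheduler
kicks on every switch.  Rough sketch only -- the struct and field names
are all made up, and it hand-waves the locking against the scheduler:

    /*
     * Hypothetical sketch: trap guest MMIO reads and block until the
     * PF scheduler marks this VF active again.  sched_vf, ->active and
     * ->sched_wq are invented names; the scheduler is assumed to do
     * wake_up(&vf->sched_wq) whenever it flips ->active.
     */
    #include <linux/wait.h>
    #include <linux/io.h>

    struct sched_vf {
            void __iomem *bar;          /* VF MMIO, valid only while active */
            bool active;                /* set/cleared by the PF scheduler  */
            wait_queue_head_t sched_wq;
    };

    static u32 sched_vf_read32(struct sched_vf *vf, unsigned long off)
    {
            /* a real driver must close the race where the VF is
             * switched back out between the wakeup and the readl() */
            wait_event(vf->sched_wq, vf->active);
            return readl(vf->bar + off);
    }

Doesn't help the 100ns goal, but it keeps QEMU and the BQL out of the
switch path entirely.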