From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <51ACE1B5.2050102@redhat.com>
Date: Mon, 03 Jun 2013 14:34:29 -0400
From: Don Dutile
MIME-Version: 1.0
References: <20130603163305.GC4094@irqsave.net> <1370282529.30975.344.camel@ul30vt.home>
In-Reply-To: <1370282529.30975.344.camel@ul30vt.home>
Subject: Re: [Qemu-devel] VFIO and scheduled SR-IOV cards
To: Alex Williamson
Cc: Benoît Canet, iommu@lists.linux-foundation.org, qemu-devel@nongnu.org

On 06/03/2013 02:02 PM, Alex Williamson wrote:
> On Mon, 2013-06-03 at 18:33 +0200, Benoît Canet wrote:
>> Hello,
>>
>> I plan to write a PF driver for an SR-IOV card and make the VFs work
>> with QEMU's VFIO passthrough, so I am asking the following design
>> question before trying to write and push code.
>>
>> After SR-IOV is enabled on this hardware, only one VF can be active
>> at a given time.
>
> Is this actually an SR-IOV device, or are you trying to write a driver
> that emulates SR-IOV for a PF?
>
>> The PF host kernel driver acts as a scheduler.
>> It switches which VF is the currently active function every few
>> milliseconds, disabling the other VFs.

That's time-sharing of hw, which sw doesn't see ... so, ok.

>> One consequence of how the hardware works is that the MMR regions of
>> the switched-off VFs must be unmapped, and their I/O accesses should
>> block until the VF is switched on again.

This violates the spec, and does impact sw -- how can one assign such a
VF to a guest? -- it does not work independently of the other VFs.

> MMR = Memory Mapped Register?
>
> This seems contradictory to the SR-IOV spec, which states:
>
>         Each VF contains a non-shared set of physical resources
>         required to deliver Function-specific services, e.g.,
>         resources such as work queues, data buffers, etc.  These
>         resources can be directly accessed by an SI without requiring
>         VI or SR-PCIM intervention.
>
> Furthermore, each VF should have a separate requester ID.  What's being
> suggested here seems like maybe that's not the case.  If true, it would

I didn't read it that way above.  I read it as the PCIe end is timeshared
btwn VFs (& PFs?) .... with some VFs disappearing (from a driver
perspective) as if the device was hot-unplugged w/o notification.  That
will probably cause read timeouts & SMEs, bringing down most
enterprise-level systems.

> make iommu groups challenging.  Is there any VF save/restore around the
> scheduling?
>
>> Each IOMMU map/unmap should be done in less than 100ns.
>
> I think that may be a lot to ask if we need to unmap the regions in the
> guest and in the iommu.  If the "VFs" used different requester IDs,
> iommu unmapping wouldn't be necessary.  I experimented with switching
> between trapped (read/write) access to memory regions and mmap'd (direct
> mapping) for handling legacy interrupts.  There was a noticeable
> performance penalty switching per interrupt.
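
fwiw: before designing around that 100ns number, I'd measure an actual
unmap+map cycle of a VF BAR on the target iommu hw from the PF driver.
A rough sketch of what I'd time, using the kernel iommu API -- the
domain, iova, paddr and size arguments are placeholders for whatever
state the PF driver really tracks:

    /*
     * Hypothetical sketch: time one unmap+map cycle of a VF's MMIO BAR
     * against the 100ns budget.  domain/iova/paddr/size stand in for
     * the PF driver's real bookkeeping.
     */
    #include <linux/iommu.h>
    #include <linux/ktime.h>
    #include <linux/printk.h>

    static void time_vf_bar_remap(struct iommu_domain *domain,
                                  unsigned long iova, phys_addr_t paddr,
                                  size_t size)
    {
            ktime_t t0 = ktime_get();

            iommu_unmap(domain, iova, size);
            iommu_map(domain, iova, paddr, size,
                      IOMMU_READ | IOMMU_WRITE);

            pr_info("VF BAR unmap+map took %lld ns\n",
                    ktime_to_ns(ktime_sub(ktime_get(), t0)));
    }

If that alone blows the budget, the question of how fast you can notify
QEMU is moot.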
>
>> As the kernel iommu module is being called by the VFIO driver, the PF
>> driver cannot interface with it.
>>
>> Currently the only interface of the VFIO code is for the userland QEMU
>> process, and I fear that notifying QEMU that it should do the
>> unmap/block would take more than 100ns.
>>
>> Also, blocking the I/O access in QEMU under the BQL would freeze QEMU.
>>
>> Do you have an idea on how to write this required map and block/unmap
>> feature?
>
> It seems like there are several options, but I'm doubtful that any of
> them will meet 100ns.  If this is completely fake SR-IOV and there's not
> a different requester ID per VF, I'd start with seeing if you can even
> do the iommu_unmap/iommu_map of the MMIO BARs in under 100ns.  If that's
> close to your limit, then your only real option for QEMU is to freeze
> it, which still involves getting multiple (maybe many) vCPUs out of VM
> mode.  That's not free either.  If by some miracle you have time to
> spare, you could remap the regions to trapped mode and let the vCPUs run
> while vfio blocks on read/write.
>
> Maybe there's even a question whether mmap'd mode is worthwhile for this
> device.  Trapping every read/write is orders of magnitude slower, but
> allows you to handle the "wait for VF" on the kernel side.
>
> If you can provide more info on the device design/constraints, maybe we
> can come up with better options.  Thanks,
>
> Alex
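
fwiw, if the trapped-access route wins out, the "wait for VF" on the
kernel side could be as simple as a wait queue that the PF scheduler
kicks on every switch.  Rough sketch only -- the struct and field names
are all made up, and it hand-waves the locking against the scheduler:

    /*
     * Hypothetical sketch: trap guest MMIO reads and block until the
     * PF scheduler marks this VF active again.  sched_vf, ->active and
     * ->sched_wq are invented names; the scheduler is assumed to do
     * wake_up(&vf->sched_wq) whenever it flips ->active.
     */
    #include <linux/wait.h>
    #include <linux/io.h>

    struct sched_vf {
            void __iomem *bar;          /* VF MMIO, valid only while active */
            bool active;                /* set/cleared by the PF scheduler  */
            wait_queue_head_t sched_wq;
    };

    static u32 sched_vf_read32(struct sched_vf *vf, unsigned long off)
    {
            /* a real driver must close the race where the VF is
             * switched back out between the wakeup and the readl() */
            wait_event(vf->sched_wq, vf->active);
            return readl(vf->bar + off);
    }

Doesn't help the 100ns goal, but it keeps QEMU and the BQL out of the
switch path entirely.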