From mboxrd@z Thu Jan 1 00:00:00 1970
From: Don Dutile
Subject: Re: VFIO and scheduled SR-IOV cards
Date: Mon, 03 Jun 2013 14:34:29 -0400
Message-ID: <51ACE1B5.2050102@redhat.com>
References: <20130603163305.GC4094@irqsave.net> <1370282529.30975.344.camel@ul30vt.home>
In-Reply-To: <1370282529.30975.344.camel@ul30vt.home>
List-Id: iommu@lists.linux-foundation.org
To: Alex Williamson
Cc: Benoît Canet,
 iommu@lists.linux-foundation.org, qemu-devel@nongnu.org

On 06/03/2013 02:02 PM, Alex Williamson wrote:
> On Mon, 2013-06-03 at 18:33 +0200, Benoît Canet wrote:
>> Hello,
>>
>> I plan to write a PF driver for an SR-IOV card and make the VFs work with
>> QEMU's VFIO passthrough, so I am asking the following design question
>> before trying to write and push code.
>>
>> After SR-IOV is enabled on this hardware, only one VF can be active
>> at a given time.
>
> Is this actually an SR-IOV device or are you trying to write a driver
> that emulates SR-IOV for a PF?
>
>> The PF host kernel driver is acting as a scheduler.
>> Every few milliseconds it switches which VF is the currently active
>> function, while disabling the other VFs.
>>
That's time-sharing of hw, which sw doesn't see ... so, ok.

>> One consequence of how the hardware works is that the MMR regions of the
>> switched-off VFs must be unmapped and their io access should block until
>> the VF is switched on again.
>
This violates the spec, and does impact sw -- how can one assign such a VF
to a guest -- it does not work indep. of other VFs.

> MMR = Memory Mapped Register?
>
> This seems contradictory to the SR-IOV spec, which states:
>
>          Each VF contains a non-shared set of physical resources
>          required to deliver Function-specific services, e.g.,
>          resources such as work queues, data buffers, etc. These
>          resources can be directly accessed by an SI without
>          requiring VI or SR-PCIM intervention.
>
> Furthermore, each VF should have a separate requester ID.  What's being
> suggested here seems like maybe that's not the case.  If true, it would

I didn't read it that way above.  I read it as: the PCIe end is timeshared
btwn VFs (& PFs?), with some VFs disappearing (from a driver perspective)
as if the device had been hot-unplugged w/o notification.  That will
probably cause read-timeouts & SME's, bringing down most enterprise-level
systems.

> make iommu groups challenging.
> Is there any VF save/restore around the scheduling?
>
>> Each IOMMU map/unmap should be done in less than 100ns.
>
> I think that may be a lot to ask if we need to unmap the regions in the
> guest and in the iommu.  If the "VFs" used different requester IDs,
> iommu unmapping wouldn't be necessary.  I experimented with switching
> between trapped (read/write) access to memory regions and mmap'd (direct
> mapping) for handling legacy interrupts.  There was a noticeable
> performance penalty switching per interrupt.
>
>> As the kernel iommu module is being called by the VFIO driver, the PF
>> driver cannot interface with it.
>>
>> Currently the only interface of the VFIO code is for the userland QEMU
>> process, and I fear that notifying QEMU that it should do the unmap/block
>> would take more than 100ns.
>>
>> Also, blocking the IO access in QEMU under the BQL would freeze QEMU.
>>
>> Do you have an idea on how to write this required map and block/unmap
>> feature?
>
> It seems like there are several options, but I'm doubtful that any of
> them will meet 100ns.  If this is completely fake SR-IOV and there's not
> a different requester ID per VF, I'd start with seeing if you can even
> do the iommu_unmap/iommu_map of the MMIO BARs in under 100ns.  If that's
> close to your limit, then your only real option for QEMU is to freeze
> it, which still involves getting multiple (maybe many) vCPUs out of VM
> mode.  That's not free either.  If by some miracle you have time to
> spare, you could remap the regions to trapped mode and let the vCPUs run
> while vfio blocks on read/write.
>
> Maybe there's even a question whether mmap'd mode is worthwhile for this
> device.  Trapping every read/write is orders of magnitude slower, but
> allows you to handle the "wait for VF" on the kernel side.
>
> If you can provide more info on the device design/constraints, maybe we
> can come up with better options.
> Thanks,
>
> Alex
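
For readers following the thread, here is a minimal userspace sketch of the "trapped mode" idea mentioned above, where a register access blocks until its VF is scheduled in again. It is a toy model in Python, not VFIO, QEMU, or kernel code. The names `ScheduledVF` and `PFScheduler`, the register dictionary, and the timeout parameter are all assumptions made for illustration and do not appear in the thread. The model says nothing about whether a real switch could fit in the 100ns budget.

```python
import threading


class ScheduledVF:
    """Toy model of one VF whose register accesses are trapped.

    Hypothetical illustration only: names and behavior are assumptions,
    not an existing VFIO or kernel interface.
    """

    def __init__(self, name):
        self.name = name
        self.registers = {}          # offset -> value, stands in for the MMIO BAR
        self._active = False
        self._cond = threading.Condition()

    def set_active(self, active):
        """Called by the simulated PF scheduler on every switch."""
        with self._cond:
            self._active = active
            if active:
                self._cond.notify_all()   # wake accessors waiting for this VF

    def _wait_active(self, timeout):
        # Caller must hold self._cond. wait_for() returns False on timeout.
        if not self._cond.wait_for(lambda: self._active, timeout=timeout):
            raise TimeoutError(f"{self.name} was not scheduled in time")

    def read(self, offset, timeout=None):
        """Trapped read: blocks until the VF is active, then returns the value."""
        with self._cond:
            self._wait_active(timeout)
            return self.registers.get(offset, 0)

    def write(self, offset, value, timeout=None):
        """Trapped write: blocks until the VF is active, then stores the value."""
        with self._cond:
            self._wait_active(timeout)
            self.registers[offset] = value


class PFScheduler:
    """Toy PF driver that keeps at most one VF active at a time."""

    def __init__(self, vfs):
        self.vfs = list(vfs)

    def switch_to(self, index):
        # Disable every VF first, then enable the chosen one.
        for vf in self.vfs:
            vf.set_active(False)
        self.vfs[index].set_active(True)


if __name__ == "__main__":
    vf0, vf1 = ScheduledVF("vf0"), ScheduledVF("vf1")
    sched = PFScheduler([vf0, vf1])
    sched.switch_to(0)
    vf0.write(0x10, 0xABCD)          # vf0 is active, so this returns at once

    # An access to vf1 from another thread blocks until the scheduler
    # switches to it.
    reader = threading.Thread(target=lambda: print(hex(vf1.read(0x10))))
    reader.start()
    sched.switch_to(1)
    reader.join()
```

Because each access holds the same lock that `set_active` takes, a switch in this model waits for any in-flight access to finish. Whether real hardware could tolerate that is exactly the kind of device detail the thread asks for.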