From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1163202AbbKTRDI (ORCPT ); Fri, 20 Nov 2015 12:03:08 -0500
Received: from mx1.redhat.com ([209.132.183.28]:48656 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1162903AbbKTRDF (ORCPT ); Fri, 20 Nov 2015 12:03:05 -0500
Message-ID: <1448038984.4697.284.camel@redhat.com>
Subject: Re: [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a
	Mediated Graphics Passthrough Solution from Intel
From: Alex Williamson
To: "Tian, Kevin"
Cc: "Song, Jike", "xen-devel@lists.xen.org", "igvt-g@ml01.01.org",
	"intel-gfx@lists.freedesktop.org", "linux-kernel@vger.kernel.org",
	"White, Michael L", "Dong, Eddie", "Li, Susie",
	"Cowperthwaite, David J", "Reddy, Raghuveer", "Zhu, Libo",
	"Zhou, Chao", "Wang, Hongbo", "Lv, Zhiyuan", qemu-devel,
	Paolo Bonzini, Gerd Hoffmann
Date: Fri, 20 Nov 2015 10:03:04 -0700
References: <53D215D3.50608@intel.com> <547FCAAD.2060406@intel.com>
	<54AF967B.3060503@intel.com> <5527CEC4.9080700@intel.com>
	<559B3E38.1080707@intel.com> <562F4311.9@intel.com>
	<1447870341.4697.92.camel@redhat.com>
	<1447963356.4697.184.camel@redhat.com>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 2015-11-20 at 07:09 +0000, Tian, Kevin wrote:
> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > Sent: Friday, November 20, 2015 4:03 AM
> >
> > > >
> > > > The proposal is therefore that GPU vendors can expose vGPUs to
> > > > userspace, and thus to QEMU, using the VFIO API. For instance, vfio
> > > > supports modular bus drivers and IOMMU drivers. An intel-vfio-gvt-d
> > > > module (or extension of i915) can register as a vfio bus driver, create
> > > > a struct device per vGPU, create an IOMMU group for that device, and
> > > > register that device with the vfio-core. Since we don't rely on the
> > > > system IOMMU for GVT-d vGPU assignment, another vGPU vendor driver (or
> > > > extension of the same module) can register a "type1" compliant IOMMU
> > > > driver into vfio-core. From the perspective of QEMU then, all of the
> > > > existing vfio-pci code is re-used, QEMU remains largely unaware of any
> > > > specifics of the vGPU being assigned, and the only necessary change so
> > > > far is how QEMU traverses sysfs to find the device and thus the IOMMU
> > > > group leading to the vfio group.
> > >
> > > GVT-g needs to pin guest memory and query GPA->HPA information,
> > > upon which shadow GTTs will be updated accordingly from (GMA->GPA)
> > > to (GMA->HPA). So yes, here a dummy or simple "type1" compliant IOMMU
> > > can be introduced just for this requirement.
> > >
> > > However there's one tricky point where I'm not sure whether the overall
> > > VFIO concept will be violated. GVT-g doesn't require the system IOMMU
> > > to function, however the host system may enable the system IOMMU just
> > > for hardening purposes. This means two-level translations exist (GMA->
> > > IOVA->HPA), so the dummy IOMMU driver has to request the system IOMMU
> > > driver to allocate IOVA for VMs and then set up the IOVA->HPA mapping
> > > in the IOMMU page table. In this case, multiple VMs' translations are
> > > multiplexed in one IOMMU page table.
> > >
> > > We might need to create some group/sub-group or parent/child concepts
> > > among those IOMMUs for thorough permission control.
> >
> > My thought here is that this is all abstracted through the vGPU IOMMU
> > and device vfio backends. It's the GPU driver itself, or some vfio
> > extension of that driver, mediating access to the device and deciding
> > when to configure GPU MMU mappings. That driver has access to the GPA
> > to HVA translations thanks to the type1 compliant IOMMU it implements
> > and can pin pages as needed to create GPA to HPA mappings. That should
> > give it all the pieces it needs to fully set up mappings for the vGPU.
> > Whether or not there's a system IOMMU is simply an exercise for that
> > driver. It needs to do a DMA mapping operation through the system IOMMU
> > the same for a vGPU as if it was doing it for itself, because they are
> > in fact one and the same. The GMA to IOVA mapping seems like an internal
> > detail. I assume the IOVA is some sort of GPA, and the GMA is managed
> > through mediation of the device.
>
> Sorry, I'm not familiar with VFIO internals. My original worry is that the
> system IOMMU for the GPU may already be claimed by another vfio driver
> (e.g. the host kernel wants to harden the gfx driver from the rest of the
> sub-systems, regardless of whether a vGPU is created or not). In that case
> the vGPU IOMMU driver shouldn't manage the system IOMMU directly.

There are different APIs for the IOMMU depending on how it's being used.
If the IOMMU is being used for inter-device isolation in the host, then
the DMA API (e.g. dma_map_page) transparently makes use of the IOMMU.
When we're doing device assignment, we make use of the IOMMU API, which
allows more explicit control (e.g. iommu_domain_alloc,
iommu_attach_device, iommu_map, etc). A vGPU is not an SR-IOV VF; it
doesn't have a unique requester ID that allows the IOMMU to
differentiate one vGPU from another, or a vGPU from the GPU. All mappings
for vGPUs need to occur for the GPU. It's therefore the responsibility of
the GPU driver, or this vfio extension of that driver, to perform the
IOMMU mapping for the vGPU.

My expectation is therefore that once the GMA to IOVA mapping is
configured in the GPU MMU, the IOVA to HPA mapping needs to be programmed,
as if the GPU driver was performing the setup itself, which it is. Before
the device mediation that triggered the mapping setup is complete, the GPU
MMU and the system IOMMU (if present) should be configured to enable that
DMA. The GPU MMU provides the isolation of the vGPU; the system IOMMU
enables the DMA to occur.

> btw, I'm curious how VFIO today coordinates with the system IOMMU driver
> regarding whether an IOMMU is used to control device assignment or used
> for kernel hardening. Somehow the two are conflicting since different
> address spaces are concerned (GPA vs. IOVA)...

When devices unbind from native host drivers, any previous IOMMU
mappings and domains are removed. These are typically created via the
DMA API above. The initialization operations of the VFIO API (creating
containers, attaching groups to containers, and setting the IOMMU model
for a container) work through the IOMMU API to create a new domain and
isolate devices within it. The type1 VFIO IOMMU interface is then
effectively a passthrough to the iommu_map() and iommu_unmap()
interfaces of the IOMMU API, modulo page pinning, accounting and
tracking. When a VFIO instance is destroyed, the devices are detached
from the IOMMU domain, the devices are unbound from vfio and re-bound to
host drivers, and the DMA API can reclaim the devices for host isolation.
Thanks,

Alex
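
To make the two translation stages discussed above (GMA to IOVA in the
GPU MMU, IOVA to HPA in the system IOMMU) concrete, here is a minimal
userspace sketch in C. It is a toy model, not kernel or i915 code: the
names struct vgpu, struct gpu_iommu_domain, vgpu_map and vgpu_translate,
the 4 KiB page size and the 16-entry tables are all assumptions made for
illustration. Each vGPU gets its own GMA to IOVA table, while a single
IOVA to HPA table is shared by every vGPU, since they all issue DMA with
the physical GPU's requester ID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: every name below is hypothetical. */
#define TOY_PAGE_SHIFT 12                      /* assume 4 KiB pages */
#define TOY_PAGE_OFFSET_MASK ((1ull << TOY_PAGE_SHIFT) - 1)
#define TOY_NPAGES 16                          /* entries per toy table */
#define INVALID_ADDR UINT64_MAX

/* Stage 1: per-vGPU "GPU MMU" table, GMA page -> IOVA page. */
struct vgpu {
    uint64_t gma_to_iova[TOY_NPAGES];
};

/* Stage 2: one "system IOMMU" table for the physical GPU,
 * IOVA page -> HPA page, shared by all vGPUs. */
struct gpu_iommu_domain {
    uint64_t iova_to_hpa[TOY_NPAGES];
};

void vgpu_init(struct vgpu *v)
{
    assert(v != NULL);
    for (size_t i = 0; i < TOY_NPAGES; i++)
        v->gma_to_iova[i] = INVALID_ADDR;
}

void domain_init(struct gpu_iommu_domain *d)
{
    assert(d != NULL);
    for (size_t i = 0; i < TOY_NPAGES; i++)
        d->iova_to_hpa[i] = INVALID_ADDR;
}

/* Program both stages for one page. Returns 0 on success, -1 if an
 * index is out of range or the IOVA page already points elsewhere. */
int vgpu_map(struct vgpu *v, struct gpu_iommu_domain *d,
             uint64_t gma_pfn, uint64_t iova_pfn, uint64_t hpa_pfn)
{
    if (gma_pfn >= TOY_NPAGES || iova_pfn >= TOY_NPAGES)
        return -1;
    if (d->iova_to_hpa[iova_pfn] != INVALID_ADDR &&
        d->iova_to_hpa[iova_pfn] != hpa_pfn)
        return -1;
    d->iova_to_hpa[iova_pfn] = hpa_pfn;  /* system IOMMU: IOVA -> HPA */
    v->gma_to_iova[gma_pfn] = iova_pfn;  /* GPU MMU: GMA -> IOVA */
    return 0;
}

/* Walk both stages for a GMA; returns INVALID_ADDR if either misses. */
uint64_t vgpu_translate(const struct vgpu *v,
                        const struct gpu_iommu_domain *d, uint64_t gma)
{
    uint64_t gma_pfn = gma >> TOY_PAGE_SHIFT;
    uint64_t iova_pfn, hpa_pfn;

    if (gma_pfn >= TOY_NPAGES)
        return INVALID_ADDR;
    iova_pfn = v->gma_to_iova[gma_pfn];
    if (iova_pfn == INVALID_ADDR)
        return INVALID_ADDR;
    hpa_pfn = d->iova_to_hpa[iova_pfn];
    if (hpa_pfn == INVALID_ADDR)
        return INVALID_ADDR;
    return (hpa_pfn << TOY_PAGE_SHIFT) | (gma & TOY_PAGE_OFFSET_MASK);
}
```

In this model a second vGPU with an empty GMA table cannot reach a page
mapped for the first one, even though both share the same IOMMU table.
That is the sense in which the GPU MMU provides the isolation and the
system IOMMU only enables the DMA.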
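
The VFIO initialization sequence described above (container, group,
IOMMU model, then DMA mappings) looks roughly like this from userspace.
This is a sketch, assuming a Linux host with <linux/vfio.h> and a device
already bound to a vfio bus driver. The function name vfio_map_once and
the group path passed to it are hypothetical, and error handling is
reduced to a single failure path. The ioctls are the ones listed in the
kernel's VFIO documentation.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

/*
 * Hypothetical helper: attach one group to a fresh container, select the
 * type1 IOMMU model, map buf at iova, unmap it again and tear down.
 * buf, len and iova are expected to be page aligned.
 * Returns 0 on success, -1 on any failure.
 */
int vfio_map_once(const char *group_path, void *buf, size_t len,
                  uint64_t iova)
{
    struct vfio_group_status status = { .argsz = sizeof(status) };
    struct vfio_iommu_type1_dma_map map = { .argsz = sizeof(map) };
    struct vfio_iommu_type1_dma_unmap unmap = { .argsz = sizeof(unmap) };
    int container, group = -1, ret = -1;

    /* 1. Create a container. */
    container = open("/dev/vfio/vfio", O_RDWR);
    if (container < 0)
        return -1;
    if (ioctl(container, VFIO_GET_API_VERSION) != VFIO_API_VERSION)
        goto out;
    if (ioctl(container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU) <= 0)
        goto out;

    /* 2. Attach the group, e.g. "/dev/vfio/26" (made-up number). */
    group = open(group_path, O_RDWR);
    if (group < 0)
        goto out;
    if (ioctl(group, VFIO_GROUP_GET_STATUS, &status) < 0 ||
        !(status.flags & VFIO_GROUP_FLAGS_VIABLE))
        goto out;
    if (ioctl(group, VFIO_GROUP_SET_CONTAINER, &container) < 0)
        goto out;

    /* 3. Set the IOMMU model; this is where a new domain is set up. */
    if (ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU) < 0)
        goto out;

    /* 4. Map: type1 pins the pages and installs the IOMMU mapping. */
    map.vaddr = (uintptr_t)buf;
    map.iova = iova;
    map.size = len;
    map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
    if (ioctl(container, VFIO_IOMMU_MAP_DMA, &map) < 0)
        goto out;

    /* 5. Unmap again before tearing the instance down. */
    unmap.iova = iova;
    unmap.size = len;
    if (ioctl(container, VFIO_IOMMU_UNMAP_DMA, &unmap) < 0)
        goto out;
    ret = 0;
out:
    /* Closing the fds releases the group and the container. */
    if (group >= 0)
        close(group);
    close(container);
    return ret;
}
```

On a machine without vfio loaded, or with a group path that does not
exist, the helper simply returns -1 at the first failing step.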