From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sun, 30 Dec 2018 13:30:15 -0500
From: "Michael S. Tsirkin"
To: Jason Wang
Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org,
 netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH net-next 3/3] vhost: access vq metadata through kernel virtual address
Message-ID: <20181230132614-mutt-send-email-mst@kernel.org>
References: <20181213102713-mutt-send-email-mst@kernel.org>
 <20181214073332-mutt-send-email-mst@kernel.org>
 <2ea274df-a79a-250f-648f-12927529d78a@redhat.com>
 <20181224125237-mutt-send-email-mst@kernel.org>
 <20181225071501-mutt-send-email-mst@kernel.org>
 <70978ed8-bf76-693a-0e11-d31b6234af5c@redhat.com>
 <20181226092431-mutt-send-email-mst@kernel.org>
 <8ef53a5c-ad4e-fadd-b460-18b3e589ead9@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <8ef53a5c-ad4e-fadd-b460-18b3e589ead9@redhat.com>
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Dec 27, 2018 at 05:39:21PM +0800, Jason Wang wrote:
>
> On 2018/12/26 11:02 PM, Michael S. Tsirkin wrote:
> > On Wed, Dec 26, 2018 at 11:57:32AM +0800, Jason Wang wrote:
> > > On 2018/12/25 8:50 PM, Michael S. Tsirkin wrote:
> > > > On Tue, Dec 25, 2018 at 06:05:25PM +0800, Jason Wang wrote:
> > > > > On 2018/12/25 2:10 AM, Michael S. Tsirkin wrote:
> > > > > > On Mon, Dec 24, 2018 at 03:53:16PM +0800, Jason Wang wrote:
> > > > > > > On 2018/12/14 8:36 PM, Michael S. Tsirkin wrote:
> > > > > > > > On Fri, Dec 14, 2018 at 11:57:35AM +0800, Jason Wang wrote:
> > > > > > > > > On 2018/12/13 11:44 PM, Michael S. Tsirkin wrote:
> > > > > > > > > > On Thu, Dec 13, 2018 at 06:10:22PM +0800, Jason Wang wrote:
> > > > > > > > > > > It was noticed that the copy_user() friends that was used to access
> > > > > > > > > > > virtqueue metadata tends to be very expensive for dataplane
> > > > > > > > > > > implementation like vhost since it involves lots of software check,
> > > > > > > > > > > speculation barrier, hardware feature toggling (e.g. SMAP). The
> > > > > > > > > > > extra cost will be more obvious when transferring small packets.
> > > > > > > > > > >
> > > > > > > > > > > This patch tries to eliminate those overhead by pin vq metadata pages
> > > > > > > > > > > and access them through vmap(). During SET_VRING_ADDR, we will setup
> > > > > > > > > > > those mappings and memory accessors are modified to use pointers to
> > > > > > > > > > > access the metadata directly.
> > > > > > > > > > >
> > > > > > > > > > > Note, this was only done when device IOTLB is not enabled. We could
> > > > > > > > > > > use similar method to optimize it in the future.
> > > > > > > > > > >
> > > > > > > > > > > Tests show about ~24% improvement on TX PPS when using virtio-user +
> > > > > > > > > > > vhost_net + xdp1 on TAP (CONFIG_HARDENED_USERCOPY is not enabled):
> > > > > > > > > > >
> > > > > > > > > > > Before: ~5.0Mpps
> > > > > > > > > > > After:  ~6.1Mpps
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Jason Wang
> > > > > > > > > > > ---
> > > > > > > > > > >  drivers/vhost/vhost.c | 178 ++++++++++++++++++++++++++++++++++++++++++
> > > > > > > > > > >  drivers/vhost/vhost.h |  11 +++
> > > > > > > > > > >  2 files changed, 189 insertions(+)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > > > > > > > > index bafe39d2e637..1bd24203afb6 100644
> > > > > > > > > > > --- a/drivers/vhost/vhost.c
> > > > > > > > > > > +++ b/drivers/vhost/vhost.c
> > > > > > > > > > > @@ -443,6 +443,9 @@ void vhost_dev_init(struct vhost_dev *dev,
> > > > > > > > > > >  		vq->indirect = NULL;
> > > > > > > > > > >  		vq->heads = NULL;
> > > > > > > > > > >  		vq->dev = dev;
> > > > > > > > > > > +		memset(&vq->avail_ring, 0, sizeof(vq->avail_ring));
> > > > > > > > > > > +		memset(&vq->used_ring, 0, sizeof(vq->used_ring));
> > > > > > > > > > > +		memset(&vq->desc_ring, 0, sizeof(vq->desc_ring));
> > > > > > > > > > >  		mutex_init(&vq->mutex);
> > > > > > > > > > >  		vhost_vq_reset(dev, vq);
> > > > > > > > > > >  		if (vq->handle_kick)
> > > > > > > > > > > @@ -614,6 +617,102 @@ static void vhost_clear_msg(struct vhost_dev *dev)
> > > > > > > > > > >  	spin_unlock(&dev->iotlb_lock);
> > > > > > > > > > >  }
> > > > > > > > > > > +static int vhost_init_vmap(struct vhost_vmap *map, unsigned long uaddr,
> > > > > > > > > > > +			   size_t size, int write)
> > > > > > > > > > > +{
> > > > > > > > > > > +	struct page **pages;
> > > > > > > > > > > +	int npages = DIV_ROUND_UP(size, PAGE_SIZE);
> > > > > > > > > > > +	int npinned;
> > > > > > > > > > > +	void *vaddr;
> > > > > > > > > > > +
> > > > > > > > > > > +	pages = kmalloc_array(npages, sizeof(struct page *), GFP_KERNEL);
> > > > > > > > > > > +	if (!pages)
> > > > > > > > > > > +		return -ENOMEM;
> > > > > > > > > > > +
> > > > > > > > > > > +	npinned = get_user_pages_fast(uaddr, npages, write, pages);
> > > > > > > > > > > +	if (npinned != npages)
> > > > > > > > > > > +		goto err;
> > > > > > > > > > > +
> > > > > > > > > > As I said I have doubts about the whole approach, but this
> > > > > > > > > > implementation in particular isn't a good idea
> > > > > > > > > > as it keeps the page around forever.
> > > > > > > The pages will be released during set features.
> > > > > > >
> > > > > > >
> > > > > > > > > > So no THP, no NUMA rebalancing,
> > > > > > > For THP, we will probably miss 2 or 4 pages, but does this really matter
> > > > > > > consider the gain we have?
> > > > > > We as in vhost? networking isn't the only thing guest does.
> > > > > > We don't even know if this guest does a lot of networking.
> > > > > > You don't
> > > > > > know what else is in this huge page. Can be something very important
> > > > > > that guest touches all the time.
> > > > > Well, the probability should be very small consider we usually give several
> > > > > gigabytes to guest. The rest of the pages that doesn't sit in the same
> > > > > hugepage with metadata can still be merged by THP.  Anyway, I can test the
> > > > > differences.
> > > > Thanks!
> > > >
> > > > > > > For NUMA rebalancing, I'm even not quite sure if
> > > > > > > it can help for the case of IPC (vhost). It looks to me the worst case it
> > > > > > > may cause page to be thrash between nodes if vhost and userspace are running
> > > > > > > in two nodes.
> > > > > > So again it's a gain for vhost but has a completely unpredictable effect on
> > > > > > other functionality of the guest.
> > > > > >
> > > > > > That's what bothers me with this approach.
> > > > > So:
> > > > >
> > > > > - The rest of the pages could still be balanced to other nodes, no?
> > > > >
> > > > > - try to balance metadata pages (belongs to co-operate processes) itself is
> > > > > still questionable
> > > > I am not sure why. It should be easy enough to force the VCPU and vhost
> > > > to move (e.g. start them pinned to 1 cpu, then pin them to another one).
> > > > Clearly sometimes this would be necessary for load balancing reasons.
> > >
> > > Yes, but it looks to me the part of motivation of auto NUMA is to avoid
> > > manual pinning.
> > ... of memory. Yes.
> >
> >
> > > > With autonuma after a while (could take seconds but it will happen) the
> > > > memory will migrate.
> > > >
> > > Yes. As you mentioned during the discussion, I wonder we could do it similarly
> > > through mmu notifier like APIC access page in commit c24ae0dcd3e ("kvm: x86:
> > > Unpin and remove kvm_arch->apic_access_page")
> > That would be a possible approach.
>
>
> Yes, this looks possible, and the conversion seems not hard. Let me have a
> try with this.
>
>
> [...]
>
>
> > > > > > > > I don't see how a kthread makes any difference. We do have a validation
> > > > > > > > step which makes some difference.
> > > > > > > The problem is not kthread but the address of userspace address. The
> > > > > > > addresses of vq metadata tends to be consistent for a while, and vhost knows
> > > > > > > they will be accessed frequently. SMAP doesn't help too much in this case.
> > > > > > >
> > > > > > > Thanks.
> > > > > > It's true for a real life applications but a malicious one
> > > > > > can call the setup ioctls any number of times. And SMAP is
> > > > > > all about malicious applications.
> > > > > We don't do this in the path of ioctl, there's no context switch between
> > > > > userspace and kernel in the worker thread. SMAP is used to prevent kernel
> > > > > from accessing userspace pages unexpectedly which is not the case for
> > > > > metadata access.
> > > > >
> > > > > Thanks
> > > > OK let's forget smap for now.
> > >
> > > Some numbers I measured:
> > >
> > > On an old Sandy bridge machine without SMAP support. Remove speculation
> > > barrier boost the performance from 4.6Mpps to 5.1Mpps
> > >
> > > On a newer Broadwell machine with SMAP support. Remove speculation barrier
> > > only gives 2%-5% improvement, disable SMAP completely through Kconfig boost
> > > 57% performance from 4.8Mpps to 7.5Mpps. (Vmap gives 6Mpps - 6.1Mpps, it
> > > only bypass SMAP for metadata).
> > >
> > > So it looks like for recent machine, SMAP becomes pain point when the copy
> > > is short (e.g. 64B) for high PPS.
> > >
> > > Thanks
> > Thanks a lot for looking into this!
> >
> > So first of all users can just boot with nosmap, right?
> > What's wrong with that?
>
>
> Nothing wrong, just realize we had this kernel parameter.
>
>
> >   Yes it's not fine-grained but OTOH
> > it's easy to understand.
> >
> > And I guess this confirms that if we are going to worry
> > about smap enabled, we need to look into packet copies
> > too, not just meta-data.
>
>
> For packet copies, we can do batch copy which is pretty simple for the case
> of XDP. I've already had patches for this.
>
>
> >
> > Vaguely could see a module option (off by default)
> > where vhost basically does user_access_begin
> > when it starts running, then uses unsafe accesses
> > in vhost and tun and then user_access_end.
>
>
> Using user_access_begin() is more tricky than imagined. E.g. it requires:
>
> - userspace address to be validated before through access_ok() [1]

This part is fine I think - addresses come from the memory
map and when userspace supplies the memory map
we validate everything with access_ok.
Well, do we validate with the iotlb too? I don't see it right now,
so maybe not, but it's easy to add.
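
To make the validation point concrete, here is a minimal userspace sketch, not the vhost code. It models an access_ok()-style range check applied to every region of a memory map at the moment userspace supplies it, so that later accesses can rely on the check having been done. The names struct mem_region, range_ok(), mem_table_ok() and the USER_LIMIT constant are hypothetical and chosen only for illustration; in the kernel the real check is access_ok().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical upper bound of the user address range (illustration only). */
#define USER_LIMIT ((uint64_t)1 << 47)

/* Hypothetical stand-in for one entry of a userspace-supplied memory map. */
struct mem_region {
	uint64_t guest_phys_addr;
	uint64_t size;
	uint64_t userspace_addr;
};

/* Overflow-safe check that [addr, addr + size) lies below USER_LIMIT. */
static bool range_ok(uint64_t addr, uint64_t size)
{
	return size <= USER_LIMIT && addr <= USER_LIMIT - size;
}

/* Validate every region once, at the time the table is installed. */
static bool mem_table_ok(const struct mem_region *regions, size_t n)
{
	assert(regions != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!range_ok(regions[i].userspace_addr, regions[i].size))
			return false;
	}
	return true;
}
```

The comparison is written as addr <= USER_LIMIT - size rather than addr + size <= USER_LIMIT so that a huge address cannot wrap around and pass. An IOTLB update path could presumably run the same per-range check on each entry as it is inserted.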

> - It doesn't support calling a function that does explicit schedule since
> SMAP/PAN state is not maintained through schedule() [2]
>
> [1] https://lwn.net/Articles/736348/
>
> [2] https://lkml.org/lkml/2018/11/23/430
>
> So calling user_access_begin() all the time when vhost is running seems
> pretty dangerous.

Yes, it requires some rework, e.g. to try getting memory with
GFP_ATOMIC. We could then do a slow path with GFP_KERNEL
if that fails.

> For a better batched datacopy, I tend to build not only XDP but also skb in
> vhost in the future.
>
> Thanks

Sure, why not.

>
> >
> >
> > > > > > > > > Packet or AF_XDP benefit from
> > > > > > > > > accessing metadata directly, we should do it as well.
> > > > > > > > >
> > > > > > > > > Thanks
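
The GFP_ATOMIC-first rework mentioned above could look roughly like the following userspace model. This is a sketch under stated assumptions, not kernel code. The names struct uaccess_window, model_alloc(), alloc_in_window() and the atomic_pool_empty knob are all hypothetical. malloc() stands in for kmalloc(), ALLOC_ATOMIC for GFP_ATOMIC and ALLOC_MAY_SLEEP for GFP_KERNEL. Flipping the open flag stands in for leaving and re-entering the user-access window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of an open user-access window (SMAP/PAN relaxed). */
struct uaccess_window {
	bool open;       /* true while sleeping is not allowed */
	int slow_paths;  /* how often the sleeping fallback was taken */
};

enum alloc_mode { ALLOC_ATOMIC, ALLOC_MAY_SLEEP };

/* Demo knob: when set, non-sleeping allocations fail, as GFP_ATOMIC may. */
static bool atomic_pool_empty;

/* Stand-in allocator; a real kernel would call kmalloc() with GFP flags. */
static void *model_alloc(const struct uaccess_window *w, size_t size,
			 enum alloc_mode mode)
{
	if (mode == ALLOC_MAY_SLEEP)
		assert(!w->open);  /* must never sleep inside the window */
	else if (atomic_pool_empty)
		return NULL;
	return malloc(size);
}

/* Try the non-sleeping path first; leave the window only if it fails. */
static void *alloc_in_window(struct uaccess_window *w, size_t size)
{
	void *p;

	assert(w->open);
	p = model_alloc(w, size, ALLOC_ATOMIC);
	if (p)
		return p;

	w->open = false;  /* models user_access_end() */
	w->slow_paths++;
	p = model_alloc(w, size, ALLOC_MAY_SLEEP);
	w->open = true;   /* models user_access_begin() */
	return p;
}
```

The assertion inside model_alloc() encodes the constraint from [2]: nothing that may schedule is allowed to run while the window is open, so the slow path has to close it first and reopen it afterwards.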