From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Michael S. Tsirkin"
Subject: Re: [PATCH net-next 3/3] vhost: access vq metadata through kernel virtual address
Date: Tue, 25 Dec 2018 07:50:43 -0500
Message-ID: <20181225071501-mutt-send-email-mst@kernel.org>
References: <20181213101022.12475-1-jasowang@redhat.com>
 <20181213101022.12475-4-jasowang@redhat.com>
 <20181213102713-mutt-send-email-mst@kernel.org>
 <20181214073332-mutt-send-email-mst@kernel.org>
 <2ea274df-a79a-250f-648f-12927529d78a@redhat.com>
 <20181224125237-mutt-send-email-mst@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
Sender: virtualization-bounces@lists.linux-foundation.org
To: Jason Wang
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
 kvm@vger.kernel.org, virtualization@lists.linux-foundation.org
List-Id: virtualization@lists.linuxfoundation.org

On Tue, Dec 25, 2018 at 06:05:25PM +0800, Jason Wang wrote:
>
> On 2018/12/25 2:10 AM, Michael S. Tsirkin wrote:
> > On Mon, Dec 24, 2018 at 03:53:16PM +0800, Jason Wang wrote:
> > > On 2018/12/14 8:36 PM, Michael S. Tsirkin wrote:
> > > > On Fri, Dec 14, 2018 at 11:57:35AM +0800, Jason Wang wrote:
> > > > > On 2018/12/13 11:44 PM, Michael S. Tsirkin wrote:
> > > > > > On Thu, Dec 13, 2018 at 06:10:22PM +0800, Jason Wang wrote:
> > > > > > > It was noticed that the copy_user() friends that are used to access
> > > > > > > virtqueue metadata tend to be very expensive for dataplane
> > > > > > > implementations like vhost, since they involve lots of software checks,
> > > > > > > speculation barriers and hardware feature toggling (e.g. SMAP). The
> > > > > > > extra cost is more obvious when transferring small packets.
> > > > > > >
> > > > > > > This patch tries to eliminate that overhead by pinning vq metadata pages
> > > > > > > and accessing them through vmap(). During SET_VRING_ADDR, we set up
> > > > > > > those mappings, and the memory accessors are modified to use pointers to
> > > > > > > access the metadata directly.
> > > > > > >
> > > > > > > Note, this is only done when device IOTLB is not enabled. We could
> > > > > > > use a similar method to optimize that case in the future.
> > > > > > >
> > > > > > > Tests show a ~24% improvement in TX PPS when using virtio-user +
> > > > > > > vhost_net + xdp1 on TAP (CONFIG_HARDENED_USERCOPY is not enabled):
> > > > > > >
> > > > > > > Before: ~5.0Mpps
> > > > > > > After:  ~6.1Mpps
> > > > > > >
> > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > > ---
> > > > > > >     drivers/vhost/vhost.c | 178 ++++++++++++++++++++++++++++++++++++++++++
> > > > > > >     drivers/vhost/vhost.h |  11 +++
> > > > > > >     2 files changed, 189 insertions(+)
> > > > > > >
> > > > > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > > > > index bafe39d2e637..1bd24203afb6 100644
> > > > > > > --- a/drivers/vhost/vhost.c
> > > > > > > +++ b/drivers/vhost/vhost.c
> > > > > > > @@ -443,6 +443,9 @@ void vhost_dev_init(struct vhost_dev *dev,
> > > > > > >     		vq->indirect = NULL;
> > > > > > >     		vq->heads = NULL;
> > > > > > >     		vq->dev = dev;
> > > > > > > +		memset(&vq->avail_ring, 0, sizeof(vq->avail_ring));
> > > > > > > +		memset(&vq->used_ring, 0, sizeof(vq->used_ring));
> > > > > > > +		memset(&vq->desc_ring, 0, sizeof(vq->desc_ring));
> > > > > > >     		mutex_init(&vq->mutex);
> > > > > > >     		vhost_vq_reset(dev, vq);
> > > > > > >     		if (vq->handle_kick)
> > > > > > > @@ -614,6 +617,102 @@ static void vhost_clear_msg(struct vhost_dev *dev)
> > > > > > >     	spin_unlock(&dev->iotlb_lock);
> > > > > > >     }
> > > > > > > +static int vhost_init_vmap(struct vhost_vmap *map, unsigned long uaddr,
> > > > > > > +			   size_t size, int write)
> > > > > > > +{
> > > > > > > +	struct page **pages;
> > > > > > > +	int npages = DIV_ROUND_UP(size, PAGE_SIZE);
> > > > > > > +	int npinned;
> > > > > > > +	void *vaddr;
> > > > > > > +
> > > > > > > +	pages = kmalloc_array(npages, sizeof(struct page *), GFP_KERNEL);
> > > > > > > +	if (!pages)
> > > > > > > +		return -ENOMEM;
> > > > > > > +
> > > > > > > +	npinned = get_user_pages_fast(uaddr, npages, write, pages);
> > > > > > > +	if (npinned != npages)
> > > > > > > +		goto err;
> > > > > > > +
> > > > > > As I said I have doubts about the whole approach, but this
> > > > > > implementation in particular isn't a good idea
> > > > > > as it keeps the page around forever.
> > >
> > > The pages will be released during set features.
> > >
> > > > > > So no THP, no NUMA rebalancing,
> > >
> > > For THP, we will probably miss 2 or 4 pages, but does this really matter
> > > considering the gain we have?
> > We as in vhost? Networking isn't the only thing the guest does.
> > We don't even know if this guest does a lot of networking.
> > You don't know what else is in this huge page. It can be something
> > very important that the guest touches all the time.
>
> Well, the probability should be very small considering we usually give
> several gigabytes to the guest. The rest of the pages that don't sit in
> the same hugepage as the metadata can still be merged by THP. Anyway, I
> can test the differences.

Thanks!

> > > For NUMA rebalancing, I'm not even quite sure it can help in the case
> > > of IPC (vhost). It looks to me that in the worst case it may cause
> > > pages to thrash between nodes if vhost and userspace are running on
> > > two nodes.
> >
> > So again it's a gain for vhost but has a completely unpredictable effect on
> > other functionality of the guest.
> >
> > That's what bothers me with this approach.
>
> So:
>
> - The rest of the pages could still be balanced to other nodes, no?
>
> - Trying to balance the metadata pages (which belong to co-operating
>   processes) is itself still questionable

I am not sure why. It should be easy enough to force the VCPU and vhost
to move (e.g. start them pinned to 1 cpu, then pin them to another one).
Clearly sometimes this would be necessary for load balancing reasons.
With autonuma after a while (could take seconds but it will happen) the
memory will migrate.

> > > > > This is the price for all GUP users, not only vhost itself.
> > > > Yes. GUP is just not a great interface for vhost to use.
> > >
> > > Zerocopy code (enabled by default) has used it for years.
> > But only for TX and temporarily. We pin, read, unpin.
>
> Probably not. For several reasons the page may not be released soon, or
> may be held for a very long period of time or even forever.

With zero copy? Well it's pinned until transmit. Takes a while
but could be enough for autocopy to work esp since
it's the packet memory so not reused immediately.

> > Your patch is different
> >
> > - it writes into memory and GUP has known issues with file
> >   backed memory
>
> The ordinary use case for vhost is anonymous pages, I think?

It's not the most common scenario and not the fastest one
(e.g. THP does not work) but file backed is useful sometimes.
It would not be nice at all to corrupt guest memory in that case.

> > - it keeps pages pinned forever
> >
> > > > > What's more important, the goal is not to be left too far behind
> > > > > other backends like DPDK or AF_XDP (all of which are using GUP).
> > > > So these guys assume userspace knows what it's doing.
> > > > We can't assume that.
> > >
> > > What kind of assumption do they have?
> > >
> > > > > > userspace-controlled
> > > > > > amount of memory locked up and not accounted for.
> > > > > It's pretty easy to add this since the slow path is still kept. If we
> > > > > exceed the limit, we can switch back to the slow path.
> > > > >
> > > > > > Don't get me wrong it's a great patch in an ideal world.
> > > > > > But then in an ideal world no barriers smap etc are necessary at all.
> > > > > Again, this is only for accessing metadata, not the data, which has
> > > > > been used for years for real use cases.
> > > > >
> > > > > For SMAP, it makes sense for addresses that the kernel cannot
> > > > > forecast. But that's not the case for the vhost metadata, since we
> > > > > know the address will be accessed very frequently. For the
> > > > > speculation barrier, it does nothing for the data path of vhost,
> > > > > which is a kthread.
> > > > I don't see how a kthread makes any difference. We do have a validation
> > > > step which makes some difference.
> > >
> > > The problem is not the kthread but the userspace address. The
> > > addresses of vq metadata tend to be consistent for a while, and vhost
> > > knows they will be accessed frequently. SMAP doesn't help much in this
> > > case.
> > >
> > > Thanks.
> > It's true for real life applications but a malicious one
> > can call the setup ioctls any number of times. And SMAP is
> > all about malicious applications.
>
> We don't do this in the ioctl path; there's no context switch between
> userspace and kernel in the worker thread. SMAP is used to prevent the
> kernel from accessing userspace pages unexpectedly, which is not the case
> for metadata access.
>
> Thanks

OK let's forget smap for now.

> > > > > Packet or AF_XDP benefit from
> > > > > accessing metadata directly, we should do it as well.
> > > > >
> > > > > Thanks