From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Michael S. Tsirkin"
Subject: Re: [PATCH net-next 0/3] vhost: accelerate metadata access through vmap()
Date: Tue, 25 Dec 2018 07:52:42 -0500
Message-ID: <20181225075054-mutt-send-email-mst@kernel.org>
References: <20181213101022.12475-1-jasowang@redhat.com>
 <20181213102315-mutt-send-email-mst@kernel.org>
 <9459e227-a943-8553-732b-d7f5225a0f22@redhat.com>
 <20181214072334-mutt-send-email-mst@kernel.org>
 <20181224131040-mutt-send-email-mst@kernel.org>
 <51fa034d-99ae-3820-c3a4-d9e6f2eefe34@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
Content-Disposition: inline
In-Reply-To: <51fa034d-99ae-3820-c3a4-d9e6f2eefe34@redhat.com>
Sender: virtualization-bounces@lists.linux-foundation.org
Errors-To: virtualization-bounces@lists.linux-foundation.org
To: Jason Wang
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, virtualization@lists.linux-foundation.org
List-Id: virtualization@lists.linuxfoundation.org

On Tue, Dec 25, 2018 at 06:09:19PM +0800, Jason Wang wrote:
>
> On 2018/12/25 2:12 AM, Michael S. Tsirkin wrote:
> > On Mon, Dec 24, 2018 at 04:32:39PM +0800, Jason Wang wrote:
> > > On 2018/12/14 8:33 PM, Michael S. Tsirkin wrote:
> > > > On Fri, Dec 14, 2018 at 11:42:18AM +0800, Jason Wang wrote:
> > > > > On 2018/12/13 11:27 PM, Michael S. Tsirkin wrote:
> > > > > > On Thu, Dec 13, 2018 at 06:10:19PM +0800, Jason Wang wrote:
> > > > > > > Hi:
> > > > > > >
> > > > > > > This series tries to access virtqueue metadata through kernel virtual
> > > > > > > address instead of copy_user() friends since they had too much
> > > > > > > overhead like checks, spec barriers or even hardware feature
> > > > > > > toggling.
> > > > > > Userspace accesses through remapping tricks and next time there's a need
> > > > > > for a new barrier we are left to figure it out by ourselves.
> > > > > I don't get it, do you mean spec barriers?
> > > > I mean the next barrier people decide to put into userspace
> > > > memory accesses.
> > > >
> > > > > It's completely unnecessary for
> > > > > vhost, which is a kernel thread.
> > > > It's defence in depth. Take a look at the commit that added them.
> > > > And yes quite possibly in most cases we actually have a spec
> > > > barrier in the validation phase. If we do, let's use the
> > > > unsafe variants so they can be found.
> > >
> > > unsafe variants can only work if you can batch userspace access. This is not
> > > necessarily the case for light load.
> >
> > Do we care a lot about the light load? How would you benchmark it?
> >
>
> If we can reduce the latency, that will be more than what we expect.
>
> 1 byte TCP_RR shows 1.5%-2% improvement.

It's nice but not great. E.g. adaptive polling would be
a better approach to work on latency imho.

>
> > > > > And even if you're right, vhost is not the
> > > > > only place, there's lots of vmap() based accessing in kernel.
> > > > For sure. But if one can get by without get user pages, one
> > > > really should. Witness the recently uncovered mess with file
> > > > backed storage.
> > >
> > > We only pin metadata pages, I don't believe they will be used by any DMA.
> > It doesn't matter really, if you dirty pages behind the MM's back
> > the problem is there.
>
>
> Ok, but the usual case is anonymous pages, do we use file backed pages for
> users of vhost?

Some people use file backed pages for VMs.
Nothing prevents them from using vhost as well.

> And even if we sometimes do, according to the pointer it's
> not something we can fix; an RFC has been posted to solve this issue.
>
> Thanks

Except it's not broken if we don't do gup + write.
So yea, wait for the RFC to be merged.
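
To illustrate the batching point quoted above (the unsafe variants only pay off when several accesses share one validation), here is a rough userspace model in C. This is a sketch under stated assumptions, not vhost or kernel code: `struct user_region`, `region_ok()`, `checked_get16()`, `unchecked_get16()` and `batched_sum16()` are all hypothetical names made up for the example, and the `checks` counter merely stands in for the per-access cost (range checks, barriers) being discussed. It is only loosely patterned on the kernel's user_access_begin()/unsafe_get_user()/user_access_end() style.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a region of userspace memory. */
struct user_region {
    const uint8_t *base;
    size_t len;
    unsigned long checks; /* number of range validations paid so far */
};

/* Overflow-safe range validation; counts every call as one unit of cost. */
static bool region_ok(struct user_region *r, size_t off, size_t n)
{
    r->checks++;
    return off <= r->len && n <= r->len - off;
}

/* Checked variant: validates the range on every single access. */
static int checked_get16(struct user_region *r, size_t off, uint16_t *out)
{
    if (!region_ok(r, off, sizeof(*out)))
        return -1;
    memcpy(out, r->base + off, sizeof(*out));
    return 0;
}

/* Unchecked variant: the caller must already have validated the range. */
static void unchecked_get16(const struct user_region *r, size_t off,
                            uint16_t *out)
{
    /* Debug-only precondition; compiled out when NDEBUG is defined. */
    assert(off <= r->len && sizeof(*out) <= r->len - off);
    memcpy(out, r->base + off, sizeof(*out));
}

/* Batched access: one validation covers all `count` entries. */
static int batched_sum16(struct user_region *r, size_t off, size_t count,
                         uint32_t *sum)
{
    if (count > SIZE_MAX / sizeof(uint16_t))
        return -1;
    if (!region_ok(r, off, count * sizeof(uint16_t)))
        return -1;

    *sum = 0;
    for (size_t i = 0; i < count; i++) {
        uint16_t v;

        unchecked_get16(r, off + i * sizeof(v), &v);
        *sum += v;
    }
    return 0;
}
```

Reading four entries through batched_sum16() costs one validation, while four checked_get16() calls cost four. With a single entry per wakeup, as under light load, both paths pay exactly one validation, so batching saves nothing in this model.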