Date: Wed, 16 Dec 2015 13:02:31 +0200
From: "Michael S. Tsirkin"
Subject: Re: [Qemu-devel] [PATCH v2 0/3] virtio: proposal to optimize accesses to VQs
Message-ID: <20151216124045-mutt-send-email-mst@redhat.com>
References: <5671230C.70102@redhat.com> <5671300E.5060109@redhat.com>
To: Vincenzo Maffione
Cc: Jason Wang, Markus Armbruster, qemu-devel, Paolo Bonzini, Giuseppe Lettieri, Luigi Rizzo

On Wed, Dec 16, 2015 at 11:39:46AM +0100, Vincenzo Maffione wrote:
> 2015-12-16 10:34 GMT+01:00 Paolo Bonzini:
> >
> > On 16/12/2015 10:28, Vincenzo Maffione wrote:
> >> In my TX experiments with a disconnected backend (and with CPU
> >> dynamic frequency scaling disabled, etc.):
> >> 1) after patches 1 and 2, the virtio bottleneck jumps from ~1 Mpps
> >> to 1.910 Mpps.
> >> 2) after patches 1, 2 and 3, the virtio bottleneck jumps to 2.039 Mpps.
> >>
> >> So I see an improvement from patch 3, and I guess it's because we
> >> avoid an additional memory translation and the related overhead. I
> >> believe that avoiding the memory translation is more beneficial than
> >> avoiding the variable-sized memcpy.
> >> I'm not surprised by that, because taking a brief look at what
> >> happens under the hood when you call a memory access function, it
> >> looks like a lot of operations.
> >
> > Great, thanks for confirming!
> >
> > Paolo
>
> No problem.
>
> I have some additional (orthogonal) curiosities:
>
> 1) Assuming "hw/virtio/dataplane/vring.c" is what I think it is (the VQ
> data structures made directly accessible in host virtual memory, with
> the guest-physical-to-host-virtual mapping done statically at setup
> time), why isn't QEMU using this approach also for virtio-net? I see it
> is used by virtio-blk only.

Because on Linux nothing would be gained compared to using vhost-net in
the kernel or vhost-user with DPDK. virtio-net in QEMU is there for
non-Linux hosts; keeping it simple is important to avoid e.g. security
problems. Same as serial, etc.

> 2) In any case (vring or not), QEMU dynamically maps data buffers from
> guest physical memory for each descriptor to be processed: e1000 uses
> pci_dma_read()/pci_dma_write(), virtio uses
> cpu_physical_memory_map()/cpu_physical_memory_unmap(), and vring uses
> the more specialized vring_map()/vring_unmap(). All of these go through
> expensive lookups and related operations to do the address translation.
> Have you considered caching the translation result to remove this
> bottleneck (maybe just for virtio devices)? Or is there any consistency
> or migration-related problem that would create issues?
> Just to give an example of what I'm talking about:
> https://github.com/vmaffione/qemu/blob/master/hw/net/e1000.c#L349-L423.
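
Just to make the idea concrete, below is a minimal sketch of the kind of
caching being suggested, built only on
cpu_physical_memory_map()/cpu_physical_memory_unmap(). The DmaMapCache
structure and dma_map_cached() helper are hypothetical names used for
illustration; they are not existing QEMU code and not what the patches in
this series do.

#include "qemu/osdep.h"       /* assuming this sits inside the QEMU tree */
#include "exec/cpu-common.h"  /* cpu_physical_memory_map()/unmap()       */

/* Hypothetical single-entry cache of a guest-physical -> host-virtual
 * mapping; struct and function names are made up for illustration. */
typedef struct DmaMapCache {
    hwaddr gpa;       /* guest-physical base of the cached mapping */
    hwaddr len;       /* length actually mapped                    */
    void *hva;        /* host-virtual address returned by map()    */
    int is_write;     /* direction the mapping was created with    */
    bool valid;
} DmaMapCache;

/* Return a host pointer for [addr, addr + size), reusing the cached
 * mapping when the request falls entirely inside it and goes in the
 * same direction; otherwise drop the old mapping and map again. */
static void *dma_map_cached(DmaMapCache *c, hwaddr addr, hwaddr size,
                            int is_write)
{
    hwaddr plen = size;
    void *hva;

    if (c->valid && is_write == c->is_write &&
        addr >= c->gpa && addr - c->gpa <= c->len &&
        size <= c->len - (addr - c->gpa)) {
        return (uint8_t *)c->hva + (addr - c->gpa);    /* fast path */
    }
    if (c->valid) {
        /* Conservative: report the whole cached region as accessed. */
        cpu_physical_memory_unmap(c->hva, c->len, c->is_write, c->len);
        c->valid = false;
    }
    hva = cpu_physical_memory_map(addr, &plen, is_write);
    if (!hva) {
        return NULL;               /* caller falls back to the slow path */
    }
    if (plen < size) {
        /* Region not contiguous in host memory: give up on caching. */
        cpu_physical_memory_unmap(hva, plen, is_write, 0);
        return NULL;
    }
    c->gpa = addr;
    c->len = plen;
    c->hva = hva;
    c->is_write = is_write;
    c->valid = true;
    return hva;
}

Any such cache would of course have to be invalidated whenever the guest
memory map changes (e.g. from a MemoryListener callback) and dropped
around migration, which is exactly the consistency question raised above.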
>
> At very high packet rates, once notifications (kicks and interrupts)
> have been amortized in some way, memory translation becomes the major
> bottleneck. And this (points 1 and 2) is why the QEMU virtio
> implementation cannot achieve the same throughput as bhyve does
> (5-6 Mpps or more, IIRC).
>
> Cheers,
>   Vincenzo
>
> --
> Vincenzo Maffione