Date: Wed, 16 Dec 2015 13:02:31 +0200
From: "Michael S. Tsirkin"
Subject: Re: [Qemu-devel] [PATCH v2 0/3] virtio: proposal to optimize accesses to VQs
Message-ID: <20151216124045-mutt-send-email-mst@redhat.com>
References: <5671230C.70102@redhat.com> <5671300E.5060109@redhat.com>
To: Vincenzo Maffione
Cc: Jason Wang, Markus Armbruster, qemu-devel, Paolo Bonzini, Giuseppe Lettieri, Luigi Rizzo

On Wed, Dec 16, 2015 at 11:39:46AM +0100, Vincenzo Maffione wrote:
> 2015-12-16 10:34 GMT+01:00 Paolo Bonzini:
> >
> > On 16/12/2015 10:28, Vincenzo Maffione wrote:
> >> In my TX experiments with a disconnected backend (and with CPU
> >> dynamic frequency scaling disabled, etc.):
> >> 1) after patches 1 and 2, the virtio bottleneck jumps from ~1 Mpps
> >> to 1.910 Mpps.
> >> 2) after patches 1, 2 and 3, the virtio bottleneck jumps to 2.039 Mpps.
> >>
> >> So I see an improvement from patch 3, and I guess it's because we
> >> avoid an additional memory translation and the related overhead. I
> >> believe that avoiding the memory translation is more beneficial than
> >> avoiding the variable-sized memcpy.
> >> I'm not surprised by that, because taking a brief look at what
> >> happens under the hood when you call a memory access function, it
> >> looks like a lot of operations.
> >
> > Great, thanks for confirming!
> >
> > Paolo
>
> No problem.
>
> I have some additional (orthogonal) curiosities:
>
> 1) Assuming "hw/virtio/dataplane/vring.c" is what I think it is (the VQ
> data structures made directly accessible in host virtual memory, with
> the guest-physical-to-host-virtual mapping done statically at setup
> time), why isn't QEMU using this approach also for virtio-net? I see it
> is used by virtio-blk only.

Because on Linux nothing would be gained compared to using vhost-net in
the kernel or vhost-user with DPDK. virtio-net in QEMU is there for
non-Linux hosts; keeping it simple is important to avoid e.g. security
problems. Same as serial, etc.

> 2) In any case (vring or not), QEMU dynamically maps data buffers from
> guest physical memory for each descriptor to be processed: e1000 uses
> pci_dma_read()/pci_dma_write(), virtio uses
> cpu_physical_memory_map()/cpu_physical_memory_unmap(), and vring uses
> the more specialized vring_map()/vring_unmap(). All of these go through
> expensive lookups and related operations to do the address translation.
> Have you considered caching the translation result to remove this
> bottleneck (maybe just for virtio devices)? Or is there any consistency
> or migration-related problem that would create issues?
> Just to give an example of what I'm talking about:
> https://github.com/vmaffione/qemu/blob/master/hw/net/e1000.c#L349-L423.
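
Just to make the idea concrete, below is a minimal sketch of the kind of
caching being suggested, built only on
cpu_physical_memory_map()/cpu_physical_memory_unmap(). The DmaMapCache
structure and dma_map_cached() helper are hypothetical names used for
illustration; they are not existing QEMU code and not what the patches in
this series do.

#include "qemu/osdep.h"       /* assuming this sits inside the QEMU tree */
#include "exec/cpu-common.h"  /* cpu_physical_memory_map()/unmap()       */

/* Hypothetical single-entry cache of a guest-physical -> host-virtual
 * mapping; struct and function names are made up for illustration. */
typedef struct DmaMapCache {
    hwaddr gpa;       /* guest-physical base of the cached mapping */
    hwaddr len;       /* length actually mapped                    */
    void *hva;        /* host-virtual address returned by map()    */
    int is_write;     /* direction the mapping was created with    */
    bool valid;
} DmaMapCache;

/* Return a host pointer for [addr, addr + size), reusing the cached
 * mapping when the request falls entirely inside it and goes in the
 * same direction; otherwise drop the old mapping and map again. */
static void *dma_map_cached(DmaMapCache *c, hwaddr addr, hwaddr size,
                            int is_write)
{
    hwaddr plen = size;
    void *hva;

    if (c->valid && is_write == c->is_write &&
        addr >= c->gpa && addr - c->gpa <= c->len &&
        size <= c->len - (addr - c->gpa)) {
        return (uint8_t *)c->hva + (addr - c->gpa);    /* fast path */
    }
    if (c->valid) {
        /* Conservative: report the whole cached region as accessed. */
        cpu_physical_memory_unmap(c->hva, c->len, c->is_write, c->len);
        c->valid = false;
    }
    hva = cpu_physical_memory_map(addr, &plen, is_write);
    if (!hva) {
        return NULL;               /* caller falls back to the slow path */
    }
    if (plen < size) {
        /* Region not contiguous in host memory: give up on caching. */
        cpu_physical_memory_unmap(hva, plen, is_write, 0);
        return NULL;
    }
    c->gpa = addr;
    c->len = plen;
    c->hva = hva;
    c->is_write = is_write;
    c->valid = true;
    return hva;
}

Any such cache would of course have to be invalidated whenever the guest
memory map changes (e.g. from a MemoryListener callback) and dropped
around migration, which is exactly the consistency question raised above.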
>
> At very high packet rates, once notifications (kicks and interrupts)
> have been amortized in some way, memory translation becomes the major
> bottleneck. And this (points 1 and 2) is why the QEMU virtio
> implementation cannot achieve the same throughput as bhyve does
> (5-6 Mpps or more, IIRC).
>
> Cheers,
>   Vincenzo
>
> --
> Vincenzo Maffione