Subject: [Qemu-devel] [PATCH RFC] virtio: proposal to optimize accesses to VQs
From: Vincenzo Maffione
Date: 2015-12-14 14:51 UTC
To: qemu-devel
Cc: mst, jasowang, armbru, Vincenzo Maffione, g.lettieri, rizzo

Hi,
  I am running performance experiments to see how QEMU behaves when the
guest transmits (short) network packets at very high packet rates, say
over 1 Mpps.
I use a netmap application in the guest to generate the high packet rates,
but this is not really relevant to the discussion. The only important point
is that the generator running in the guest is not the bottleneck; in fact
its CPU utilization is low (20%).

Moreover, I am not using vhost-net to accelerate the hypervisor-side
virtio-net processing, because I want to run performance unit tests on the
QEMU userspace virtio implementation (hw/virtio/virtio.c).

In the most common benchmarks - e.g. netperf TCP_STREAM, TCP_RR,
UDP_STREAM, ... - with one end of the communication in the guest and the
other in the host, for instance with the simplest TAP networking setup, the
virtio-net adapter clearly outperforms the emulated e1000 adapter (and
all the other emulated devices). This is expected, given the well-known
benefits of I/O paravirtualization.

However, I was surprised to find out that the situation changes drastically
at very high packet rates.

My measurements show that the emulated e1000 adapter is able to transmit
over 3.5 Mpps when the network backend is disconnected. I disconnect the
backend to see at what packet rate the e1000 emulation itself becomes the
bottleneck.

The same experiment, however, shows that virtio-net hits a bottleneck at
only 1 Mpps. After verifying that TX VQ kicks and TX VQ interrupts are
properly amortized/suppressed, I found that the bottleneck is partly due
to the way the code accesses the VQ in guest physical memory, since each
access involves an expensive address space translation. For each VQ element
processed I counted over 15 such accesses, while e1000 needs just 2 accesses
to its rings.
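
To make the pattern concrete, here is a toy sketch of the field-by-field
access style. This is not the actual virtio.c code: the descriptor layout
follows the virtio spec, but guest_phys_read() and the translation counter
are made up for illustration, and only the descriptor read is modeled, not
all of the ~15 accesses:

    /* Toy model: every field load goes through a separate (simulated)
     * guest physical address translation. */
    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    /* Descriptor layout as defined by the virtio specification. */
    typedef struct VRingDesc {
        uint64_t addr;
        uint32_t len;
        uint16_t flags;
        uint16_t next;
    } VRingDesc;

    static unsigned long translations; /* simulated translation counter */
    static uint8_t guest_mem[4096];    /* stand-in for guest memory */

    /* Hypothetical helper: one call == one address space translation. */
    static void guest_phys_read(uint64_t pa, void *buf, size_t len)
    {
        translations++;
        memcpy(buf, guest_mem + pa, len);
    }

    /* Field-by-field read, similar in spirit to the current access
     * pattern: four translations for a single 16-byte descriptor. */
    static void read_desc_fields(uint64_t pa, VRingDesc *d)
    {
        guest_phys_read(pa + offsetof(VRingDesc, addr), &d->addr,
                        sizeof(d->addr));
        guest_phys_read(pa + offsetof(VRingDesc, len), &d->len,
                        sizeof(d->len));
        guest_phys_read(pa + offsetof(VRingDesc, flags), &d->flags,
                        sizeof(d->flags));
        guest_phys_read(pa + offsetof(VRingDesc, next), &d->next,
                        sizeof(d->next));
    }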

This patch slightly rewrites the code to reduce the number of accesses,
since many of them seem unnecessary to me. With this change, the bottleneck
moves from 1 Mpps to 2 Mpps.
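
The rough idea is to fetch each descriptor (and, where possible, batches of
ring entries) with a single guest memory access and then work on the local
copy. In terms of the toy model above (again purely illustrative, not the
actual patch):

    #include <stdio.h>

    /* Bulk read: one translation fetches the whole 16-byte descriptor,
     * and the fields are then parsed from the local copy. Endianness
     * fixups would be applied to the local copy as well. */
    static void read_desc_bulk(uint64_t pa, VRingDesc *d)
    {
        guest_phys_read(pa, d, sizeof(*d)); /* single translation */
    }

    int main(void)
    {
        VRingDesc d;

        translations = 0;
        read_desc_fields(0, &d);
        printf("field-by-field: %lu translations\n", translations);

        translations = 0;
        read_desc_bulk(0, &d);
        printf("bulk copy:      %lu translations\n", translations);
        return 0;
    }

The real VQ processing of course also touches the avail and used rings and
possibly indirect descriptor tables, but the principle is the same.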

The patch is not complete (e.g. it does not yet properly handle endianness,
it is not clean, etc.). I just wanted to ask whether you think the idea makes
sense, and whether a proper patch in this direction would be accepted.

Thanks,
  Vincenzo

Vincenzo Maffione (1):
  virtio: optimize access to guest physical memory

 hw/virtio/virtio.c | 118 +++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 88 insertions(+), 30 deletions(-)

-- 
2.6.3
