From mboxrd@z Thu Jan 1 00:00:00 1970
From: Anthony Liguori
Subject: Re: [PATCH 0/9][RFC] KVM virtio_net performance
Date: Thu, 24 Jul 2008 15:56:01 -0500
Message-ID: <4888EC61.8050208@codemonkey.ws>
References: <1216899979-32532-1-git-send-email-markmc@redhat.com>
In-Reply-To: <1216899979-32532-1-git-send-email-markmc@redhat.com>
To: Mark McLoughlin
Cc: kvm@vger.kernel.org, Herbert Xu, Rusty Russell

Hi Mark,

Mark McLoughlin wrote:
> Hey,
>         Here's a bunch of patches attempting to improve the performance
> of virtio_net. This is more an RFC than a patch submission since, as
> can be seen below, not all patches actually improve the performance
> measurably.

I'm still seeing the same problem I saw with my patch series. Namely,
dhclient fails to get a DHCP address. Rusty noticed that RX receives a
lot more packets than it should, so we're suspicious that we're getting
packet corruption.

Configuring the tap device with a static address, here's what I get
with iperf:

w/o patches:
  guest->host: 625 Mbits/sec
  host->guest: 825 Mbits/sec

w/ patches:
  guest->host: 2.02 Gbits/sec
  host->guest: 1.89 Gbits/sec

guest lo: 4.35 Gbits/sec
host lo:  4.36 Gbits/sec

This is with KVM_GUEST configured, FWIW.

Regards,

Anthony Liguori

> I've tried hard to test each of these patches with as stable and
> informative a benchmark as I could find. The first benchmark is a
> netperf[1] based throughput benchmark and the second uses a flood
> ping[2] to measure latency differences.
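
FWIW, a quick sanity check of the relative speedup implied by my iperf
numbers above (units normalized to Mbits/sec by hand; this is just
arithmetic on the figures quoted, not a new measurement):

```python
# Throughput before/after the patches, in Mbits/sec, as reported above.
without = {"guest->host": 625, "host->guest": 825}
with_patches = {"guest->host": 2020, "host->guest": 1890}

for direction in without:
    speedup = with_patches[direction] / without[direction]
    print(f"{direction}: {speedup:.1f}x")  # roughly 3.2x and 2.3x
```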
>
> Each set of figures is min/average/max/standard deviation. The first
> set is Gb/s and the second is milliseconds.
>
> The network configuration used was very simple - the guest with a
> virtio_net interface and the host with a tap interface and static IP
> addresses assigned to both - i.e. there was no bridge involved in the
> host and iptables was disabled in both the host and guest.
>
> I used:
>
>   1) kvm-71-26-g6152996 with the patches that follow
>
>   2) Linus's v2.6.26-5752-g93ded9b with Rusty's virtio patches from
>      219:bbd2611289c5 applied; these are the patches that have just
>      been submitted to Linus
>
> The conclusions I draw are:
>
>   1) The length of the tx mitigation timer makes quite a difference
>      to the throughput achieved; we probably need a good heuristic
>      for adjusting this on the fly.
>
>   2) Using the recently merged GSO support in the tun/tap driver
>      gives a huge boost, but much more so on the host->guest side.
>
>   3) Adjusting the virtio_net ring sizes makes a small difference,
>      but not as much as one might expect.
>
>   4) Dropping the global mutex while reading GSO packets from the tap
>      interface gives a nice speedup. This highlights the global mutex
>      as a general performance issue.
>
>   5) Eliminating an extra copy on the host->guest path makes only a
>      barely measurable difference.
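
The no-bridge setup described above can be sketched roughly as follows
on the host side (the tap name and addresses here are placeholders of
mine, not Mark's actual configuration; tunctl comes from the
uml-utilities/tunctl package):

```shell
# Create a persistent tap device owned by the current user
# (on newer iproute2: ip tuntap add dev tap0 mode tap).
tunctl -u "$(id -un)" -t tap0

# Give the tap a static address and bring it up -- no bridge involved.
ip addr add 192.168.100.1/24 dev tap0
ip link set tap0 up

# In the guest, the virtio_net interface gets the peer address:
#   ip addr add 192.168.100.2/24 dev eth0
#   ip link set eth0 up
```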
>
> Anyway, the figures:
>
>  netperf, 10x20s runs (Gb/s)  | guest->host                | host->guest
> ------------------------------+----------------------------+---------------------------
>  baseline                     | 1.520/ 1.573/ 1.610/ 0.034 | 1.160/ 1.357/ 1.630/ 0.165
>  50us tx timer + rearm        | 1.050/ 1.086/ 1.110/ 0.017 | 1.710/ 1.832/ 1.960/ 0.092
>  250us tx timer + rearm       | 1.700/ 1.764/ 1.880/ 0.064 | 0.900/ 1.203/ 1.580/ 0.205
>  150us tx timer + rearm       | 1.520/ 1.602/ 1.690/ 0.044 | 1.670/ 1.928/ 2.150/ 0.141
>  no ring-full heuristic       | 1.480/ 1.569/ 1.710/ 0.066 | 1.610/ 1.857/ 2.140/ 0.153
>  VIRTIO_F_NOTIFY_ON_EMPTY     | 1.470/ 1.554/ 1.650/ 0.054 | 1.770/ 1.960/ 2.170/ 0.119
>  recv NO_NOTIFY               | 1.530/ 1.604/ 1.680/ 0.047 | 1.780/ 1.944/ 2.190/ 0.129
>  GSO                          | 4.120/ 4.323/ 4.420/ 0.099 | 6.540/ 7.033/ 7.340/ 0.244
>  ring size == 256             | 4.050/ 4.406/ 4.560/ 0.143 | 6.280/ 7.236/ 8.280/ 0.613
>  ring size == 512             | 4.420/ 4.600/ 4.960/ 0.140 | 6.470/ 7.205/ 7.510/ 0.314
>  drop mutex during tapfd read | 4.320/ 4.578/ 4.790/ 0.161 | 8.370/ 8.589/ 8.730/ 0.120
>  aligouri zero-copy           | 4.510/ 4.694/ 4.960/ 0.148 | 8.430/ 8.614/ 8.840/ 0.142
>
>  ping -f -c 100000 (ms)       | guest->host                | host->guest
> ------------------------------+----------------------------+---------------------------
>  baseline                     | 0.060/ 0.459/ 7.602/ 0.846 | 0.067/ 0.331/ 2.517/ 0.057
>  50us tx timer + rearm        | 0.081/ 0.143/ 7.436/ 0.374 | 0.093/ 0.133/ 1.883/ 0.026
>  250us tx timer + rearm       | 0.302/ 0.463/ 7.580/ 0.849 | 0.297/ 0.344/ 2.128/ 0.028
>  150us tx timer + rearm       | 0.197/ 0.323/ 7.671/ 0.740 | 0.199/ 0.245/ 7.836/ 0.037
>  no ring-full heuristic       | 0.182/ 0.324/ 7.688/ 0.753 | 0.199/ 0.243/ 2.197/ 0.030
>  VIRTIO_F_NOTIFY_ON_EMPTY     | 0.197/ 0.321/ 7.447/ 0.730 | 0.196/ 0.242/ 2.218/ 0.032
>  recv NO_NOTIFY               | 0.186/ 0.321/ 7.520/ 0.732 | 0.200/ 0.233/ 2.216/ 0.028
>  GSO                          | 0.178/ 0.324/ 7.667/ 0.736 | 0.147/ 0.246/ 1.361/ 0.024
>  ring size == 256             | 0.184/ 0.323/ 7.674/ 0.728 | 0.199/ 0.243/ 2.181/ 0.028
>  ring size == 512             | (not measured)             | (not measured)
>  drop mutex during tapfd read | 0.183/ 0.323/ 7.820/ 0.733 | 0.202/ 0.242/ 2.219/ 0.027
>  aligouri zero-copy           | 0.185/ 0.325/ 7.863/ 0.736 | 0.202/ 0.245/ 7.844/ 0.036
>
> Cheers,
> Mark.
>
> [1] - I used netperf trunk from:
>
>         http://www.netperf.org/svn/netperf2/trunk
>
>       and simply ran:
>
>         $> i=0; while [ $i -lt 10 ]; do ./netperf -H -f g -l 20 -P 0 | netperf-collect.py; i=$((i+1)); done
>
>       where netperf-collect.py is just a script to calculate the
>       average across the runs:
>
>         http://markmc.fedorapeople.org/netperf-collect.py
>
> [2] - ping -c 100000 -f
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
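
As an aside on [1]: netperf-collect.py itself isn't reproduced here,
but a minimal stand-in producing the min/average/max/standard-deviation
format used in the tables above might look like this (one figure per
line on stdin; whether the original uses the population or sample
deviation is my assumption):

```python
import statistics
import sys

def summarize(values):
    """Return (min, mean, max, stdev) for a list of throughput figures."""
    return (min(values), statistics.mean(values), max(values),
            statistics.pstdev(values))  # population stdev -- an assumption

if __name__ == "__main__":
    # One Gb/s figure per line, one line per netperf run.
    runs = [float(line) for line in sys.stdin if line.strip()]
    print("%.3f/ %.3f/ %.3f/ %.3f" % summarize(runs))
```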