From: Jason Wang <jasowang@redhat.com>
To: mst@redhat.com, kvm@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH net-next rfc V2 0/2] basic busy polling support for vhost_net
Date: Tue, 3 Nov 2015 15:46:09 +0800
Message-ID: <56386641.30402@redhat.com>
In-Reply-To: <56335B66.9050705@redhat.com>



On 10/30/2015 07:58 PM, Jason Wang wrote:
>
> On 10/29/2015 04:45 PM, Jason Wang wrote:
>> Hi all:
>>
>> This series tries to add basic busy polling for vhost net. The idea is
>> simple: at the end of tx processing, busy poll for newly added tx
>> descriptors and for data on the rx socket for a while. The maximum
>> amount of time (in us) that may be spent busy polling is specified
>> through a module parameter.
>>
>> Tests were done with:
>>
>> - 50 us as busy loop timeout
>> - Netperf 2.6
>> - Two machines connected back to back with mlx4 NICs
>> - Guest with 8 vcpus and 1 queue
>>
>> Results show a very large improvement on both tx (at most 158%) and rr
>> (at most 53%), while rx is about the same as before. In most cases the
>> cpu utilization is also improved:
>>
> Just noticed there was something wrong in the setup, so the numbers
> above are incorrect. I will re-run and post the correct numbers here.
>
> Sorry.
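
To make the quoted idea a bit more concrete, here is a rough,
illustrative sketch of the busy-poll step (this is *not* the code in
patch 2/2; busy_clock(), vhost_vq_avail_empty() and sk_has_rx_data()
are placeholder helpers here, and the module parameter name is made up):

#include <linux/module.h>
#include <linux/sched.h>

/* Maximum time (in us) to spend busy polling; 0 disables it. */
static unsigned long busyloop_timeout;
module_param(busyloop_timeout, ulong, 0644);

/* Coarse ~us timestamp, standing in for real time keeping. */
static unsigned long busy_clock(void)
{
        return local_clock() >> 10;
}

/* Called at the end of tx processing: spin for a bounded time waiting
 * for new tx descriptors or data on the rx socket, instead of going to
 * sleep and waiting for a guest kick / socket wakeup. */
static void busy_poll_after_tx(struct vhost_dev *dev,
                               struct vhost_virtqueue *tvq,
                               struct socket *rx_sock)
{
        unsigned long endtime = busy_clock() + busyloop_timeout;

        while (busyloop_timeout && !need_resched() &&
               !signal_pending(current) &&
               !vhost_has_work(dev) &&          /* from patch 1/2 */
               busy_clock() < endtime) {
                if (!vhost_vq_avail_empty(dev, tvq))    /* new tx work */
                        break;
                if (sk_has_rx_data(rx_sock->sk))        /* rx data ready */
                        break;
                cpu_relax();
        }
        /* The caller then re-runs tx/rx handling as usual. */
}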

Here are the updated testing results:

1) 1 vcpu 1 queue:

TCP_RR
size/sessions/+throughput%/+normalized%
    1/     1/    0%/  -25%
    1/    50/  +12%/    0%
    1/   100/  +12%/   +1%
    1/   200/   +9%/   -1%
   64/     1/   +3%/  -21%
   64/    50/   +8%/    0%
   64/   100/   +7%/    0%
   64/   200/   +9%/    0%
  256/     1/   +1%/  -25%
  256/    50/   +7%/   -2%
  256/   100/   +6%/   -2%
  256/   200/   +4%/   -2%
  512/     1/   +2%/  -19%
  512/    50/   +5%/   -2%
  512/   100/   +3%/   -3%
  512/   200/   +6%/   -2%
 1024/     1/   +2%/  -20%
 1024/    50/   +3%/   -3%
 1024/   100/   +5%/   -3%
 1024/   200/   +4%/   -2%
Guest RX
size/sessions/+throughput%/+normalized%
   64/     1/   -4%/   -5%
   64/     4/   -3%/  -10%
   64/     8/   -3%/   -5%
  512/     1/  +15%/   +1%
  512/     4/   -5%/   -5%
  512/     8/   -2%/   -4%
 1024/     1/   -5%/  -16%
 1024/     4/   -2%/   -5%
 1024/     8/   -6%/   -6%
 2048/     1/  +10%/   +5%
 2048/     4/   -8%/   -4%
 2048/     8/   -1%/   -4%
 4096/     1/   -9%/  -11%
 4096/     4/   +1%/   -1%
 4096/     8/   +1%/    0%
16384/     1/  +20%/  +11%
16384/     4/    0%/   -3%
16384/     8/   +1%/    0%
65535/     1/  +36%/  +13%
65535/     4/  -10%/   -9%
65535/     8/   -3%/   -2%
Guest TX
size/sessions/+throughput%/+normalized%
   64/     1/   -7%/  -16%
   64/     4/  -14%/  -23%
   64/     8/   -9%/  -20%
  512/     1/  -62%/  -56%
  512/     4/  -62%/  -56%
  512/     8/  -61%/  -53%
 1024/     1/  -66%/  -61%
 1024/     4/  -77%/  -73%
 1024/     8/  -73%/  -67%
 2048/     1/  -74%/  -75%
 2048/     4/  -77%/  -74%
 2048/     8/  -72%/  -68%
 4096/     1/  -65%/  -68%
 4096/     4/  -66%/  -63%
 4096/     8/  -62%/  -57%
16384/     1/  -25%/  -28%
16384/     4/  -28%/  -17%
16384/     8/  -24%/  -10%
65535/     1/  -17%/  -14%
65535/     4/  -22%/   -5%
65535/     8/  -25%/   -9%

- obvious improvement on TCP_RR (at most 12%)
- improvement on guest RX
- huge decrease on Guest TX (at most -75%). This is probably because the
virtio-net driver suffers from buffer bloat by orphaning skbs before
transmission: the faster vhost is, the smaller the packets that get
produced (see the sketch after the table below). To reduce this impact,
turning off gso in the guest gives the following results:

size/sessions/+throughput%/+normalized%
   64/     1/   +3%/  -11%
   64/     4/   +4%/  -10%
   64/     8/   +4%/  -10%
  512/     1/   +2%/   +5%
  512/     4/    0%/   -1%
  512/     8/    0%/    0%
 1024/     1/  +11%/    0%
 1024/     4/    0%/   -1%
 1024/     8/   +3%/   +1%
 2048/     1/   +4%/   -1%
 2048/     4/   +8%/   +3%
 2048/     8/    0%/   -1%
 4096/     1/   +4%/   -1%
 4096/     4/   +1%/    0%
 4096/     8/   +2%/    0%
16384/     1/   +2%/   -2%
16384/     4/   +3%/   +1%
16384/     8/    0%/   -1%
65535/     1/   +9%/   +7%
65535/     4/    0%/   -3%
65535/     8/   -1%/   -1%
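
The mechanism behind this regression can be illustrated with a
simplified sketch of a guest xmit path (again illustrative only, not
the actual virtio_net.c code; guest_xmit_sketch() is a made-up name):

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* The guest driver orphans the skb before it is really transmitted.
 * skb_orphan() releases the skb's charge against the sending socket,
 * so the socket sees its send buffer drain instantly and TCP keeps
 * pushing data.  The faster the host (vhost) side drains the tx ring,
 * the less data TCP accumulates per skb, so we end up with many small
 * packets instead of a few large GSO ones. */
static netdev_tx_t guest_xmit_sketch(struct sk_buff *skb,
                                     struct net_device *dev)
{
        /* ... queue the skb's buffers into the tx virtqueue ... */

        skb_orphan(skb);        /* drops socket backpressure early */

        /* ... kick the host to process the queue ... */
        return NETDEV_TX_OK;
}

With gso off in the guest, skbs are at most MTU sized anyway, so the
"faster vhost, smaller packets" effect largely disappears, which
matches the table above.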

2) 8 vcpus 1 queue:

TCP_RR
size/sessions/+throughput%/+normalized%
    1/     1/   +5%/  -14%
    1/    50/   +2%/   +1%
    1/   100/    0%/   -1%
    1/   200/    0%/    0%
   64/     1/    0%/  -25%
   64/    50/   +5%/   +5%
   64/   100/    0%/    0%
   64/   200/    0%/   -1%
  256/     1/    0%/  -30%
  256/    50/    0%/    0%
  256/   100/   -2%/   -2%
  256/   200/    0%/    0%
  512/     1/   +1%/  -23%
  512/    50/   +1%/   +1%
  512/   100/   +1%/    0%
  512/   200/   +1%/   +1%
 1024/     1/   +1%/  -23%
 1024/    50/   +5%/   +5%
 1024/   100/    0%/   -1%
 1024/   200/    0%/    0%
Guest RX
size/sessions/+throughput%/+normalized%
   64/     1/   +1%/   +1%
   64/     4/   -2%/   +1%
   64/     8/   +6%/  +19%
  512/     1/   +5%/   -7%
  512/     4/   -4%/   -4%
  512/     8/    0%/    0%
 1024/     1/   +1%/   +2%
 1024/     4/   -2%/   -2%
 1024/     8/   -1%/   +7%
 2048/     1/   +8%/   -2%
 2048/     4/    0%/   +5%
 2048/     8/   -1%/  +13%
 4096/     1/   -1%/   +2%
 4096/     4/    0%/   +6%
 4096/     8/   -2%/  +15%
16384/     1/   -1%/    0%
16384/     4/   -2%/   -1%
16384/     8/   -2%/   +2%
65535/     1/   -2%/    0%
65535/     4/   -3%/   -3%
65535/     8/   -2%/   +2%
Guest TX
size/sessions/+throughput%/+normalized%
   64/     1/   +6%/   +3%
   64/     4/  +11%/   +8%
   64/     8/    0%/    0%
  512/     1/  +19%/  +18%
  512/     4/   -4%/   +1%
  512/     8/   -1%/   -1%
 1024/     1/    0%/   +8%
 1024/     4/   -1%/   -1%
 1024/     8/    0%/   +1%
 2048/     1/   +1%/    0%
 2048/     4/   -1%/   -2%
 2048/     8/    0%/    0%
 4096/     1/  +12%/  +14%
 4096/     4/    0%/   -1%
 4096/     8/   -2%/   -1%
16384/     1/   +9%/   +6%
16384/     4/   +3%/   -1%
16384/     8/   +2%/   -1%
65535/     1/   +1%/   -2%
65535/     4/    0%/   -4%
65535/     8/    0%/   -2%

- latency gets improved a little bit
- small improvement on single-session rx
- no other obvious changes
- this may be because 8 vcpus already put enough stress on a single
vhost thread, so busy polling was not triggered often (except in light
load cases, e.g. 1 session TCP_RR); see the sketch below.
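
As a sketch of why polling backs off under load (reusing the
placeholder names from the sketch earlier in this mail; not the actual
patch code): busy polling is only attempted while the vhost worker has
nothing else queued, and with 8 vcpus saturating one vhost thread that
is almost never the case.

static bool can_busy_poll_sketch(struct vhost_dev *dev,
                                 unsigned long endtime)
{
        /* Under heavy load vhost_has_work() (patch 1/2) is almost
         * always true, so this returns false and we fall back to the
         * normal notification path without spinning. */
        return busyloop_timeout && !need_resched() &&
               !signal_pending(current) &&
               !vhost_has_work(dev) &&
               busy_clock() < endtime;
}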

3) 8 vcpus 8 queues:

TCP_RR
size/sessions/+throughput%/+normalized%
    1/     1/   +6%/  -16%
    1/    50/  +14%/   +1%
    1/   100/  +17%/   +3%
    1/   200/  +16%/   +2%
   64/     1/   +2%/  -19%
   64/    50/  +10%/    0%
   64/   100/  +17%/   +5%
   64/   200/  +15%/   +3%
  256/     1/    0%/  -19%
  256/    50/   +5%/   -3%
  256/   100/   +4%/   -3%
  256/   200/   +2%/   -4%
  512/     1/   +4%/  -19%
  512/    50/   +7%/   -2%
  512/   100/   +4%/   -4%
  512/   200/   +3%/   -4%
 1024/     1/   +9%/  -19%
 1024/    50/   +6%/   -2%
 1024/   100/   +5%/   -3%
 1024/   200/   +5%/   -3%
Guest RX
size/sessions/+throughput%/+normalized%
   64/     1/  +18%/  +13%
   64/     4/    0%/   -1%
   64/     8/   -4%/  -11%
  512/     1/   +3%/   -6%
  512/     4/   +1%/  -11%
  512/     8/   -1%/   -7%
 1024/     1/    0%/   -9%
 1024/     4/   +9%/  -16%
 1024/     8/   -1%/  -11%
 2048/     1/    0%/   -2%
 2048/     4/    0%/  -16%
 2048/     8/   -1%/   -2%
 4096/     1/   +3%/    0%
 4096/     4/   -1%/  -12%
 4096/     8/    0%/   -5%
16384/     1/   -2%/   -6%
16384/     4/    0%/   -6%
16384/     8/    0%/   -6%
65535/     1/    0%/    0%
65535/     4/    0%/   -9%
65535/     8/    0%/   +1%
Guest TX
size/sessions/+throughput%/+normalized%
   64/     1/   +7%/   +3%
   64/     4/   +6%/    0%
   64/     8/  +10%/   +5%
  512/     1/    0%/  +14%
  512/     4/   +9%/   -1%
  512/     8/  +14%/   +4%
 1024/     1/  +44%/  +37%
 1024/     4/   +6%/   +2%
 1024/     8/  +19%/  +12%
 2048/     1/  -14%/  -16%
 2048/     4/  +11%/   +8%
 2048/     8/  +26%/  +28%
 4096/     1/  +21%/  +19%
 4096/     4/   +2%/  +10%
 4096/     8/  +14%/   +7%
16384/     1/  +12%/   +4%
16384/     4/   +7%/   +2%
16384/     8/   +2%/   +9%
65535/     1/   -3%/   -5%
65535/     4/   +9%/   +5%
65535/     8/    0%/   -8%

- TCP_RR gets obviously improved (at most 17%)
- obvious improvement on Guest TX (at most 44%)

Thread overview: 5+ messages
2015-10-29  8:45 [PATCH net-next rfc V2 0/2] basic busy polling support for vhost_net Jason Wang
2015-10-29  8:45 ` [PATCH net-next rfc V2 1/2] vhost: introduce vhost_has_work() Jason Wang
2015-10-29  8:45 ` [PATCH net-next rfc V2 2/2] vhost_net: basic polling support Jason Wang
2015-10-30 11:58 ` [PATCH net-next rfc V2 0/2] basic busy polling support for vhost_net Jason Wang
2015-11-03  7:46   ` Jason Wang [this message]
