From: Tom Lendacky <tahm@linux.vnet.ibm.com>
To: kvm@vger.kernel.org
Subject: Network performance with small packets - continued
Date: Mon, 7 Mar 2011 16:31:41 -0600
Message-ID: <201103071631.41964.tahm@linux.vnet.ibm.com>
We've been doing some more experimenting with the small-packet network
performance problem in KVM. I have a different setup from the one Steve D.
was using, so I re-baselined things on the kvm.git kernel on both the host
and guest with a 10GbE adapter. I also made use of the virtio-stats patch.
The virtual machine, running the kvm.git kernel, has 2 vCPUs, 8GB of memory
and two virtio network adapters (the first connected to a 1GbE adapter and a
LAN, the second connected to a 10GbE adapter that is directly connected to
another system with the same 10GbE adapter). The test was a TCP_RR test with
100 connections from a baremetal client to the KVM guest, using a 256-byte
message size in both directions.
I used the uperf tool to do this after verifying the results against netperf.
Uperf allows the number of connections to be specified as a parameter in an
XML profile, as opposed to launching, in this case, 100 separate instances of
netperf.
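For reference, a profile for this kind of test looks something like the
following. This is a sketch modeled on uperf's standard netperf-emulation
example profile; the duration and the $h host variable are placeholders,
not the exact profile used for these runs.

  <?xml version="1.0"?>
  <!-- Run with something like: h=<peer-ip> uperf -m tcp_rr_256.xml -->
  <profile name="TCP_RR-256">
    <group nthreads="100">                 <!-- 100 concurrent connections -->
      <transaction iterations="1">
        <flowop type="connect" options="remotehost=$h protocol=tcp"/>
      </transaction>
      <transaction duration="120s">
        <flowop type="write" options="size=256"/>  <!-- 256-byte request -->
        <flowop type="read"  options="size=256"/>  <!-- 256-byte response -->
      </transaction>
      <transaction iterations="1">
        <flowop type="disconnect"/>
      </transaction>
    </group>
  </profile>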
Here is the baseline for baremetal using 2 physical CPUs:
Txn Rate: 206,389.59 Txn/Sec, Pkt Rate: 410,048 Pkts/Sec
TxCPU: 7.88% RxCPU: 99.41%
To get consistent results with KVM, I disabled hyperthreading and pinned the
qemu-kvm process, the vCPUs, the vhost thread and the ethernet adapter
interrupts (this resulted in runs that differed by only about 2% from lowest
to highest). The fact that pinning is required to get consistent results is a
different problem that we'll have to look into later...
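For anyone trying to reproduce this, the pinning itself is nothing exotic;
it comes down to taskset and the IRQ affinity interface (the PIDs, IRQ
number and CPU masks below are examples, not the exact values from my runs):

  # Pin the vCPU threads and the vhost thread to dedicated host CPUs
  taskset -pc 0 <vcpu0-tid>          # qemu-kvm vCPU thread IDs
  taskset -pc 1 <vcpu1-tid>
  taskset -pc 2 <vhost-pid>          # vhost-<qemu-pid> kernel thread
  # Pin the 10GbE adapter's interrupt (bitmask: CPU 3 only)
  echo 8 > /proc/irq/<irq>/smp_affinity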
Here is the KVM baseline (average of six runs):
Txn Rate: 87,070.34 Txn/Sec, Pkt Rate: 172,992 Pkts/Sec
Exits: 148,444.58 Exits/Sec
TxCPU: 2.40% RxCPU: 99.35%
About 42% of baremetal.
The virtio stats output showed a lot of kick_notify happening when the ring
was empty. So I coded a quick patch to delay freeing of the used Tx buffers
until more than half the ring was used (I did not test this under a stream
condition, so I don't know if it would have a negative impact there). Here
are the results from delaying the freeing of used Tx buffers (average of six
runs):
Txn Rate: 90,886.19 Txn/Sec, Pkt Rate: 180,571 Pkts/Sec
Exits: 142,681.67 Exits/Sec
TxCPU: 2.78% RxCPU: 99.36%
About a 4% increase over baseline and about 44% of baremetal.
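A minimal sketch of the deferred-freeing idea against the current
virtio_net driver (not the actual patch; the pending/ring_size bookkeeping
fields are invented here for illustration):

  /* Skip reclaiming used Tx buffers until more than half of the ring
   * is outstanding.  vi->pending is assumed to be incremented for each
   * buffer added to the Tx virtqueue. */
  static void free_old_xmit_skbs_deferred(struct virtnet_info *vi)
  {
          struct sk_buff *skb;
          unsigned int len;

          if (vi->pending <= vi->ring_size / 2)
                  return;                 /* ring not yet half used */

          while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
                  vi->dev->stats.tx_bytes += skb->len;
                  vi->dev->stats.tx_packets++;
                  vi->pending--;
                  dev_kfree_skb_any(skb);
          }
  }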
This spread out the kick_notify events but still resulted in a lot of them.
I decided to build on the delayed Tx buffer freeing and code up an
"ethtool"-like coalescing patch to delay the kick_notify until there were at
least 5 packets on the ring or 2000 usecs had elapsed, whichever occurred
first. Here are the results of delaying the kick_notify (average of six
runs):
Txn Rate: 107,106.36 Txn/Sec, Pkt Rate: 212,796 Pkts/Sec
Exits: 102,587.28 Exits/Sec
TxCPU: 3.03% RxCPU: 99.33%
About a 23% increase over baseline and about 52% of baremetal.
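The coalescing logic amounts to something like the following sketch (the
hrtimer plumbing and counter are assumed fields, and the timer callback,
which does the kick and resets the counter, is not shown):

  #define TX_COALESCE_PKTS        5       /* kick after 5 packets...  */
  #define TX_COALESCE_USECS       2000    /* ...or after 2000 usecs   */

  static void tx_coalesced_kick(struct virtnet_info *vi)
  {
          if (++vi->tx_unkicked >= TX_COALESCE_PKTS) {
                  hrtimer_cancel(&vi->tx_kick_timer);
                  vi->tx_unkicked = 0;
                  virtqueue_kick(vi->svq);
          } else if (!hrtimer_active(&vi->tx_kick_timer)) {
                  hrtimer_start(&vi->tx_kick_timer,
                                ktime_set(0, TX_COALESCE_USECS * NSEC_PER_USEC),
                                HRTIMER_MODE_REL);
          }
  }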
Running perf against the guest, I noticed almost 19% of the time being spent
in _raw_spin_lock. Enabling lockstat in the guest showed a lot of contention
on "irq_desc_lock_class". Pinning the virtio1-input interrupt to a single CPU
in the guest and re-running the last test resulted in tremendous gains
(average of six runs):
Txn Rate: 153,696.59 Txn/Sec, Pkt Rate: 305,358 Pkts/Sec
Exits: 62,603.37 Exits/Sec
TxCPU: 3.73% RxCPU: 98.52%
About a 77% increase over baseline and about 74% of baremetal.
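The guest-side pinning is just the standard IRQ affinity interface (look up
the actual IRQ number for virtio1-input in /proc/interrupts):

  # In the guest: find the virtio1-input IRQ, then pin it to CPU 0
  grep virtio1-input /proc/interrupts
  echo 1 > /proc/irq/<irq>/smp_affinity    # bitmask: CPU 0 only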
Vhost is receiving a lot of notifications for packets that are to be
transmitted (over 60% of the packets generate a kick_notify). It also looks
like vhost is sending a lot of notifications for packets it has received
before the guest can get scheduled to disable notifications and begin
processing the packets, resulting in some lock contention in the guest (and
high interrupt rates).
Some thoughts for the transmit path... can vhost be enhanced to do some
adaptive polling so that the number of kick_notify events is reduced and
replaced by kick_no_notify events?
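Roughly the kind of thing I mean, as a sketch (the poll budget and the two
ring helpers are made up for illustration; only the notification calls are
real vhost API):

  #define POLL_BUDGET 50

  static void handle_tx_polled(struct vhost_virtqueue *vq)
  {
          int spins = POLL_BUDGET;

          vhost_disable_notify(vq);
          for (;;) {
                  if (tx_ring_has_work(vq)) {     /* hypothetical helper */
                          process_tx_ring(vq);    /* hypothetical helper */
                          spins = POLL_BUDGET;    /* reset on progress */
                  } else if (--spins > 0) {
                          cpu_relax();            /* brief busy wait */
                  } else {
                          /* Budget exhausted: re-arm notifications, but
                           * keep polling if work slipped in meanwhile. */
                          if (vhost_enable_notify(vq)) {
                                  vhost_disable_notify(vq);
                                  spins = POLL_BUDGET;
                                  continue;
                          }
                          break;
                  }
          }
  }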
Comparing the transmit path to the receive path: the guest disables
notifications after the first kick, and vhost re-enables notifications after
it finishes processing the tx ring. Can a similar thing be done for the
receive path? Once vhost sends the first notification for a received packet,
it could disable notifications and let the guest re-enable them when it has
finished processing the receive ring. Also, can the virtio-net driver do some
adaptive polling (or does NAPI take care of that for the guest)?
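For context, the proposed receive-side handshake would mirror what the
guest's rx path already does on its side today; simplified from
skb_recv_done() in drivers/net/virtio_net.c:

  /* On the first rx interrupt, suppress further callbacks and hand the
   * ring to NAPI; virtnet_poll() re-enables callbacks once the ring is
   * drained. */
  static void skb_recv_done(struct virtqueue *rvq)
  {
          struct virtnet_info *vi = rvq->vdev->priv;

          if (napi_schedule_prep(&vi->napi)) {
                  virtqueue_disable_cb(rvq);      /* no more interrupts */
                  __napi_schedule(&vi->napi);     /* poll from softirq */
          }
  }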
Running the same workload on the same configuration with a different
hypervisor results in performance that is almost equivalent to baremetal
without doing any pinning.
Thanks,
Tom Lendacky