From: "Michael S. Tsirkin" <mst@redhat.com>
To: Shirley Ma <mashirle@us.ibm.com>
Cc: David Miller <davem@davemloft.net>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Avi Kivity <avi@redhat.com>, Arnd Bergmann <arnd@arndb.de>,
	netdev@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH V7 0/4 net-next] macvtap/vhost TX zero-copy support
Date: Tue, 14 Jun 2011 16:30:07 +0300
Message-ID: <20110614133006.GA13258@redhat.com>
In-Reply-To: <1306610077.5180.78.camel@localhost.localdomain>

On Sat, May 28, 2011 at 12:14:37PM -0700, Shirley Ma wrote:
> This patchset adds support for TX zero-copy between guest and host
> kernel through vhost. It significantly reduces CPU utilization on the
> local host on which the guest is located (It reduced about 50% CPU usage
> for single stream test on the host, while 4K message size BW has
> increased about 50%). The patchset is based on previous submission and
> comments from the community regarding when/how to handle guest kernel
> buffers to be released. This is the simplest approach I can think of
> after comparing with several other solutions.
> 
> This patchset has integrated V3 review comments from community: 
> 
> 1. Add more comments on how to use device ZEROCOPY flag;
> 2. Change device ZEROCOPY to available bit 31
> 3. Fix skb header linear allocation when virtio_net GSO is not enabled
> 
> It has integrated V4 review comments from MST and Sridhar:
> 1. In vhost, using socket poll wake up for outstanding DMAs
> 2. Add detailed comments for vhost_zerocopy_signal_used call
> 3. Add sleep in vhost shutting down instead of busy-wait for outstanding
>    DMAs.
> 4. Copy small packets, don't do zero-copy callback in macvtap, mark its
>    DMA done in vhost
> 5. change zerocopy to bool in macvtap.
> 
> It has integrated V5 review comments from MST and 
> Michał Mirosław <mirqus@gmail.com>
> 1. Prevent userspace apps from holding skb userspace buffers by copying
> userspace buffers to kernel in skb_clone, skb_copy, pskb_copy,
> pskb_expand_head.
> 2. Also use the HIGHDMA and SG feature bits to enable ZEROCOPY, removing
> the dependency on a new feature bit; we can add one later when a new
> feature bit is available.
> 
> It has integrated V6 review comments from Eric Dumazet.
> 1. Moving ubuf_info object from skb to caller, just use one pointer in
> skb_share_info to point ubuf_info object.
> 
> 2. Change the zero-copy size from 256 bytes to PAGE_SIZE (4K) because of
> the small message size performance issue.
> 
> 3. During vhost shutting down, release outstanding userspace buffers w/o
> waiting for lower device DMAs done if any. Do we really care about the
> possible wrong data being sent on the wire during shutting down?

Yes, we do. IMHO the right approach is to wait for the DMAs to
complete: just use a sleep/wake-up construct, not a timed sleep.
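
A minimal userspace sketch of such a sleep/wake-up construct follows. It
is only an analogue: it assumes POSIX threads, and the names dma_tracker,
dma_submit, dma_complete and dma_wait_all are hypothetical and do not come
from the patchset. In the kernel the corresponding primitives would be
along the lines of wait_event() and wake_up(). The point is that the
waiter sleeps until the last completion wakes it, rather than polling on
a timer.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical tracker for outstanding DMAs (illustrative names only). */
struct dma_tracker {
	pthread_mutex_t lock;
	pthread_cond_t all_done;	/* signalled when outstanding drops to 0 */
	unsigned int outstanding;	/* DMAs submitted but not yet completed */
};

static void dma_tracker_init(struct dma_tracker *t)
{
	pthread_mutex_init(&t->lock, NULL);
	pthread_cond_init(&t->all_done, NULL);
	t->outstanding = 0;
}

/* Called when a buffer is handed to the lower device. */
static void dma_submit(struct dma_tracker *t)
{
	pthread_mutex_lock(&t->lock);
	t->outstanding++;
	pthread_mutex_unlock(&t->lock);
}

/* Called from the completion callback; wakes waiters on the last one. */
static void dma_complete(struct dma_tracker *t)
{
	pthread_mutex_lock(&t->lock);
	assert(t->outstanding > 0);
	if (--t->outstanding == 0)
		pthread_cond_broadcast(&t->all_done);
	pthread_mutex_unlock(&t->lock);
}

/* Shutdown path: sleep until every outstanding DMA has completed. */
static void dma_wait_all(struct dma_tracker *t)
{
	pthread_mutex_lock(&t->lock);
	while (t->outstanding != 0)	/* loop guards against spurious wakeups */
		pthread_cond_wait(&t->all_done, &t->lock);
	pthread_mutex_unlock(&t->lock);
}
```

With this shape the shutdown path calls dma_wait_all() once and returns
as soon as the final dma_complete() runs, with no arbitrary timeout to
tune.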

> This patchset includes:
> 1/4: Add a new sock zero-copy flag, SOCK_ZEROCOPY;
> 
> 2/4: Add a new struct skb_ubuf_info in skb_share_info for the userspace
> buffer release callback, invoked when the lower device DMA is done for
> that skb, i.e. when the last reference is gone. Whenever skb_clone,
> skb_copy, pskb_copy or pskb_expand_head gets called (from tcpdump,
> filtering), these userspace buffers are copied into the kernel, since we
> don't want userspace apps to hold userspace buffers too long.
> 
> 3/4: Add vhost zero-copy callback in vhost when skb last refcnt is gone;
> add vhost_zerocopy_signal_used to notify guest to release TX skb
> buffers.
> 
> 4/4: Add macvtap zero-copy in lower device when the packet being sent is
> larger than PAGE_SIZE.
> 
> The patchset is built against linux-2.6.39. It has passed
> netperf/netserver multiple-stream stress tests, a tcpdump
> suspended test, and a dynamic SG change test.
> 
> Single TCP_STREAM 120-second test results, 2.6.39-rc3 over an ixgbe
> 10Gb NIC:
> 
> Message    BW (Mb/s)  qemu-kvm (NumCPU)  vhost-net (NumCPU)  PerfTop irq/s
> 4K         7408.57    92.1%              22.6%               1229
> 4K(Orig)   4913.17    118.1%             84.1%               2086
> 8K         9129.90    89.3%              23.3%               1141
> 8K(Orig)   7094.55    115.9%             84.7%               2157
> 16K        9178.81    89.1%              23.3%               1139
> 16K(Orig)  8927.1     118.7%             83.4%               2262
> 64K        9171.43    88.4%              24.9%               1253
> 64K(Orig)  9085.85    115.9%             82.4%               2229
> 
> For message sizes less than or equal to 2K, there is a known KVM guest
> TX overrun issue. With this zero-copy patch, the issue becomes more
> severe: guest io_exits have tripled, so the performance is not good.
> Once the TX overrun problem has been addressed, I will retest the small
> message size performance.
> 
>  drivers/net/macvtap.c  |  131 ++++++++++++++++++++++++++++++++++++++++++++----
>  drivers/vhost/net.c    |   45 ++++++++++++++++-
>  drivers/vhost/vhost.c  |   51 +++++++++++++++++++
>  drivers/vhost/vhost.h  |   15 ++++++
>  include/linux/skbuff.h |   25 +++++++++
>  include/net/sock.h     |    1 +
>  net/core/skbuff.c      |   83 ++++++++++++++++++++++++++++++-
>  7 files changed, 338 insertions(+), 13 deletions(-)
> 
> 
