From: jiangyiwen
Subject: [PATCH v2 0/5] VSOCK: support mergeable rx buffer in vhost-vsock
Date: Wed, 12 Dec 2018 17:25:50 +0800
Message-ID: <5C10D41E.9050002@huawei.com>
To: Stefan Hajnoczi, "Michael S. Tsirkin", Jason Wang

Currently vsock only supports sending and receiving small packets, so it
cannot achieve high performance. As previously discussed with Jason Wang,
I revisited the mergeable rx buffer idea from vhost-net and implemented
it in vhost-vsock. It allows a big packet to be scattered across several
buffers and improves performance noticeably.

This series mainly does three things:
- implement mergeable rx buffers
- increase the max send pkt size
- add used buffers and signal the guest in a batch

I wrote a tool to test vhost-vsock performance, mainly sending big
packets (64K) in both the Guest->Host and Host->Guest directions. I
tested each change independently; the results are as follows:

Performance before:
                Single socket    Multiple sockets (max bandwidth)
Guest->Host     ~400MB/s         ~480MB/s
Host->Guest     ~1450MB/s        ~1600MB/s

Performance with only the mergeable rx buffer implemented:
                Single socket    Multiple sockets (max bandwidth)
Guest->Host     ~400MB/s         ~480MB/s
Host->Guest     ~1280MB/s        ~1350MB/s

In this case the max send pkt size is still limited to 4K, so Host->Guest
performance is worse than before.

Performance with the max send pkt size also increased to 64K:
                Single socket    Multiple sockets (max bandwidth)
Guest->Host     ~1700MB/s        ~2900MB/s
Host->Guest     ~1500MB/s        ~2440MB/s

Performance with all patches applied:
                Single socket    Multiple sockets (max bandwidth)
Guest->Host     ~1700MB/s        ~2900MB/s
Host->Guest     ~1700MB/s        ~2900MB/s

From the test results, performance is clearly improved, and guest memory
is not wasted.

In addition, in order to support mergeable rx buffers in virtio-vsock,
we need an additional QEMU patch to parse the new feature.
---
v1 -> v2:
 * Addressed comments from Jason Wang.
 * Added performance test results for each change independently.
 * Use skb_page_frag_refill(), which can use high-order pages and
   reduce the stress on the page allocator.
 * Still use a fixed size (PAGE_SIZE) to fill rx buffers, because too
   small a size cannot hold one full packet and we only have 128 vq
   entries now.
 * Use an iovec to replace buf in struct virtio_vsock_pkt, keeping tx
   and rx consistent.
 * Add a virtio_transport op to get the max pkt len, in order to be
   compatible with old versions.
---

Yiwen Jiang (5):
  VSOCK: support fill mergeable rx buffer in guest
  VSOCK: support fill data to mergeable rx buffer in host
  VSOCK: support receive mergeable rx buffer in guest
  VSOCK: increase send pkt len in mergeable mode to improve performance
  VSOCK: batch sending rx buffer to increase bandwidth

 drivers/vhost/vsock.c                   | 183 ++++++++++++++++++++-----
 include/linux/virtio_vsock.h            |  13 +-
 include/uapi/linux/virtio_vsock.h       |   5 +
 net/vmw_vsock/virtio_transport.c        | 229 +++++++++++++++++++++++++++-----
 net/vmw_vsock/virtio_transport_common.c |  66 ++++++---
 5 files changed, 411 insertions(+), 85 deletions(-)

-- 
1.8.3.1
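
As a supplementary illustration (not code taken from this series), below is
a minimal sketch of the guest-side refill approach described in the v2
changelog: filling the rx virtqueue with fixed PAGE_SIZE buffers carved out
of a page frag via skb_page_frag_refill(). The function name
vsock_fill_rx_buf() and its parameters are assumptions made only for the
sake of the example.

/*
 * Illustrative sketch only: refill one PAGE_SIZE rx buffer from a
 * (possibly high-order) page frag to ease pressure on the page
 * allocator, then expose it to the device.  A mergeable-capable host
 * can chain several such buffers to carry one large packet.
 */
#include <linux/mm.h>
#include <linux/skbuff.h>
#include <linux/scatterlist.h>
#include <linux/virtio.h>

static int vsock_fill_rx_buf(struct virtqueue *vq, struct page_frag *frag)
{
	struct scatterlist sg;
	void *buf;
	int ret;

	/* Make sure the frag has at least PAGE_SIZE bytes available. */
	if (!skb_page_frag_refill(PAGE_SIZE, frag, GFP_KERNEL))
		return -ENOMEM;

	buf = page_address(frag->page) + frag->offset;
	get_page(frag->page);
	frag->offset += PAGE_SIZE;

	/* Post one fixed-size (PAGE_SIZE) buffer to the rx virtqueue. */
	sg_init_one(&sg, buf, PAGE_SIZE);
	ret = virtqueue_add_inbuf(vq, &sg, 1, buf, GFP_KERNEL);
	if (ret < 0)
		put_page(frag->page);

	return ret;
}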