Linux virtualization list
 help / color / mirror / Atom feed
* [PATCH] drm/qxl: use ttm_tt
From: Gerd Hoffmann @ 2018-11-07  7:05 UTC (permalink / raw)
  To: dri-devel
  Cc: David Airlie, Dave Airlie, open list,
	open list:DRM DRIVER FOR QXL VIRTUAL GPU

qxl device will not dma, so we don't need ttm_dma_tt.  Go use ttm_tt
instead, to avoid wasting resources (swiotlb bounce buffers for
example).

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 drivers/gpu/drm/qxl/qxl_ttm.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/qxl/qxl_ttm.c b/drivers/gpu/drm/qxl/qxl_ttm.c
index 559a101138..7a778a46a2 100644
--- a/drivers/gpu/drm/qxl/qxl_ttm.c
+++ b/drivers/gpu/drm/qxl/qxl_ttm.c
@@ -252,7 +252,7 @@ static void qxl_ttm_io_mem_free(struct ttm_bo_device *bdev,
  * TTM backend functions.
  */
 struct qxl_ttm_tt {
-	struct ttm_dma_tt		ttm;
+	struct ttm_tt		        ttm;
 	struct qxl_device		*qdev;
 	u64				offset;
 };
@@ -281,7 +281,7 @@ static void qxl_ttm_backend_destroy(struct ttm_tt *ttm)
 {
 	struct qxl_ttm_tt *gtt = (void *)ttm;
 
-	ttm_dma_tt_fini(&gtt->ttm);
+	ttm_tt_fini(&gtt->ttm);
 	kfree(gtt);
 }
 
@@ -301,13 +301,13 @@ static struct ttm_tt *qxl_ttm_tt_create(struct ttm_buffer_object *bo,
 	gtt = kzalloc(sizeof(struct qxl_ttm_tt), GFP_KERNEL);
 	if (gtt == NULL)
 		return NULL;
-	gtt->ttm.ttm.func = &qxl_backend_func;
+	gtt->ttm.func = &qxl_backend_func;
 	gtt->qdev = qdev;
-	if (ttm_dma_tt_init(&gtt->ttm, bo, page_flags)) {
+	if (ttm_tt_init(&gtt->ttm, bo, page_flags)) {
 		kfree(gtt);
 		return NULL;
 	}
-	return &gtt->ttm.ttm;
+	return &gtt->ttm;
 }
 
 static void qxl_move_null(struct ttm_buffer_object *bo,
-- 
2.9.3

^ permalink raw reply related

* Re: [PATCH 3/5] VSOCK: support receive mergeable rx buffer in guest
From: Jason Wang @ 2018-11-07  6:20 UTC (permalink / raw)
  To: jiangyiwen, stefanha; +Cc: netdev, kvm, virtualization
In-Reply-To: <5BE137B2.5040305@huawei.com>


On 2018/11/6 下午2:41, jiangyiwen wrote:
> On 2018/11/6 12:00, Jason Wang wrote:
>> On 2018/11/5 下午3:47, jiangyiwen wrote:
>>> Guest receive mergeable rx buffer, it can merge
>>> scatter rx buffer into a big buffer and then copy
>>> to user space.
>>>
>>> Signed-off-by: Yiwen Jiang<jiangyiwen@huawei.com>
>>> ---
>>>    include/linux/virtio_vsock.h            |  9 ++++
>>>    net/vmw_vsock/virtio_transport.c        | 75 +++++++++++++++++++++++++++++----
>>>    net/vmw_vsock/virtio_transport_common.c | 59 ++++++++++++++++++++++----
>>>    3 files changed, 127 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>>> index da9e1fe..6be3cd7 100644
>>> --- a/include/linux/virtio_vsock.h
>>> +++ b/include/linux/virtio_vsock.h
>>> @@ -13,6 +13,8 @@
>>>    #define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE    (1024 * 4)
>>>    #define VIRTIO_VSOCK_MAX_BUF_SIZE        0xFFFFFFFFUL
>>>    #define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE        (1024 * 64)
>>> +/* virtio_vsock_pkt + max_pkt_len(default MAX_PKT_BUF_SIZE) */
>>> +#define VIRTIO_VSOCK_MAX_MRG_BUF_NUM ((VIRTIO_VSOCK_MAX_PKT_BUF_SIZE / PAGE_SIZE) + 1)
>>>
>>>    /* Virtio-vsock feature */
>>>    #define VIRTIO_VSOCK_F_MRG_RXBUF 0 /* Host can merge receive buffers. */
>>> @@ -48,6 +50,11 @@ struct virtio_vsock_sock {
>>>        struct list_head rx_queue;
>>>    };
>>>
>>> +struct virtio_vsock_mrg_rxbuf {
>>> +    void *buf;
>>> +    u32 len;
>>> +};
>>> +
>>>    struct virtio_vsock_pkt {
>>>        struct virtio_vsock_hdr    hdr;
>>>        struct virtio_vsock_mrg_rxbuf_hdr mrg_rxbuf_hdr;
>>> @@ -59,6 +66,8 @@ struct virtio_vsock_pkt {
>>>        u32 len;
>>>        u32 off;
>>>        bool reply;
>>> +    bool mergeable;
>>> +    struct virtio_vsock_mrg_rxbuf mrg_rxbuf[VIRTIO_VSOCK_MAX_MRG_BUF_NUM];
>>>    };
>> It's better to use iov here I think, and drop buf completely.
>>
>> And this is better to be done in an independent patch.
>>
> You're right, I can use kvec instead of customized structure,
> in addition, I don't understand about drop buf completely and
> an independent patch.


I mean there a void *buf in struct virtio_vsock_pkt. You can drop it and 
switch to use iov(iter) or other data structure that supports sg.

Thanks


>
> Thanks.
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH 2/5] VSOCK: support fill data to mergeable rx buffer in host
From: Jason Wang @ 2018-11-07  6:18 UTC (permalink / raw)
  To: jiangyiwen, stefanha; +Cc: netdev, kvm, virtualization
In-Reply-To: <5BE134EF.1070009@huawei.com>


On 2018/11/6 下午2:30, jiangyiwen wrote:
>> Seems duplicated with the one used by vhost-net.
>>
>> In packed virtqueue implementation, I plan to move this to vhost.c.
>>
> Yes, this code is full copied from vhost-net, if it can be packed into
> vhost.c, it would be great.
>

If you try to reuse vhost-net, you don't even need to care about this :)

Thanks

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH 1/5] VSOCK: support fill mergeable rx buffer in guest
From: Jason Wang @ 2018-11-07  6:17 UTC (permalink / raw)
  To: jiangyiwen, stefanha; +Cc: netdev, kvm, virtualization
In-Reply-To: <5BE13331.7050901@huawei.com>


On 2018/11/6 下午2:22, jiangyiwen wrote:
> On 2018/11/6 11:38, Jason Wang wrote:
>> On 2018/11/5 下午3:45, jiangyiwen wrote:
>>> In driver probing, if virtio has VIRTIO_VSOCK_F_MRG_RXBUF feature,
>>> it will fill mergeable rx buffer, support for host send mergeable
>>> rx buffer. It will fill a page everytime to compact with small
>>> packet and big packet.
>>>
>>> Signed-off-by: Yiwen Jiang<jiangyiwen@huawei.com>
>>> ---
>>>    include/linux/virtio_vsock.h     |  3 ++
>>>    net/vmw_vsock/virtio_transport.c | 72 +++++++++++++++++++++++++++++-----------
>>>    2 files changed, 56 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>>> index e223e26..bf84418 100644
>>> --- a/include/linux/virtio_vsock.h
>>> +++ b/include/linux/virtio_vsock.h
>>> @@ -14,6 +14,9 @@
>>>    #define VIRTIO_VSOCK_MAX_BUF_SIZE        0xFFFFFFFFUL
>>>    #define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE        (1024 * 64)
>>>
>>> +/* Virtio-vsock feature */
>>> +#define VIRTIO_VSOCK_F_MRG_RXBUF 0 /* Host can merge receive buffers. */
>>> +
>>>    enum {
>>>        VSOCK_VQ_RX     = 0, /* for host to guest data */
>>>        VSOCK_VQ_TX     = 1, /* for guest to host data */
>>> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>>> index 5d3cce9..2040a9e 100644
>>> --- a/net/vmw_vsock/virtio_transport.c
>>> +++ b/net/vmw_vsock/virtio_transport.c
>>> @@ -64,6 +64,7 @@ struct virtio_vsock {
>>>        struct virtio_vsock_event event_list[8];
>>>
>>>        u32 guest_cid;
>>> +    bool mergeable;
>>>    };
>>>
>>>    static struct virtio_vsock *virtio_vsock_get(void)
>>> @@ -256,6 +257,25 @@ static int virtio_transport_send_pkt_loopback(struct virtio_vsock *vsock,
>>>        return 0;
>>>    }
>>>
>>> +static int fill_mergeable_rx_buff(struct virtqueue *vq)
>>> +{
>>> +    void *page = NULL;
>>> +    struct scatterlist sg;
>>> +    int err;
>>> +
>>> +    page = (void *)get_zeroed_page(GFP_KERNEL);
>> Any reason to use zeroed page?
> In previous version, the entire structure of virtio_vsock_pkt is preallocated
> in guest use kzalloc, it is a contiguous zeroed physical memory, but host only
> need to fill virtio_vsock_hdr size.
>
> However, in mergeable rx buffer version, we only fill a page in vring descriptor
> in guest, and I will reserve size of virtio_vsock_pkt in host instead of write
> the total size of virtio_vsock_pkt, for the correctness of structure value,
> we should set zeroed page in advance.


I may miss something, but it looks to me only the header needs to be zeroed.


>
>>> +    if (!page)
>>> +        return -ENOMEM;
>>> +
>>> +    sg_init_one(&sg, page, PAGE_SIZE);
>> FYI, for virtio-net we have several optimizations for mergeable rx buffer:
>>
>> - skb_page_frag_refill() which can use high order page and reduce the stress of page allocator
>>
> You're right, initially I want to use a memory poll to manage the rx buffer,
> and then use this in the later optimized patch. Your advice is very great.
>
>> - we don't use fixed buffer size, instead we use EWMA to estimate the possible rx buffer size to avoid internal fragmentation
>>
> Ok, I analysis the feature and consider add it into my patches.
>
>> If we can try to reuse virtio-net driver, we will get those nice features.
>>
> Yes, after all virtio-net has a very good ecological environment, and it also
> do many performance optimization, it is actually a good idea.
>

Yes, so my suggestion is to consider to reuse them (unless we found 
something that is a real blocker) instead of duplicating codes, features 
a bugs.

Thanks

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH 0/1] vhost: add vhost_blk driver
From: Paolo Bonzini @ 2018-11-06 21:41 UTC (permalink / raw)
  To: Christoph Hellwig, Jason Wang
  Cc: Vitaly Mayatskih, kvm, Michael S . Tsirkin, linux-kernel,
	virtualization, stefanha
In-Reply-To: <20181106071349.GA5526@infradead.org>

On 06/11/2018 08:13, Christoph Hellwig wrote:
> On Tue, Nov 06, 2018 at 10:45:08AM +0800, Jason Wang wrote:
>>> Storage industry is shifting away from SCSI, which has a scaling
>>> problem.
>>
>> Know little about storage. For scaling, do you mean SCSI protocol itself? If
>> not, it's probably not a real issue for virtio-scsi itself.
> 
> The above is utter bullshit.  There is a big NVMe hype, but it is not
> because "SCSI has a scaling problem", but because the industry can sell
> a new thing, and the standardization body seems easier to work with.

In the case of Linux, there is indeed some performance to be gained by
skipping the SCSI layer in the guest and to a lesser extent in QEMU,
which is why virtio-blk is still in use.  But there's no scaling
problem, it's just extra constant overhead.

Paolo

^ permalink raw reply

* Re: [PATCH 1/1] Add vhost_blk driver
From: Stefan Hajnoczi @ 2018-11-06 16:03 UTC (permalink / raw)
  To: Vitaly Mayatskikh
  Cc: kvm, Michael S . Tsirkin, linux-kernel, virtualization,
	Paolo Bonzini
In-Reply-To: <20181102182123.29420-2-v.mayatskih@gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 3004 bytes --]

On Fri, Nov 02, 2018 at 06:21:23PM +0000, Vitaly Mayatskikh wrote:
> This driver accelerates host side of virtio-blk.

Did you look at vhost-user-blk?  It does things slightly differently:
more of the virtio-blk device model is handled by the vhost-user device
(e.g. config space).  That might be necessary to implement
virtio_blk_config.writeback properly.

> +#define VHOST_BLK_SET_BACKEND _IOW(VHOST_VIRTIO, 0x50, int)

Needs to be in the uapi header so userspace can use it.

> +
> +enum {
> +	VHOST_BLK_VQ_MAX = 16,
> +	VHOST_BLK_VQ_MAX_REQS = 128,
> +};

These limits seem arbitrary and probably too low.

> +
> +struct vhost_blk_req {
> +	struct llist_node list;
> +	int index;
> +	struct vhost_blk_queue *q;
> +	struct virtio_blk_outhdr hdr;
> +	struct iovec *out_iov;
> +	struct iovec *in_iov;
> +	u8 out_num;
> +	u8 in_num;
> +	long len;
> +	struct kiocb iocb;
> +	struct iov_iter i;
> +	int res;
> +	void __user *status;
> +};
> +
> +struct vhost_blk_queue {
> +	int index;
> +	struct vhost_blk *blk;
> +	struct vhost_virtqueue vq;
> +	struct vhost_work w;
> +	struct llist_head wl;

What is this?  Please use clear names and comments. :)

> +static int vhost_blk_req_handle(struct vhost_blk_req *req)
> +{
> +	struct vhost_blk *blk = req->q->blk;
> +	struct vhost_virtqueue *vq = &req->q->vq;
> +	int type = le32_to_cpu(req->hdr.type);
> +	int ret;
> +	u8 status;
> +
> +	if ((type == VIRTIO_BLK_T_IN) || (type == VIRTIO_BLK_T_OUT)) {
> +		bool write = (type == VIRTIO_BLK_T_OUT);
> +		int nr_seg = (write ? req->out_num : req->in_num) - 1;
> +		unsigned long sector = le64_to_cpu(req->hdr.sector);

Using little-endian instead of the virtio types means that only VIRTIO
1.0 modern devices are supported (older devices may not be
little-endian!).  In that case you'd need to put the VIRTIO_1 feature
bit into the features mask.

> +		req = &q->req[head];
> +		req->index = head;
> +		req->out_num = out;
> +		req->in_num = in;
> +		req->out_iov = &vq->iov[1];
> +		req->in_iov = &vq->iov[out];
> +		req->status = vq->iov[out + in - 1].iov_base;

The VIRTIO spec explicitly requires that devices support arbitrary
descriptor layouts, so you cannot assume a particular iov[] layout.

> +static long vhost_blk_ioctl(struct file *f, unsigned int ioctl,
> +			    unsigned long arg)
> +{
> +	struct vhost_blk *blk = f->private_data;
> +	void __user *argp = (void __user *)arg;
> +	int fd;
> +	u64 __user *featurep = argp;
> +	u64 features;
> +	long ret;
> +	struct vhost_vring_state s;
> +
> +	switch (ioctl) {
> +	case VHOST_SET_MEM_TABLE:
> +		vhost_blk_stop(blk);
> +		ret = vhost_blk_pass_ioctl(blk, ioctl, argp);
> +		break;

Why is this necessary?  Existing vhost devices pass the ioctl through
without an explicit case for it.

> +	case VHOST_SET_VRING_NUM:
> +		if (copy_from_user(&s, argp, sizeof(s)))
> +			return -EFAULT;
> +		ret = vhost_blk_pass_ioctl(blk, ioctl, argp);
> +		if (!ret)
> +			blk->num_queues = s.index + 1;

Where is this input checked against ARRAY_SIZE(blk->queue)?

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH 0/1] vhost: add vhost_blk driver
From: Stefan Hajnoczi @ 2018-11-06 15:40 UTC (permalink / raw)
  To: Vitaly Mayatskikh
  Cc: den, kvm, Michael S. Tsirkin, linux-kernel, virtualization,
	Paolo Bonzini
In-Reply-To: <20181102142446-mutt-send-email-mst@kernel.org>


[-- Attachment #1.1: Type: text/plain, Size: 3734 bytes --]

On Fri, Nov 02, 2018 at 02:26:00PM -0400, Michael S. Tsirkin wrote:
> On Fri, Nov 02, 2018 at 06:21:22PM +0000, Vitaly Mayatskikh wrote:
> > vhost_blk is a host-side kernel mode accelerator for virtio-blk. The
> > driver allows VM to reach a near bare-metal disk performance. See IOPS
> > numbers below (fio --rw=randread --bs=4k).
> > 
> > This implementation uses kiocb interface. It is slightly slower than
> > going directly through bio, but is simpler and also works with disk
> > images placed on a file system.
> > 
> > # fio num-jobs
> > # A: bare metal over block
> > # B: bare metal over file
> > # C: virtio-blk over block
> > # D: virtio-blk over file
> > # E: vhost-blk bio over block
> > # F: vhost-blk kiocb over block
> > # G: vhost-blk kiocb over file
> > #
> > #  A     B     C    D    E     F     G
> > 
> > 1  171k  151k  148k 151k 195k  187k  175k
> > 2  328k  302k  249k 241k 349k  334k  296k
> > 3  479k  437k  179k 174k 501k  464k  404k
> > 4  622k  568k  143k 183k 620k  580k  492k
> > 5  755k  697k  136k 128k 737k  693k  579k
> > 6  887k  808k  131k 120k 830k  782k  640k
> > 7  1004k 926k  126k 131k 926k  863k  693k
> > 8  1099k 1015k 117k 115k 1001k 931k  712k
> > 9  1194k 1119k 115k 111k 1055k 991k  711k
> > 10 1278k 1207k 109k 114k 1130k 1046k 695k
> > 11 1345k 1280k 110k 108k 1119k 1091k 663k
> > 12 1411k 1356k 104k 106k 1201k 1142k 629k
> > 13 1466k 1423k 106k 106k 1260k 1170k 607k
> > 14 1517k 1486k 103k 106k 1296k 1179k 589k
> > 15 1552k 1543k 102k 102k 1322k 1191k 571k
> > 16 1480k 1506k 101k 102k 1346k 1202k 566k
> > 
> > Vitaly Mayatskikh (1):
> >   Add vhost_blk driver
> 
> 
> Thanks!
> Before merging this, I'd like to get some acks from userspace that it's
> actually going to be used - e.g. QEMU block maintainers.

I have CCed Kevin, who is the overall QEMU block layer maintainer.

Also CCing Denis since I think someone was working on a QEMU userspace
multiqueue virtio-blk device for maximum performance.

Previously vhost_blk.ko implementations were basically the same thing as
the QEMU x-data-plane=on (dedicated thread using Linux AIO), except they
were using a kernel thread and maybe submitted bios.

The performance differences weren't convincing enough that it seemed
worthwhile maintaining another code path which loses live migration, I/O
throttling, image file formats, etc (all the things that QEMU's block
layer supports).

Two changes since then:

1. x-data-plane=on has been replaced with a full trip down QEMU's block
layer (-object iothread,id=iothread0 -device
virtio-blk-pci,iothread=iothread0,...).  It's slower and not truly
multiqueue (yet!).

So from this perspective vhost_blk.ko might be more attractive again, at
least until further QEMU block layer work eliminates the multiqueue and
performance overheads.

2. SPDK has become available for users who want the best I/O performance
and are willing to sacrifice CPU cores for polling.

If you want better performance and don't care about QEMU block layer
features, could you use SPDK?  People who are the target market for
vhost_blk.ko would probably be willing to use SPDK and it already
exists...

From the QEMU userspace perspective, I think the best way to integrate
vhost_blk.ko is to transparently switch to it when possible.  If the
user enables QEMU block layer features that are incompatible with
vhost_blk.ko, then it should fall back to the QEMU block layer
transparently.

I'm not keen on yet another code path with it's own set of limitations
and having to educate users about how to make the choice.  But if it can
be integrated transparently as an "accelerator", then it could be
valuable.

Stefan

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH 0/1] vhost: add vhost_blk driver
From: Stefan Hajnoczi @ 2018-11-06 15:21 UTC (permalink / raw)
  To: Vitaly Mayatskih
  Cc: kvm, Michael S . Tsirkin, linux-kernel, virtualization,
	Paolo Bonzini
In-Reply-To: <CAGF4SLjC5Gs33k1XzK8vFF5FWE0pwHvB56iQ--DJ_4ADw5yuGg@mail.gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 498 bytes --]

On Mon, Nov 05, 2018 at 11:15:04AM -0500, Vitaly Mayatskih wrote:
> On Mon, Nov 5, 2018 at 10:48 AM Christian Borntraeger
> <borntraeger@de.ibm.com> wrote:
> 
> > For the record, we still do use virtio-blk a lot. As we see new things like discard/write zero
> > support it seems that others do as well.
> 
> Yes, trim/discard and writesame support is planned, at least for
> submit_bio path (not in this patch).

I agree, virtio_blk is quite popular and not deprecated/frozen.

Stefan

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH 0/1] vhost: add vhost_blk driver
From: Christoph Hellwig @ 2018-11-06  7:13 UTC (permalink / raw)
  To: Jason Wang
  Cc: Vitaly Mayatskih, kvm, Michael S . Tsirkin, linux-kernel,
	virtualization, stefanha, Paolo Bonzini
In-Reply-To: <57eefa62-7e66-786d-441c-5dd6b0d451a5@redhat.com>

On Tue, Nov 06, 2018 at 10:45:08AM +0800, Jason Wang wrote:
> > Storage industry is shifting away from SCSI, which has a scaling
> > problem.
> 
> 
> Know little about storage. For scaling, do you mean SCSI protocol itself? If
> not, it's probably not a real issue for virtio-scsi itself.

The above is utter bullshit.  There is a big NVMe hype, but it is not
because "SCSI has a scaling problem", but because the industry can sell
a new thing, and the standardization body seems easier to work with.

^ permalink raw reply

* Re: [PATCH 3/5] VSOCK: support receive mergeable rx buffer in guest
From: jiangyiwen @ 2018-11-06  6:41 UTC (permalink / raw)
  To: Jason Wang, stefanha; +Cc: netdev, kvm, virtualization
In-Reply-To: <791d67e7-ad95-38e4-0d38-2b7c54d68040@redhat.com>

On 2018/11/6 12:00, Jason Wang wrote:
> 
> On 2018/11/5 下午3:47, jiangyiwen wrote:
>> Guest receive mergeable rx buffer, it can merge
>> scatter rx buffer into a big buffer and then copy
>> to user space.
>>
>> Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
>> ---
>>   include/linux/virtio_vsock.h            |  9 ++++
>>   net/vmw_vsock/virtio_transport.c        | 75 +++++++++++++++++++++++++++++----
>>   net/vmw_vsock/virtio_transport_common.c | 59 ++++++++++++++++++++++----
>>   3 files changed, 127 insertions(+), 16 deletions(-)
>>
>> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>> index da9e1fe..6be3cd7 100644
>> --- a/include/linux/virtio_vsock.h
>> +++ b/include/linux/virtio_vsock.h
>> @@ -13,6 +13,8 @@
>>   #define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE    (1024 * 4)
>>   #define VIRTIO_VSOCK_MAX_BUF_SIZE        0xFFFFFFFFUL
>>   #define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE        (1024 * 64)
>> +/* virtio_vsock_pkt + max_pkt_len(default MAX_PKT_BUF_SIZE) */
>> +#define VIRTIO_VSOCK_MAX_MRG_BUF_NUM ((VIRTIO_VSOCK_MAX_PKT_BUF_SIZE / PAGE_SIZE) + 1)
>>
>>   /* Virtio-vsock feature */
>>   #define VIRTIO_VSOCK_F_MRG_RXBUF 0 /* Host can merge receive buffers. */
>> @@ -48,6 +50,11 @@ struct virtio_vsock_sock {
>>       struct list_head rx_queue;
>>   };
>>
>> +struct virtio_vsock_mrg_rxbuf {
>> +    void *buf;
>> +    u32 len;
>> +};
>> +
>>   struct virtio_vsock_pkt {
>>       struct virtio_vsock_hdr    hdr;
>>       struct virtio_vsock_mrg_rxbuf_hdr mrg_rxbuf_hdr;
>> @@ -59,6 +66,8 @@ struct virtio_vsock_pkt {
>>       u32 len;
>>       u32 off;
>>       bool reply;
>> +    bool mergeable;
>> +    struct virtio_vsock_mrg_rxbuf mrg_rxbuf[VIRTIO_VSOCK_MAX_MRG_BUF_NUM];
>>   };
> 
> 
> It's better to use iov here I think, and drop buf completely.
> 
> And this is better to be done in an independent patch.
> 

You're right, I can use kvec instead of customized structure,
in addition, I don't understand about drop buf completely and
an independent patch.

Thanks.

> 
>>
>>   struct virtio_vsock_pkt_info {
>> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>> index 2040a9e..3557ad3 100644
>> --- a/net/vmw_vsock/virtio_transport.c
>> +++ b/net/vmw_vsock/virtio_transport.c
>> @@ -359,11 +359,62 @@ static bool virtio_transport_more_replies(struct virtio_vsock *vsock)
>>       return val < virtqueue_get_vring_size(vq);
>>   }
>>
>> +static struct virtio_vsock_pkt *receive_mergeable(struct virtqueue *vq,
>> +        struct virtio_vsock *vsock, unsigned int *total_len)
>> +{
>> +    struct virtio_vsock_pkt *pkt;
>> +    u16 num_buf;
>> +    void *page;
>> +    unsigned int len;
>> +    int i = 0;
>> +
>> +    page = virtqueue_get_buf(vq, &len);
>> +    if (!page)
>> +        return NULL;
>> +
>> +    *total_len = len;
>> +    vsock->rx_buf_nr--;
>> +
>> +    pkt = page;
>> +    num_buf = le16_to_cpu(pkt->mrg_rxbuf_hdr.num_buffers);
>> +    if (!num_buf || num_buf > VIRTIO_VSOCK_MAX_MRG_BUF_NUM)
>> +        goto err;
>> +
>> +    pkt->mergeable = true;
>> +    if (!le32_to_cpu(pkt->hdr.len))
>> +        return pkt;
>> +
>> +    len -= sizeof(struct virtio_vsock_pkt);
>> +    pkt->mrg_rxbuf[i].buf = page + sizeof(struct virtio_vsock_pkt);
>> +    pkt->mrg_rxbuf[i].len = len;
>> +    i++;
>> +
>> +    while (--num_buf) {
>> +        page = virtqueue_get_buf(vq, &len);
>> +        if (!page)
>> +            goto err;
>> +
>> +        *total_len += len;
>> +        vsock->rx_buf_nr--;
>> +
>> +        pkt->mrg_rxbuf[i].buf = page;
>> +        pkt->mrg_rxbuf[i].len = len;
>> +        i++;
>> +    }
>> +
>> +    return pkt;
>> +err:
>> +    virtio_transport_free_pkt(pkt);
>> +    return NULL;
>> +}
> 
> 
> Similar logic could be found at virtio-net driver.
> 
> Haven't thought this deeply, but it looks to me use virtio-net driver is also possible, e.g for data-path, just register vsock specific callbacks.
> 
> 
>> +
>>   static void virtio_transport_rx_work(struct work_struct *work)
>>   {
>>       struct virtio_vsock *vsock =
>>           container_of(work, struct virtio_vsock, rx_work);
>>       struct virtqueue *vq;
>> +    size_t vsock_hlen = vsock->mergeable ? sizeof(struct virtio_vsock_pkt) :
>> +            sizeof(struct virtio_vsock_hdr);
>>
>>       vq = vsock->vqs[VSOCK_VQ_RX];
>>
>> @@ -383,21 +434,29 @@ static void virtio_transport_rx_work(struct work_struct *work)
>>                   goto out;
>>               }
>>
>> -            pkt = virtqueue_get_buf(vq, &len);
>> -            if (!pkt) {
>> -                break;
>> -            }
>> +            if (likely(vsock->mergeable)) {
>> +                pkt = receive_mergeable(vq, vsock, &len);
>> +                if (!pkt)
>> +                    break;
>>
>> -            vsock->rx_buf_nr--;
>> +                pkt->len = le32_to_cpu(pkt->hdr.len);
>> +            } else {
>> +                pkt = virtqueue_get_buf(vq, &len);
>> +                if (!pkt) {
>> +                    break;
>> +                }
>> +
>> +                vsock->rx_buf_nr--;
>> +            }
>>
>>               /* Drop short/long packets */
>> -            if (unlikely(len < sizeof(pkt->hdr) ||
>> -                     len > sizeof(pkt->hdr) + pkt->len)) {
>> +            if (unlikely(len < vsock_hlen ||
>> +                     len > vsock_hlen + pkt->len)) {
>>                   virtio_transport_free_pkt(pkt);
>>                   continue;
>>               }
>>
>> -            pkt->len = len - sizeof(pkt->hdr);
>> +            pkt->len = len - vsock_hlen;
>>               virtio_transport_deliver_tap_pkt(pkt);
>>               virtio_transport_recv_pkt(pkt);
>>           }
>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>> index 3ae3a33..7bef1d5 100644
>> --- a/net/vmw_vsock/virtio_transport_common.c
>> +++ b/net/vmw_vsock/virtio_transport_common.c
>> @@ -272,14 +272,49 @@ static int virtio_transport_send_credit_update(struct vsock_sock *vsk,
>>            */
>>           spin_unlock_bh(&vvs->rx_lock);
>>
>> -        err = memcpy_to_msg(msg, pkt->buf + pkt->off, bytes);
>> -        if (err)
>> -            goto out;
>> +        if (pkt->mergeable) {
>> +            struct virtio_vsock_mrg_rxbuf *buf = pkt->mrg_rxbuf;
>> +            size_t mrg_copy_bytes, last_buf_total = 0, rxbuf_off;
>> +            size_t tmp_bytes = bytes;
>> +            int i;
>> +
>> +            for (i = 0; i < le16_to_cpu(pkt->mrg_rxbuf_hdr.num_buffers); i++) {
>> +                if (pkt->off > last_buf_total + buf[i].len) {
>> +                    last_buf_total += buf[i].len;
>> +                    continue;
>> +                }
>> +
>> +                rxbuf_off = pkt->off - last_buf_total;
>> +                mrg_copy_bytes = min(buf[i].len - rxbuf_off, tmp_bytes);
>> +                err = memcpy_to_msg(msg, buf[i].buf + rxbuf_off, mrg_copy_bytes);
>> +                if (err)
>> +                    goto out;
>> +
>> +                tmp_bytes -= mrg_copy_bytes;
>> +                pkt->off += mrg_copy_bytes;
>> +                last_buf_total += buf[i].len;
>> +                if (!tmp_bytes)
>> +                    break;
>> +            }
> 
> 
> After switch to use iov, you can user iov_iter helper to avoid the above open-coding I believe.
> 
> And you can also drop the if (mergeable) condition.
> 
> Thanks
> 

Thanks, I will try to use iov instead.

> 
>> +
>> +            if (tmp_bytes) {
>> +                printk(KERN_WARNING "WARNING! bytes = %llu, "
>> +                        "bytes = %llu\n",
>> +                        (unsigned long long)bytes,
>> +                        (unsigned long long)tmp_bytes);
>> +            }
>> +
>> +            total += (bytes - tmp_bytes);
>> +        } else {
>> +            err = memcpy_to_msg(msg, pkt->buf + pkt->off, bytes);
>> +            if (err)
>> +                goto out;
>> +
>> +            total += bytes;
>> +            pkt->off += bytes;
>> +        }
>>
>>           spin_lock_bh(&vvs->rx_lock);
>> -
>> -        total += bytes;
>> -        pkt->off += bytes;
>>           if (pkt->off == pkt->len) {
>>               virtio_transport_dec_rx_pkt(vvs, pkt);
>>               list_del(&pkt->list);
>> @@ -1050,8 +1085,16 @@ void virtio_transport_recv_pkt(struct virtio_vsock_pkt *pkt)
>>
>>   void virtio_transport_free_pkt(struct virtio_vsock_pkt *pkt)
>>   {
>> -    kfree(pkt->buf);
>> -    kfree(pkt);
>> +    int i;
>> +
>> +    if (pkt->mergeable) {
>> +        for (i = 1; i < le16_to_cpu(pkt->mrg_rxbuf_hdr.num_buffers); i++)
>> +            free_page((unsigned long)pkt->mrg_rxbuf[i].buf);
>> +        free_page((unsigned long)(void *)pkt);
>> +    } else {
>> +        kfree(pkt->buf);
>> +        kfree(pkt);
>> +    }
>>   }
>>   EXPORT_SYMBOL_GPL(virtio_transport_free_pkt);
>>
> 
> .
> 


_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH 2/5] VSOCK: support fill data to mergeable rx buffer in host
From: jiangyiwen @ 2018-11-06  6:30 UTC (permalink / raw)
  To: Jason Wang, stefanha; +Cc: netdev, kvm, virtualization
In-Reply-To: <485c2c5d-d73e-e679-9549-aad3de02f0ab@redhat.com>

On 2018/11/6 11:43, Jason Wang wrote:
> 
> On 2018/11/5 下午3:45, jiangyiwen wrote:
>> When vhost support VIRTIO_VSOCK_F_MRG_RXBUF feature,
>> it will merge big packet into rx vq.
>>
>> Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
>> ---
>>   drivers/vhost/vsock.c             | 117 +++++++++++++++++++++++++++++++-------
>>   include/linux/virtio_vsock.h      |   1 +
>>   include/uapi/linux/virtio_vsock.h |   5 ++
>>   3 files changed, 102 insertions(+), 21 deletions(-)
>>
>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>> index 34bc3ab..648be39 100644
>> --- a/drivers/vhost/vsock.c
>> +++ b/drivers/vhost/vsock.c
>> @@ -22,7 +22,8 @@
>>   #define VHOST_VSOCK_DEFAULT_HOST_CID    2
>>
>>   enum {
>> -    VHOST_VSOCK_FEATURES = VHOST_FEATURES,
>> +    VHOST_VSOCK_FEATURES = VHOST_FEATURES |
>> +            (1ULL << VIRTIO_VSOCK_F_MRG_RXBUF),
>>   };
>>
>>   /* Used to track all the vhost_vsock instances on the system. */
>> @@ -80,6 +81,68 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
>>       return vsock;
>>   }
>>
>> +static int get_rx_bufs(struct vhost_virtqueue *vq,
>> +        struct vring_used_elem *heads, int datalen,
>> +        unsigned *iovcount, unsigned int quota)
>> +{
>> +    unsigned int out, in;
>> +    int seg = 0;
>> +    int headcount = 0;
>> +    unsigned d;
>> +    int ret;
>> +    /*
>> +     * len is always initialized before use since we are always called with
>> +     * datalen > 0.
>> +     */
>> +    u32 uninitialized_var(len);
>> +
>> +    while (datalen > 0 && headcount < quota) {
>> +        if (unlikely(seg >= UIO_MAXIOV)) {
>> +            ret = -ENOBUFS;
>> +            goto err;
>> +        }
>> +
>> +        ret = vhost_get_vq_desc(vq, vq->iov + seg,
>> +                ARRAY_SIZE(vq->iov) - seg, &out,
>> +                &in, NULL, NULL);
>> +        if (unlikely(ret < 0))
>> +            goto err;
>> +
>> +        d = ret;
>> +        if (d == vq->num) {
>> +            ret = 0;
>> +            goto err;
>> +        }
>> +
>> +        if (unlikely(out || in <= 0)) {
>> +            vq_err(vq, "unexpected descriptor format for RX: "
>> +                    "out %d, in %d\n", out, in);
>> +            ret = -EINVAL;
>> +            goto err;
>> +        }
>> +
>> +        heads[headcount].id = cpu_to_vhost32(vq, d);
>> +        len = iov_length(vq->iov + seg, in);
>> +        heads[headcount].len = cpu_to_vhost32(vq, len);
>> +        datalen -= len;
>> +        ++headcount;
>> +        seg += in;
>> +    }
>> +
>> +    heads[headcount - 1].len = cpu_to_vhost32(vq, len + datalen);
>> +    *iovcount = seg;
>> +
>> +    /* Detect overrun */
>> +    if (unlikely(datalen > 0)) {
>> +        ret = UIO_MAXIOV + 1;
>> +        goto err;
>> +    }
>> +    return headcount;
>> +err:
>> +    vhost_discard_vq_desc(vq, headcount);
>> +    return ret;
>> +}
> 
> 
> Seems duplicated with the one used by vhost-net.
> 
> In packed virtqueue implementation, I plan to move this to vhost.c.
> 

Yes, this code is full copied from vhost-net, if it can be packed into
vhost.c, it would be great.

> 
>> +
>>   static void
>>   vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
>>                   struct vhost_virtqueue *vq)
>> @@ -87,22 +150,34 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
>>       struct vhost_virtqueue *tx_vq = &vsock->vqs[VSOCK_VQ_TX];
>>       bool added = false;
>>       bool restart_tx = false;
>> +    int mergeable;
>> +    size_t vsock_hlen;
>>
>>       mutex_lock(&vq->mutex);
>>
>>       if (!vq->private_data)
>>           goto out;
>>
>> +    mergeable = vhost_has_feature(vq, VIRTIO_VSOCK_F_MRG_RXBUF);
>> +    /*
>> +     * Guest fill page for rx vq in mergeable case, so it will not
>> +     * allocate pkt structure, we should reserve size of pkt in advance.
>> +     */
>> +    if (likely(mergeable))
>> +        vsock_hlen = sizeof(struct virtio_vsock_pkt);
>> +    else
>> +        vsock_hlen = sizeof(struct virtio_vsock_hdr);
>> +
>>       /* Avoid further vmexits, we're already processing the virtqueue */
>>       vhost_disable_notify(&vsock->dev, vq);
>>
>>       for (;;) {
>>           struct virtio_vsock_pkt *pkt;
>>           struct iov_iter iov_iter;
>> -        unsigned out, in;
>> +        unsigned out = 0, in = 0;
>>           size_t nbytes;
>>           size_t len;
>> -        int head;
>> +        s16 headcount;
>>
>>           spin_lock_bh(&vsock->send_pkt_list_lock);
>>           if (list_empty(&vsock->send_pkt_list)) {
>> @@ -116,16 +191,9 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
>>           list_del_init(&pkt->list);
>>           spin_unlock_bh(&vsock->send_pkt_list_lock);
>>
>> -        head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
>> -                     &out, &in, NULL, NULL);
>> -        if (head < 0) {
>> -            spin_lock_bh(&vsock->send_pkt_list_lock);
>> -            list_add(&pkt->list, &vsock->send_pkt_list);
>> -            spin_unlock_bh(&vsock->send_pkt_list_lock);
>> -            break;
>> -        }
>> -
>> -        if (head == vq->num) {
>> +        headcount = get_rx_bufs(vq, vq->heads, vsock_hlen + pkt->len,
>> +                &in, likely(mergeable) ? UIO_MAXIOV : 1);
>> +        if (headcount <= 0) {
>>               spin_lock_bh(&vsock->send_pkt_list_lock);
>>               list_add(&pkt->list, &vsock->send_pkt_list);
>>               spin_unlock_bh(&vsock->send_pkt_list_lock);
>> @@ -133,19 +201,13 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
>>               /* We cannot finish yet if more buffers snuck in while
>>                * re-enabling notify.
>>                */
>> -            if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {
>> +            if (!headcount && unlikely(vhost_enable_notify(&vsock->dev, vq))) {
>>                   vhost_disable_notify(&vsock->dev, vq);
>>                   continue;
>>               }
>>               break;
>>           }
>>
>> -        if (out) {
>> -            virtio_transport_free_pkt(pkt);
>> -            vq_err(vq, "Expected 0 output buffers, got %u\n", out);
>> -            break;
>> -        }
>> -
>>           len = iov_length(&vq->iov[out], in);
>>           iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
>>
>> @@ -156,6 +218,19 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
>>               break;
>>           }
>>
>> +        if (likely(mergeable)) {
>> +            pkt->mrg_rxbuf_hdr.num_buffers = cpu_to_le16(headcount);
>> +            nbytes = copy_to_iter(&pkt->mrg_rxbuf_hdr,
>> +                    sizeof(pkt->mrg_rxbuf_hdr), &iov_iter);
>> +            if (nbytes != sizeof(pkt->mrg_rxbuf_hdr)) {
>> +                virtio_transport_free_pkt(pkt);
>> +                vq_err(vq, "Faulted on copying rxbuf hdr\n");
>> +                break;
>> +            }
>> +            iov_iter_advance(&iov_iter, (vsock_hlen -
>> +                    sizeof(pkt->mrg_rxbuf_hdr) - sizeof(pkt->hdr)));
>> +        }
>> +
>>           nbytes = copy_to_iter(pkt->buf, pkt->len, &iov_iter);
>>           if (nbytes != pkt->len) {
>>               virtio_transport_free_pkt(pkt);
>> @@ -163,7 +238,7 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
>>               break;
>>           }
>>
>> -        vhost_add_used(vq, head, sizeof(pkt->hdr) + pkt->len);
>> +        vhost_add_used_n(vq, vq->heads, headcount);
>>           added = true;
> 
> 
> Looks rather similar to vhost-net mergeable rx buffer implementation. Another proof of using vhost-net.
> 
> Thanks

Yes.

> 
> 
>>
>>           if (pkt->reply) {
>> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>> index bf84418..da9e1fe 100644
>> --- a/include/linux/virtio_vsock.h
>> +++ b/include/linux/virtio_vsock.h
>> @@ -50,6 +50,7 @@ struct virtio_vsock_sock {
>>
>>   struct virtio_vsock_pkt {
>>       struct virtio_vsock_hdr    hdr;
>> +    struct virtio_vsock_mrg_rxbuf_hdr mrg_rxbuf_hdr;
>>       struct work_struct work;
>>       struct list_head list;
>>       /* socket refcnt not held, only use for cancellation */
>> diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
>> index 1d57ed3..2292f30 100644
>> --- a/include/uapi/linux/virtio_vsock.h
>> +++ b/include/uapi/linux/virtio_vsock.h
>> @@ -63,6 +63,11 @@ struct virtio_vsock_hdr {
>>       __le32    fwd_cnt;
>>   } __attribute__((packed));
>>
>> +/* It add mergeable rx buffers feature */
>> +struct virtio_vsock_mrg_rxbuf_hdr {
>> +    __le16  num_buffers;    /* number of mergeable rx buffers */
>> +} __attribute__((packed));
>> +
>>   enum virtio_vsock_type {
>>       VIRTIO_VSOCK_TYPE_STREAM = 1,
>>   };
> 
> .
> 


_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH 1/5] VSOCK: support fill mergeable rx buffer in guest
From: jiangyiwen @ 2018-11-06  6:22 UTC (permalink / raw)
  To: Jason Wang, stefanha; +Cc: netdev, kvm, virtualization
In-Reply-To: <1801c013-6cfb-1d07-3401-49f536e01983@redhat.com>

On 2018/11/6 11:38, Jason Wang wrote:
> 
> On 2018/11/5 下午3:45, jiangyiwen wrote:
>> In driver probing, if virtio has VIRTIO_VSOCK_F_MRG_RXBUF feature,
>> it will fill mergeable rx buffer, support for host send mergeable
>> rx buffer. It will fill a page everytime to compact with small
>> packet and big packet.
>>
>> Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
>> ---
>>   include/linux/virtio_vsock.h     |  3 ++
>>   net/vmw_vsock/virtio_transport.c | 72 +++++++++++++++++++++++++++++-----------
>>   2 files changed, 56 insertions(+), 19 deletions(-)
>>
>> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>> index e223e26..bf84418 100644
>> --- a/include/linux/virtio_vsock.h
>> +++ b/include/linux/virtio_vsock.h
>> @@ -14,6 +14,9 @@
>>   #define VIRTIO_VSOCK_MAX_BUF_SIZE        0xFFFFFFFFUL
>>   #define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE        (1024 * 64)
>>
>> +/* Virtio-vsock feature */
>> +#define VIRTIO_VSOCK_F_MRG_RXBUF 0 /* Host can merge receive buffers. */
>> +
>>   enum {
>>       VSOCK_VQ_RX     = 0, /* for host to guest data */
>>       VSOCK_VQ_TX     = 1, /* for guest to host data */
>> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>> index 5d3cce9..2040a9e 100644
>> --- a/net/vmw_vsock/virtio_transport.c
>> +++ b/net/vmw_vsock/virtio_transport.c
>> @@ -64,6 +64,7 @@ struct virtio_vsock {
>>       struct virtio_vsock_event event_list[8];
>>
>>       u32 guest_cid;
>> +    bool mergeable;
>>   };
>>
>>   static struct virtio_vsock *virtio_vsock_get(void)
>> @@ -256,6 +257,25 @@ static int virtio_transport_send_pkt_loopback(struct virtio_vsock *vsock,
>>       return 0;
>>   }
>>
>> +static int fill_mergeable_rx_buff(struct virtqueue *vq)
>> +{
>> +    void *page = NULL;
>> +    struct scatterlist sg;
>> +    int err;
>> +
>> +    page = (void *)get_zeroed_page(GFP_KERNEL);
> 
> 
> Any reason to use zeroed page?

In previous version, the entire structure of virtio_vsock_pkt is preallocated
in guest use kzalloc, it is a contiguous zeroed physical memory, but host only
need to fill virtio_vsock_hdr size.

However, in mergeable rx buffer version, we only fill a page in vring descriptor
in guest, and I will reserve size of virtio_vsock_pkt in host instead of write
the total size of virtio_vsock_pkt, for the correctness of structure value,
we should set zeroed page in advance.

> 
> 
>> +    if (!page)
>> +        return -ENOMEM;
>> +
>> +    sg_init_one(&sg, page, PAGE_SIZE);
> 
> 
> FYI, for virtio-net we have several optimizations for mergeable rx buffer:
> 
> - skb_page_frag_refill() which can use high order page and reduce the stress of page allocator
> 

You're right, initially I want to use a memory poll to manage the rx buffer,
and then use this in the later optimized patch. Your advice is very great.

> - we don't use fixed buffer size, instead we use EWMA to estimate the possible rx buffer size to avoid internal fragmentation
> 

Ok, I analysis the feature and consider add it into my patches.

> 
> If we can try to reuse virtio-net driver, we will get those nice features.
> 

Yes, after all virtio-net has a very good ecological environment, and it also
do many performance optimization, it is actually a good idea.

> 
> Thanks
> 
> 
>> +
>> +    err = virtqueue_add_inbuf(vq, &sg, 1, page, GFP_KERNEL);
>> +    if (err < 0)
>> +        free_page((unsigned long) page);
>> +
>> +    return err;
>> +}
>> +
>>   static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
>>   {
>>       int buf_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
>> @@ -267,27 +287,33 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
>>       vq = vsock->vqs[VSOCK_VQ_RX];
>>
>>       do {
>> -        pkt = kzalloc(sizeof(*pkt), GFP_KERNEL);
>> -        if (!pkt)
>> -            break;
>> +        if (vsock->mergeable) {
>> +            ret = fill_mergeable_rx_buff(vq);
>> +            if (ret)
>> +                break;
>> +        } else {
>> +            pkt = kzalloc(sizeof(*pkt), GFP_KERNEL);
>> +            if (!pkt)
>> +                break;
>>
>> -        pkt->buf = kmalloc(buf_len, GFP_KERNEL);
>> -        if (!pkt->buf) {
>> -            virtio_transport_free_pkt(pkt);
>> -            break;
>> -        }
>> +            pkt->buf = kmalloc(buf_len, GFP_KERNEL);
>> +            if (!pkt->buf) {
>> +                virtio_transport_free_pkt(pkt);
>> +                break;
>> +            }
>>
>> -        pkt->len = buf_len;
>> +            pkt->len = buf_len;
>>
>> -        sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
>> -        sgs[0] = &hdr;
>> +            sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
>> +            sgs[0] = &hdr;
>>
>> -        sg_init_one(&buf, pkt->buf, buf_len);
>> -        sgs[1] = &buf;
>> -        ret = virtqueue_add_sgs(vq, sgs, 0, 2, pkt, GFP_KERNEL);
>> -        if (ret) {
>> -            virtio_transport_free_pkt(pkt);
>> -            break;
>> +            sg_init_one(&buf, pkt->buf, buf_len);
>> +            sgs[1] = &buf;
>> +            ret = virtqueue_add_sgs(vq, sgs, 0, 2, pkt, GFP_KERNEL);
>> +            if (ret) {
>> +                virtio_transport_free_pkt(pkt);
>> +                break;
>> +            }
>>           }
>>           vsock->rx_buf_nr++;
>>       } while (vq->num_free);
>> @@ -588,6 +614,9 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
>>       if (ret < 0)
>>           goto out_vqs;
>>
>> +    if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_MRG_RXBUF))
>> +        vsock->mergeable = true;
>> +
>>       vsock->rx_buf_nr = 0;
>>       vsock->rx_buf_max_nr = 0;
>>       atomic_set(&vsock->queued_replies, 0);
>> @@ -640,8 +669,12 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
>>       vdev->config->reset(vdev);
>>
>>       mutex_lock(&vsock->rx_lock);
>> -    while ((pkt = virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_RX])))
>> -        virtio_transport_free_pkt(pkt);
>> +    while ((pkt = virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_RX]))) {
>> +        if (vsock->mergeable)
>> +            free_page((unsigned long)(void *)pkt);
>> +        else
>> +            virtio_transport_free_pkt(pkt);
>> +    }
>>       mutex_unlock(&vsock->rx_lock);
>>
>>       mutex_lock(&vsock->tx_lock);
>> @@ -683,6 +716,7 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
>>   };
>>
>>   static unsigned int features[] = {
>> +    VIRTIO_VSOCK_F_MRG_RXBUF,
>>   };
>>
>>   static struct virtio_driver virtio_vsock_driver = {
> 
> .
> 


_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH 0/5] VSOCK: support mergeable rx buffer in vhost-vsock
From: jiangyiwen @ 2018-11-06  5:53 UTC (permalink / raw)
  To: Jason Wang, stefanha; +Cc: netdev, kvm, virtualization
In-Reply-To: <229559d5-1787-09b1-6c26-57b535f20006@redhat.com>

On 2018/11/6 11:32, Jason Wang wrote:
> 
> On 2018/11/6 上午11:17, jiangyiwen wrote:
>> On 2018/11/6 10:41, Jason Wang wrote:
>>> On 2018/11/6 上午10:17, jiangyiwen wrote:
>>>> On 2018/11/5 17:21, Jason Wang wrote:
>>>>> On 2018/11/5 下午3:43, jiangyiwen wrote:
>>>>>> Now vsock only support send/receive small packet, it can't achieve
>>>>>> high performance. As previous discussed with Jason Wang, I revisit the
>>>>>> idea of vhost-net about mergeable rx buffer and implement the mergeable
>>>>>> rx buffer in vhost-vsock, it can allow big packet to be scattered in
>>>>>> into different buffers and improve performance obviously.
>>>>>>
>>>>>> I write a tool to test the vhost-vsock performance, mainly send big
>>>>>> packet(64K) included guest->Host and Host->Guest. The result as
>>>>>> follows:
>>>>>>
>>>>>> Before performance:
>>>>>>                  Single socket            Multiple sockets(Max Bandwidth)
>>>>>> Guest->Host   ~400MB/s                 ~480MB/s
>>>>>> Host->Guest   ~1450MB/s                ~1600MB/s
>>>>>>
>>>>>> After performance:
>>>>>>                  Single socket            Multiple sockets(Max Bandwidth)
>>>>>> Guest->Host   ~1700MB/s                ~2900MB/s
>>>>>> Host->Guest   ~1700MB/s                ~2900MB/s
>>>>>>
>>>>>>    From the test results, the performance is improved obviously, and guest
>>>>>> memory will not be wasted.
>>>>> Hi:
>>>>>
>>>>> Thanks for the patches and the numbers are really impressive.
>>>>>
>>>>> But instead of duplicating codes between sock and net. I was considering to use virtio-net as a transport of vsock. Then we may have all existed features likes batching, mergeable rx buffers and multiqueue. Want to consider this idea? Thoughts?
>>>>>
>>>>>
>>>> Hi Jason,
>>>>
>>>> I am not very familiar with virtio-net, so I am afraid I can't give too
>>>> much effective advice. Then I have several problems:
>>>>
>>>> 1. If use virtio-net as a transport, guest should see a virtio-net
>>>> device instead of virtio-vsock device, right? Is vsock only as a
>>>> transport between socket and net_device? User should still use
>>>> AF_VSOCK type to create socket, right?
>>>
>>> Well, there're many choices. What you need is just to keep the socket API and hide the implementation. For example, you can keep the vosck device in guest and switch to use vhost-net in host. We probably need a new feature bit or header to let vhost know we are passing vsock packet. And vhost-net could forward the packet to vsock core on host.
>>>
>>>
>>>> 2. I want to know if this idea has already started, and how is
>>>> the current progress?
>>>
>>> Not yet started.  Just want to listen from the community. If this sounds good, do you have interest in implementing this?
>>>
>>>
>>>> 3. And what is stefan's idea?
>>>
>>> Talk with Stefan a little on this during KVM Forum. I think he tends to agree on this idea. Anyway, let's wait for his reply.
>>>
>>>
>>> Thanks
>>>
>>>
>> Hi Jason,
>>
>> Thanks your reply, what you want is try to avoid duplicate code, and still
>> use the existed features with virtio-net.
> 
> 
> Yes, technically we can use virtio-net driver is guest as well but we could do it step by step.
> 
> 
>> Yes, if this sounds good and most people can recognize this idea, I am very
>> happy to implement this.
> 
> 
> Cool, thanks.
> 
> 
>>
>> In addition, I hope you can review these patches before the new idea is
>> implemented, after all the performance can be improved. :-)
> 
> 
> Ok.
> 
> 
> So the patch actually did three things:
> 
> - mergeable buffer implementation
> 
> - increase the default rx buffer size
> 
> - add used and signal guest in a batch
> 
> It would be helpful if you can measure the performance improvement independently. This can give reviewer a better understanding on how much did each part help.
> 
> Thanks
> 
> 

Great, I will test the performance independently in the later version.

Thanks,
Yiwen.

>>
>> Thanks,
>> Yiwen.
>>
>>>> Thanks,
>>>> Yiwen.
>>>>
>>> .
>>>
>>
> 
> .
> 


_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH v15 23/26] sched: early boot clock
From: Dominique Martinet @ 2018-11-06  5:42 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: gnomes, feng.tang, kvm, peterz, heiko.carstens, qemu-devel,
	virtualization, steven.sistare, hpa, boris.ostrovsky, prarit,
	linux-s390, x86, linux, daniel.m.jordan, mingo, pmladek,
	john.stultz, tglx, jgross, douly.fnst, sboyd, linux-kernel,
	schwidefsky, pbonzini
In-Reply-To: <20180719205545.16512-24-pasha.tatashin@oracle.com>

(added various kvm/virtualization lists in Cc as well as qemu as I don't
know who's "wrong" here)

Pavel Tatashin wrote on Thu, Jul 19, 2018:
> Allow sched_clock() to be used before schec_clock_init() is called.
> This provides with a way to get early boot timestamps on machines with
> unstable clocks.

This isn't something I understand, but bisect tells me this patch
(landed as 857baa87b64 ("sched/clock: Enable sched clock early")) makes
a VM running with kvmclock take a step in uptime/printk timer early in
boot sequence as illustrated below. The step seems to be related to the
amount of time the host was suspended while qemu was running before the
reboot.

$ dmesg
...
[    0.000000] SMBIOS 2.8 present.
[    0.000000] DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180724_192412-buildhw-07.phx2.fedoraproject.org-1.fc29 04/01/2014
[    0.000000] Hypervisor detected: KVM
[    0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[283120.529821] kvm-clock: cpu 0, msr 321a8001, primary cpu clock
[283120.529822] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[283120.529824] tsc: Detected 2592.000 MHz processor
...

(The VM is x86_64 on x86_64, I can provide my .config on request but
don't think it's related)


It's rather annoying for me as I often reboot VMs and rely on the
'uptime' command to check if I did just reboot or not as I have the
attention span of a goldfish; I'd rather not have to find something else
to check if I did just reboot or not.

Note that if the qemu process is restarted, there is no offset anymore.

I unfortunately just did that so cannot say with confidence (putting my
laptop to sleep for 30s only led to a 2s offset and I do not want to
wait longer right now), but it looks like the clock is still mostly
correct after reboot after disabling my VM's ntp client. Will infirm
that tomorrow if I was wrong.


Happy to try to help fixing this in any way, as written above the quote
I'm not even actually sure who is wrong here.

Thanks!



(As a side, mostly unrelated note, insert swearing here about cf7a63ef4
not compiling earlier in this serie; some variable declaration got
removed before their use. Was fixed in the next patch but I didn't
notice the kernel didn't fully rebuild and wasted time in my bisect
heading the wrong way...)

> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
> ---
>  init/main.c          |  2 +-
>  kernel/sched/clock.c | 20 +++++++++++++++++++-
>  2 files changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/init/main.c b/init/main.c
> index 162d931c9511..ff0a24170b95 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -642,7 +642,6 @@ asmlinkage __visible void __init start_kernel(void)
>  	softirq_init();
>  	timekeeping_init();
>  	time_init();
> -	sched_clock_init();
>  	printk_safe_init();
>  	perf_event_init();
>  	profile_init();
> @@ -697,6 +696,7 @@ asmlinkage __visible void __init start_kernel(void)
>  	acpi_early_init();
>  	if (late_time_init)
>  		late_time_init();
> +	sched_clock_init();
>  	calibrate_delay();
>  	pid_idr_init();
>  	anon_vma_init();
> diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
> index 0e9dbb2d9aea..422cd63f8f17 100644
> --- a/kernel/sched/clock.c
> +++ b/kernel/sched/clock.c
> @@ -202,7 +202,25 @@ static void __sched_clock_gtod_offset(void)
>  
>  void __init sched_clock_init(void)
>  {
> +	unsigned long flags;
> +
> +	/*
> +	 * Set __gtod_offset such that once we mark sched_clock_running,
> +	 * sched_clock_tick() continues where sched_clock() left off.
> +	 *
> +	 * Even if TSC is buggered, we're still UP at this point so it
> +	 * can't really be out of sync.
> +	 */
> +	local_irq_save(flags);
> +	__sched_clock_gtod_offset();
> +	local_irq_restore(flags);
> +
>  	sched_clock_running = 1;
> +
> +	/* Now that sched_clock_running is set adjust scd */
> +	local_irq_save(flags);
> +	sched_clock_tick();
> +	local_irq_restore(flags);
>  }
>  /*
>   * We run this as late_initcall() such that it runs after all built-in drivers,
> @@ -356,7 +374,7 @@ u64 sched_clock_cpu(int cpu)
>  		return sched_clock() + __sched_clock_offset;
>  
>  	if (unlikely(!sched_clock_running))
> -		return 0ull;
> +		return sched_clock();
>  
>  	preempt_disable_notrace();
>  	scd = cpu_sdc(cpu);
-- 
Dominique Martinet | Asmadeus

^ permalink raw reply

* Re: [PATCH 3/5] VSOCK: support receive mergeable rx buffer in guest
From: Jason Wang @ 2018-11-06  4:00 UTC (permalink / raw)
  To: jiangyiwen, stefanha; +Cc: netdev, kvm, virtualization
In-Reply-To: <5BDFF57C.5020106@huawei.com>


On 2018/11/5 下午3:47, jiangyiwen wrote:
> Guest receive mergeable rx buffer, it can merge
> scatter rx buffer into a big buffer and then copy
> to user space.
>
> Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
> ---
>   include/linux/virtio_vsock.h            |  9 ++++
>   net/vmw_vsock/virtio_transport.c        | 75 +++++++++++++++++++++++++++++----
>   net/vmw_vsock/virtio_transport_common.c | 59 ++++++++++++++++++++++----
>   3 files changed, 127 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index da9e1fe..6be3cd7 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -13,6 +13,8 @@
>   #define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE	(1024 * 4)
>   #define VIRTIO_VSOCK_MAX_BUF_SIZE		0xFFFFFFFFUL
>   #define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE		(1024 * 64)
> +/* virtio_vsock_pkt + max_pkt_len(default MAX_PKT_BUF_SIZE) */
> +#define VIRTIO_VSOCK_MAX_MRG_BUF_NUM ((VIRTIO_VSOCK_MAX_PKT_BUF_SIZE / PAGE_SIZE) + 1)
>
>   /* Virtio-vsock feature */
>   #define VIRTIO_VSOCK_F_MRG_RXBUF 0 /* Host can merge receive buffers. */
> @@ -48,6 +50,11 @@ struct virtio_vsock_sock {
>   	struct list_head rx_queue;
>   };
>
> +struct virtio_vsock_mrg_rxbuf {
> +	void *buf;
> +	u32 len;
> +};
> +
>   struct virtio_vsock_pkt {
>   	struct virtio_vsock_hdr	hdr;
>   	struct virtio_vsock_mrg_rxbuf_hdr mrg_rxbuf_hdr;
> @@ -59,6 +66,8 @@ struct virtio_vsock_pkt {
>   	u32 len;
>   	u32 off;
>   	bool reply;
> +	bool mergeable;
> +	struct virtio_vsock_mrg_rxbuf mrg_rxbuf[VIRTIO_VSOCK_MAX_MRG_BUF_NUM];
>   };


It's better to use iov here I think, and drop buf completely.

And this is better to be done in an independent patch.


>
>   struct virtio_vsock_pkt_info {
> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> index 2040a9e..3557ad3 100644
> --- a/net/vmw_vsock/virtio_transport.c
> +++ b/net/vmw_vsock/virtio_transport.c
> @@ -359,11 +359,62 @@ static bool virtio_transport_more_replies(struct virtio_vsock *vsock)
>   	return val < virtqueue_get_vring_size(vq);
>   }
>
> +static struct virtio_vsock_pkt *receive_mergeable(struct virtqueue *vq,
> +		struct virtio_vsock *vsock, unsigned int *total_len)
> +{
> +	struct virtio_vsock_pkt *pkt;
> +	u16 num_buf;
> +	void *page;
> +	unsigned int len;
> +	int i = 0;
> +
> +	page = virtqueue_get_buf(vq, &len);
> +	if (!page)
> +		return NULL;
> +
> +	*total_len = len;
> +	vsock->rx_buf_nr--;
> +
> +	pkt = page;
> +	num_buf = le16_to_cpu(pkt->mrg_rxbuf_hdr.num_buffers);
> +	if (!num_buf || num_buf > VIRTIO_VSOCK_MAX_MRG_BUF_NUM)
> +		goto err;
> +
> +	pkt->mergeable = true;
> +	if (!le32_to_cpu(pkt->hdr.len))
> +		return pkt;
> +
> +	len -= sizeof(struct virtio_vsock_pkt);
> +	pkt->mrg_rxbuf[i].buf = page + sizeof(struct virtio_vsock_pkt);
> +	pkt->mrg_rxbuf[i].len = len;
> +	i++;
> +
> +	while (--num_buf) {
> +		page = virtqueue_get_buf(vq, &len);
> +		if (!page)
> +			goto err;
> +
> +		*total_len += len;
> +		vsock->rx_buf_nr--;
> +
> +		pkt->mrg_rxbuf[i].buf = page;
> +		pkt->mrg_rxbuf[i].len = len;
> +		i++;
> +	}
> +
> +	return pkt;
> +err:
> +	virtio_transport_free_pkt(pkt);
> +	return NULL;
> +}


Similar logic could be found at virtio-net driver.

Haven't thought this deeply, but it looks to me use virtio-net driver is 
also possible, e.g for data-path, just register vsock specific callbacks.


> +
>   static void virtio_transport_rx_work(struct work_struct *work)
>   {
>   	struct virtio_vsock *vsock =
>   		container_of(work, struct virtio_vsock, rx_work);
>   	struct virtqueue *vq;
> +	size_t vsock_hlen = vsock->mergeable ? sizeof(struct virtio_vsock_pkt) :
> +			sizeof(struct virtio_vsock_hdr);
>
>   	vq = vsock->vqs[VSOCK_VQ_RX];
>
> @@ -383,21 +434,29 @@ static void virtio_transport_rx_work(struct work_struct *work)
>   				goto out;
>   			}
>
> -			pkt = virtqueue_get_buf(vq, &len);
> -			if (!pkt) {
> -				break;
> -			}
> +			if (likely(vsock->mergeable)) {
> +				pkt = receive_mergeable(vq, vsock, &len);
> +				if (!pkt)
> +					break;
>
> -			vsock->rx_buf_nr--;
> +				pkt->len = le32_to_cpu(pkt->hdr.len);
> +			} else {
> +				pkt = virtqueue_get_buf(vq, &len);
> +				if (!pkt) {
> +					break;
> +				}
> +
> +				vsock->rx_buf_nr--;
> +			}
>
>   			/* Drop short/long packets */
> -			if (unlikely(len < sizeof(pkt->hdr) ||
> -				     len > sizeof(pkt->hdr) + pkt->len)) {
> +			if (unlikely(len < vsock_hlen ||
> +				     len > vsock_hlen + pkt->len)) {
>   				virtio_transport_free_pkt(pkt);
>   				continue;
>   			}
>
> -			pkt->len = len - sizeof(pkt->hdr);
> +			pkt->len = len - vsock_hlen;
>   			virtio_transport_deliver_tap_pkt(pkt);
>   			virtio_transport_recv_pkt(pkt);
>   		}
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 3ae3a33..7bef1d5 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -272,14 +272,49 @@ static int virtio_transport_send_credit_update(struct vsock_sock *vsk,
>   		 */
>   		spin_unlock_bh(&vvs->rx_lock);
>
> -		err = memcpy_to_msg(msg, pkt->buf + pkt->off, bytes);
> -		if (err)
> -			goto out;
> +		if (pkt->mergeable) {
> +			struct virtio_vsock_mrg_rxbuf *buf = pkt->mrg_rxbuf;
> +			size_t mrg_copy_bytes, last_buf_total = 0, rxbuf_off;
> +			size_t tmp_bytes = bytes;
> +			int i;
> +
> +			for (i = 0; i < le16_to_cpu(pkt->mrg_rxbuf_hdr.num_buffers); i++) {
> +				if (pkt->off > last_buf_total + buf[i].len) {
> +					last_buf_total += buf[i].len;
> +					continue;
> +				}
> +
> +				rxbuf_off = pkt->off - last_buf_total;
> +				mrg_copy_bytes = min(buf[i].len - rxbuf_off, tmp_bytes);
> +				err = memcpy_to_msg(msg, buf[i].buf + rxbuf_off, mrg_copy_bytes);
> +				if (err)
> +					goto out;
> +
> +				tmp_bytes -= mrg_copy_bytes;
> +				pkt->off += mrg_copy_bytes;
> +				last_buf_total += buf[i].len;
> +				if (!tmp_bytes)
> +					break;
> +			}


After switch to use iov, you can user iov_iter helper to avoid the above 
open-coding I believe.

And you can also drop the if (mergeable) condition.

Thanks


> +
> +			if (tmp_bytes) {
> +				printk(KERN_WARNING "WARNING! bytes = %llu, "
> +						"bytes = %llu\n",
> +						(unsigned long long)bytes,
> +						(unsigned long long)tmp_bytes);
> +			}
> +
> +			total += (bytes - tmp_bytes);
> +		} else {
> +			err = memcpy_to_msg(msg, pkt->buf + pkt->off, bytes);
> +			if (err)
> +				goto out;
> +
> +			total += bytes;
> +			pkt->off += bytes;
> +		}
>
>   		spin_lock_bh(&vvs->rx_lock);
> -
> -		total += bytes;
> -		pkt->off += bytes;
>   		if (pkt->off == pkt->len) {
>   			virtio_transport_dec_rx_pkt(vvs, pkt);
>   			list_del(&pkt->list);
> @@ -1050,8 +1085,16 @@ void virtio_transport_recv_pkt(struct virtio_vsock_pkt *pkt)
>
>   void virtio_transport_free_pkt(struct virtio_vsock_pkt *pkt)
>   {
> -	kfree(pkt->buf);
> -	kfree(pkt);
> +	int i;
> +
> +	if (pkt->mergeable) {
> +		for (i = 1; i < le16_to_cpu(pkt->mrg_rxbuf_hdr.num_buffers); i++)
> +			free_page((unsigned long)pkt->mrg_rxbuf[i].buf);
> +		free_page((unsigned long)(void *)pkt);
> +	} else {
> +		kfree(pkt->buf);
> +		kfree(pkt);
> +	}
>   }
>   EXPORT_SYMBOL_GPL(virtio_transport_free_pkt);
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH 2/5] VSOCK: support fill data to mergeable rx buffer in host
From: Jason Wang @ 2018-11-06  3:43 UTC (permalink / raw)
  To: jiangyiwen, stefanha; +Cc: netdev, kvm, virtualization
In-Reply-To: <5BDFF537.3050806@huawei.com>


On 2018/11/5 下午3:45, jiangyiwen wrote:
> When vhost support VIRTIO_VSOCK_F_MRG_RXBUF feature,
> it will merge big packet into rx vq.
>
> Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
> ---
>   drivers/vhost/vsock.c             | 117 +++++++++++++++++++++++++++++++-------
>   include/linux/virtio_vsock.h      |   1 +
>   include/uapi/linux/virtio_vsock.h |   5 ++
>   3 files changed, 102 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index 34bc3ab..648be39 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -22,7 +22,8 @@
>   #define VHOST_VSOCK_DEFAULT_HOST_CID	2
>
>   enum {
> -	VHOST_VSOCK_FEATURES = VHOST_FEATURES,
> +	VHOST_VSOCK_FEATURES = VHOST_FEATURES |
> +			(1ULL << VIRTIO_VSOCK_F_MRG_RXBUF),
>   };
>
>   /* Used to track all the vhost_vsock instances on the system. */
> @@ -80,6 +81,68 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
>   	return vsock;
>   }
>
> +static int get_rx_bufs(struct vhost_virtqueue *vq,
> +		struct vring_used_elem *heads, int datalen,
> +		unsigned *iovcount, unsigned int quota)
> +{
> +	unsigned int out, in;
> +	int seg = 0;
> +	int headcount = 0;
> +	unsigned d;
> +	int ret;
> +	/*
> +	 * len is always initialized before use since we are always called with
> +	 * datalen > 0.
> +	 */
> +	u32 uninitialized_var(len);
> +
> +	while (datalen > 0 && headcount < quota) {
> +		if (unlikely(seg >= UIO_MAXIOV)) {
> +			ret = -ENOBUFS;
> +			goto err;
> +		}
> +
> +		ret = vhost_get_vq_desc(vq, vq->iov + seg,
> +				ARRAY_SIZE(vq->iov) - seg, &out,
> +				&in, NULL, NULL);
> +		if (unlikely(ret < 0))
> +			goto err;
> +
> +		d = ret;
> +		if (d == vq->num) {
> +			ret = 0;
> +			goto err;
> +		}
> +
> +		if (unlikely(out || in <= 0)) {
> +			vq_err(vq, "unexpected descriptor format for RX: "
> +					"out %d, in %d\n", out, in);
> +			ret = -EINVAL;
> +			goto err;
> +		}
> +
> +		heads[headcount].id = cpu_to_vhost32(vq, d);
> +		len = iov_length(vq->iov + seg, in);
> +		heads[headcount].len = cpu_to_vhost32(vq, len);
> +		datalen -= len;
> +		++headcount;
> +		seg += in;
> +	}
> +
> +	heads[headcount - 1].len = cpu_to_vhost32(vq, len + datalen);
> +	*iovcount = seg;
> +
> +	/* Detect overrun */
> +	if (unlikely(datalen > 0)) {
> +		ret = UIO_MAXIOV + 1;
> +		goto err;
> +	}
> +	return headcount;
> +err:
> +	vhost_discard_vq_desc(vq, headcount);
> +	return ret;
> +}


Seems duplicated with the one used by vhost-net.

In packed virtqueue implementation, I plan to move this to vhost.c.


> +
>   static void
>   vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
>   			    struct vhost_virtqueue *vq)
> @@ -87,22 +150,34 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
>   	struct vhost_virtqueue *tx_vq = &vsock->vqs[VSOCK_VQ_TX];
>   	bool added = false;
>   	bool restart_tx = false;
> +	int mergeable;
> +	size_t vsock_hlen;
>
>   	mutex_lock(&vq->mutex);
>
>   	if (!vq->private_data)
>   		goto out;
>
> +	mergeable = vhost_has_feature(vq, VIRTIO_VSOCK_F_MRG_RXBUF);
> +	/*
> +	 * Guest fill page for rx vq in mergeable case, so it will not
> +	 * allocate pkt structure, we should reserve size of pkt in advance.
> +	 */
> +	if (likely(mergeable))
> +		vsock_hlen = sizeof(struct virtio_vsock_pkt);
> +	else
> +		vsock_hlen = sizeof(struct virtio_vsock_hdr);
> +
>   	/* Avoid further vmexits, we're already processing the virtqueue */
>   	vhost_disable_notify(&vsock->dev, vq);
>
>   	for (;;) {
>   		struct virtio_vsock_pkt *pkt;
>   		struct iov_iter iov_iter;
> -		unsigned out, in;
> +		unsigned out = 0, in = 0;
>   		size_t nbytes;
>   		size_t len;
> -		int head;
> +		s16 headcount;
>
>   		spin_lock_bh(&vsock->send_pkt_list_lock);
>   		if (list_empty(&vsock->send_pkt_list)) {
> @@ -116,16 +191,9 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
>   		list_del_init(&pkt->list);
>   		spin_unlock_bh(&vsock->send_pkt_list_lock);
>
> -		head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
> -					 &out, &in, NULL, NULL);
> -		if (head < 0) {
> -			spin_lock_bh(&vsock->send_pkt_list_lock);
> -			list_add(&pkt->list, &vsock->send_pkt_list);
> -			spin_unlock_bh(&vsock->send_pkt_list_lock);
> -			break;
> -		}
> -
> -		if (head == vq->num) {
> +		headcount = get_rx_bufs(vq, vq->heads, vsock_hlen + pkt->len,
> +				&in, likely(mergeable) ? UIO_MAXIOV : 1);
> +		if (headcount <= 0) {
>   			spin_lock_bh(&vsock->send_pkt_list_lock);
>   			list_add(&pkt->list, &vsock->send_pkt_list);
>   			spin_unlock_bh(&vsock->send_pkt_list_lock);
> @@ -133,19 +201,13 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
>   			/* We cannot finish yet if more buffers snuck in while
>   			 * re-enabling notify.
>   			 */
> -			if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {
> +			if (!headcount && unlikely(vhost_enable_notify(&vsock->dev, vq))) {
>   				vhost_disable_notify(&vsock->dev, vq);
>   				continue;
>   			}
>   			break;
>   		}
>
> -		if (out) {
> -			virtio_transport_free_pkt(pkt);
> -			vq_err(vq, "Expected 0 output buffers, got %u\n", out);
> -			break;
> -		}
> -
>   		len = iov_length(&vq->iov[out], in);
>   		iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
>
> @@ -156,6 +218,19 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
>   			break;
>   		}
>
> +		if (likely(mergeable)) {
> +			pkt->mrg_rxbuf_hdr.num_buffers = cpu_to_le16(headcount);
> +			nbytes = copy_to_iter(&pkt->mrg_rxbuf_hdr,
> +					sizeof(pkt->mrg_rxbuf_hdr), &iov_iter);
> +			if (nbytes != sizeof(pkt->mrg_rxbuf_hdr)) {
> +				virtio_transport_free_pkt(pkt);
> +				vq_err(vq, "Faulted on copying rxbuf hdr\n");
> +				break;
> +			}
> +			iov_iter_advance(&iov_iter, (vsock_hlen -
> +					sizeof(pkt->mrg_rxbuf_hdr) - sizeof(pkt->hdr)));
> +		}
> +
>   		nbytes = copy_to_iter(pkt->buf, pkt->len, &iov_iter);
>   		if (nbytes != pkt->len) {
>   			virtio_transport_free_pkt(pkt);
> @@ -163,7 +238,7 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
>   			break;
>   		}
>
> -		vhost_add_used(vq, head, sizeof(pkt->hdr) + pkt->len);
> +		vhost_add_used_n(vq, vq->heads, headcount);
>   		added = true;


Looks rather similar to vhost-net mergeable rx buffer implementation. 
Another proof of using vhost-net.

Thanks


>
>   		if (pkt->reply) {
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index bf84418..da9e1fe 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -50,6 +50,7 @@ struct virtio_vsock_sock {
>
>   struct virtio_vsock_pkt {
>   	struct virtio_vsock_hdr	hdr;
> +	struct virtio_vsock_mrg_rxbuf_hdr mrg_rxbuf_hdr;
>   	struct work_struct work;
>   	struct list_head list;
>   	/* socket refcnt not held, only use for cancellation */
> diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
> index 1d57ed3..2292f30 100644
> --- a/include/uapi/linux/virtio_vsock.h
> +++ b/include/uapi/linux/virtio_vsock.h
> @@ -63,6 +63,11 @@ struct virtio_vsock_hdr {
>   	__le32	fwd_cnt;
>   } __attribute__((packed));
>
> +/* It add mergeable rx buffers feature */
> +struct virtio_vsock_mrg_rxbuf_hdr {
> +	__le16  num_buffers;    /* number of mergeable rx buffers */
> +} __attribute__((packed));
> +
>   enum virtio_vsock_type {
>   	VIRTIO_VSOCK_TYPE_STREAM = 1,
>   };
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH 1/5] VSOCK: support fill mergeable rx buffer in guest
From: Jason Wang @ 2018-11-06  3:38 UTC (permalink / raw)
  To: jiangyiwen, stefanha; +Cc: netdev, kvm, virtualization
In-Reply-To: <5BDFF4FE.1080500@huawei.com>


On 2018/11/5 下午3:45, jiangyiwen wrote:
> In driver probing, if virtio has VIRTIO_VSOCK_F_MRG_RXBUF feature,
> it will fill mergeable rx buffer, support for host send mergeable
> rx buffer. It will fill a page everytime to compact with small
> packet and big packet.
>
> Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
> ---
>   include/linux/virtio_vsock.h     |  3 ++
>   net/vmw_vsock/virtio_transport.c | 72 +++++++++++++++++++++++++++++-----------
>   2 files changed, 56 insertions(+), 19 deletions(-)
>
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index e223e26..bf84418 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -14,6 +14,9 @@
>   #define VIRTIO_VSOCK_MAX_BUF_SIZE		0xFFFFFFFFUL
>   #define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE		(1024 * 64)
>
> +/* Virtio-vsock feature */
> +#define VIRTIO_VSOCK_F_MRG_RXBUF 0 /* Host can merge receive buffers. */
> +
>   enum {
>   	VSOCK_VQ_RX     = 0, /* for host to guest data */
>   	VSOCK_VQ_TX     = 1, /* for guest to host data */
> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> index 5d3cce9..2040a9e 100644
> --- a/net/vmw_vsock/virtio_transport.c
> +++ b/net/vmw_vsock/virtio_transport.c
> @@ -64,6 +64,7 @@ struct virtio_vsock {
>   	struct virtio_vsock_event event_list[8];
>
>   	u32 guest_cid;
> +	bool mergeable;
>   };
>
>   static struct virtio_vsock *virtio_vsock_get(void)
> @@ -256,6 +257,25 @@ static int virtio_transport_send_pkt_loopback(struct virtio_vsock *vsock,
>   	return 0;
>   }
>
> +static int fill_mergeable_rx_buff(struct virtqueue *vq)
> +{
> +	void *page = NULL;
> +	struct scatterlist sg;
> +	int err;
> +
> +	page = (void *)get_zeroed_page(GFP_KERNEL);


Any reason to use zeroed page?


> +	if (!page)
> +		return -ENOMEM;
> +
> +	sg_init_one(&sg, page, PAGE_SIZE);


FYI, for virtio-net we have several optimizations for mergeable rx buffer:

- skb_page_frag_refill() which can use high order page and reduce the 
stress of page allocator

- we don't use fixed buffer size, instead we use EWMA to estimate the 
possible rx buffer size to avoid internal fragmentation


If we can try to reuse virtio-net driver, we will get those nice features.


Thanks


> +
> +	err = virtqueue_add_inbuf(vq, &sg, 1, page, GFP_KERNEL);
> +	if (err < 0)
> +		free_page((unsigned long) page);
> +
> +	return err;
> +}
> +
>   static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
>   {
>   	int buf_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> @@ -267,27 +287,33 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
>   	vq = vsock->vqs[VSOCK_VQ_RX];
>
>   	do {
> -		pkt = kzalloc(sizeof(*pkt), GFP_KERNEL);
> -		if (!pkt)
> -			break;
> +		if (vsock->mergeable) {
> +			ret = fill_mergeable_rx_buff(vq);
> +			if (ret)
> +				break;
> +		} else {
> +			pkt = kzalloc(sizeof(*pkt), GFP_KERNEL);
> +			if (!pkt)
> +				break;
>
> -		pkt->buf = kmalloc(buf_len, GFP_KERNEL);
> -		if (!pkt->buf) {
> -			virtio_transport_free_pkt(pkt);
> -			break;
> -		}
> +			pkt->buf = kmalloc(buf_len, GFP_KERNEL);
> +			if (!pkt->buf) {
> +				virtio_transport_free_pkt(pkt);
> +				break;
> +			}
>
> -		pkt->len = buf_len;
> +			pkt->len = buf_len;
>
> -		sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
> -		sgs[0] = &hdr;
> +			sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
> +			sgs[0] = &hdr;
>
> -		sg_init_one(&buf, pkt->buf, buf_len);
> -		sgs[1] = &buf;
> -		ret = virtqueue_add_sgs(vq, sgs, 0, 2, pkt, GFP_KERNEL);
> -		if (ret) {
> -			virtio_transport_free_pkt(pkt);
> -			break;
> +			sg_init_one(&buf, pkt->buf, buf_len);
> +			sgs[1] = &buf;
> +			ret = virtqueue_add_sgs(vq, sgs, 0, 2, pkt, GFP_KERNEL);
> +			if (ret) {
> +				virtio_transport_free_pkt(pkt);
> +				break;
> +			}
>   		}
>   		vsock->rx_buf_nr++;
>   	} while (vq->num_free);
> @@ -588,6 +614,9 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
>   	if (ret < 0)
>   		goto out_vqs;
>
> +	if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_MRG_RXBUF))
> +		vsock->mergeable = true;
> +
>   	vsock->rx_buf_nr = 0;
>   	vsock->rx_buf_max_nr = 0;
>   	atomic_set(&vsock->queued_replies, 0);
> @@ -640,8 +669,12 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
>   	vdev->config->reset(vdev);
>
>   	mutex_lock(&vsock->rx_lock);
> -	while ((pkt = virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_RX])))
> -		virtio_transport_free_pkt(pkt);
> +	while ((pkt = virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_RX]))) {
> +		if (vsock->mergeable)
> +			free_page((unsigned long)(void *)pkt);
> +		else
> +			virtio_transport_free_pkt(pkt);
> +	}
>   	mutex_unlock(&vsock->rx_lock);
>
>   	mutex_lock(&vsock->tx_lock);
> @@ -683,6 +716,7 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
>   };
>
>   static unsigned int features[] = {
> +	VIRTIO_VSOCK_F_MRG_RXBUF,
>   };
>
>   static struct virtio_driver virtio_vsock_driver = {
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH 0/5] VSOCK: support mergeable rx buffer in vhost-vsock
From: Jason Wang @ 2018-11-06  3:32 UTC (permalink / raw)
  To: jiangyiwen, stefanha; +Cc: netdev, kvm, virtualization
In-Reply-To: <5BE107B5.2050900@huawei.com>


On 2018/11/6 上午11:17, jiangyiwen wrote:
> On 2018/11/6 10:41, Jason Wang wrote:
>> On 2018/11/6 上午10:17, jiangyiwen wrote:
>>> On 2018/11/5 17:21, Jason Wang wrote:
>>>> On 2018/11/5 下午3:43, jiangyiwen wrote:
>>>>> Now vsock only support send/receive small packet, it can't achieve
>>>>> high performance. As previous discussed with Jason Wang, I revisit the
>>>>> idea of vhost-net about mergeable rx buffer and implement the mergeable
>>>>> rx buffer in vhost-vsock, it can allow big packet to be scattered in
>>>>> into different buffers and improve performance obviously.
>>>>>
>>>>> I write a tool to test the vhost-vsock performance, mainly send big
>>>>> packet(64K) included guest->Host and Host->Guest. The result as
>>>>> follows:
>>>>>
>>>>> Before performance:
>>>>>                  Single socket            Multiple sockets(Max Bandwidth)
>>>>> Guest->Host   ~400MB/s                 ~480MB/s
>>>>> Host->Guest   ~1450MB/s                ~1600MB/s
>>>>>
>>>>> After performance:
>>>>>                  Single socket            Multiple sockets(Max Bandwidth)
>>>>> Guest->Host   ~1700MB/s                ~2900MB/s
>>>>> Host->Guest   ~1700MB/s                ~2900MB/s
>>>>>
>>>>>    From the test results, the performance is improved obviously, and guest
>>>>> memory will not be wasted.
>>>> Hi:
>>>>
>>>> Thanks for the patches and the numbers are really impressive.
>>>>
>>>> But instead of duplicating codes between sock and net. I was considering to use virtio-net as a transport of vsock. Then we may have all existed features likes batching, mergeable rx buffers and multiqueue. Want to consider this idea? Thoughts?
>>>>
>>>>
>>> Hi Jason,
>>>
>>> I am not very familiar with virtio-net, so I am afraid I can't give too
>>> much effective advice. Then I have several problems:
>>>
>>> 1. If use virtio-net as a transport, guest should see a virtio-net
>>> device instead of virtio-vsock device, right? Is vsock only as a
>>> transport between socket and net_device? User should still use
>>> AF_VSOCK type to create socket, right?
>>
>> Well, there're many choices. What you need is just to keep the socket API and hide the implementation. For example, you can keep the vosck device in guest and switch to use vhost-net in host. We probably need a new feature bit or header to let vhost know we are passing vsock packet. And vhost-net could forward the packet to vsock core on host.
>>
>>
>>> 2. I want to know if this idea has already started, and how is
>>> the current progress?
>>
>> Not yet started.  Just want to listen from the community. If this sounds good, do you have interest in implementing this?
>>
>>
>>> 3. And what is stefan's idea?
>>
>> Talk with Stefan a little on this during KVM Forum. I think he tends to agree on this idea. Anyway, let's wait for his reply.
>>
>>
>> Thanks
>>
>>
> Hi Jason,
>
> Thanks your reply, what you want is try to avoid duplicate code, and still
> use the existed features with virtio-net.


Yes, technically we can use virtio-net driver is guest as well but we 
could do it step by step.


> Yes, if this sounds good and most people can recognize this idea, I am very
> happy to implement this.


Cool, thanks.


>
> In addition, I hope you can review these patches before the new idea is
> implemented, after all the performance can be improved. :-)


Ok.


So the patch actually did three things:

- mergeable buffer implementation

- increase the default rx buffer size

- add used and signal guest in a batch

It would be helpful if you can measure the performance improvement 
independently. This can give reviewer a better understanding on how much 
did each part help.

Thanks


>
> Thanks,
> Yiwen.
>
>>> Thanks,
>>> Yiwen.
>>>
>> .
>>
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH 0/5] VSOCK: support mergeable rx buffer in vhost-vsock
From: jiangyiwen @ 2018-11-06  3:17 UTC (permalink / raw)
  To: Jason Wang, stefanha; +Cc: netdev, kvm, virtualization
In-Reply-To: <b4ce06c8-1c4b-724f-52e8-cb8d7b23fd27@redhat.com>

On 2018/11/6 10:41, Jason Wang wrote:
> 
> On 2018/11/6 上午10:17, jiangyiwen wrote:
>> On 2018/11/5 17:21, Jason Wang wrote:
>>> On 2018/11/5 下午3:43, jiangyiwen wrote:
>>>> Now vsock only support send/receive small packet, it can't achieve
>>>> high performance. As previous discussed with Jason Wang, I revisit the
>>>> idea of vhost-net about mergeable rx buffer and implement the mergeable
>>>> rx buffer in vhost-vsock, it can allow big packet to be scattered in
>>>> into different buffers and improve performance obviously.
>>>>
>>>> I write a tool to test the vhost-vsock performance, mainly send big
>>>> packet(64K) included guest->Host and Host->Guest. The result as
>>>> follows:
>>>>
>>>> Before performance:
>>>>                 Single socket            Multiple sockets(Max Bandwidth)
>>>> Guest->Host   ~400MB/s                 ~480MB/s
>>>> Host->Guest   ~1450MB/s                ~1600MB/s
>>>>
>>>> After performance:
>>>>                 Single socket            Multiple sockets(Max Bandwidth)
>>>> Guest->Host   ~1700MB/s                ~2900MB/s
>>>> Host->Guest   ~1700MB/s                ~2900MB/s
>>>>
>>>>   From the test results, the performance is improved obviously, and guest
>>>> memory will not be wasted.
>>> Hi:
>>>
>>> Thanks for the patches and the numbers are really impressive.
>>>
>>> But instead of duplicating codes between sock and net. I was considering to use virtio-net as a transport of vsock. Then we may have all existed features likes batching, mergeable rx buffers and multiqueue. Want to consider this idea? Thoughts?
>>>
>>>
>> Hi Jason,
>>
>> I am not very familiar with virtio-net, so I am afraid I can't give too
>> much effective advice. Then I have several problems:
>>
>> 1. If use virtio-net as a transport, guest should see a virtio-net
>> device instead of virtio-vsock device, right? Is vsock only as a
>> transport between socket and net_device? User should still use
>> AF_VSOCK type to create socket, right?
> 
> 
> Well, there're many choices. What you need is just to keep the socket API and hide the implementation. For example, you can keep the vosck device in guest and switch to use vhost-net in host. We probably need a new feature bit or header to let vhost know we are passing vsock packet. And vhost-net could forward the packet to vsock core on host.
> 
> 
>>
>> 2. I want to know if this idea has already started, and how is
>> the current progress?
> 
> 
> Not yet started.  Just want to listen from the community. If this sounds good, do you have interest in implementing this?
> 
> 
>>
>> 3. And what is stefan's idea?
> 
> 
> Talk with Stefan a little on this during KVM Forum. I think he tends to agree on this idea. Anyway, let's wait for his reply.
> 
> 
> Thanks
> 
> 

Hi Jason,

Thanks your reply, what you want is try to avoid duplicate code, and still
use the existed features with virtio-net.
Yes, if this sounds good and most people can recognize this idea, I am very
happy to implement this.

In addition, I hope you can review these patches before the new idea is
implemented, after all the performance can be improved. :-)

Thanks,
Yiwen.

>>
>> Thanks,
>> Yiwen.
>>
> 
> .
> 


_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH 1/1] vhost: add per-vq worker thread
From: Jason Wang @ 2018-11-06  2:48 UTC (permalink / raw)
  To: Vitaly Mayatskih
  Cc: netdev, virtualization, linux-kernel, kvm, Michael S . Tsirkin
In-Reply-To: <CAGF4SLgDk+48aLKHhA_ZgRc6D30tGdnB89b5m5bZKwzyoDb0dQ@mail.gmail.com>


On 2018/11/5 上午11:28, Vitaly Mayatskih wrote:
> On Sun, Nov 4, 2018 at 9:53 PM Jason Wang <jasowang@redhat.com> wrote:
>
>> I wonder whether or not it's better to allow the device to specific the
>> worker here instead of forcing a per vq worker model. Then we can keep
>> the behavior of exist implementation and do optimization on top?
> I was thinking about that too, but for the sake of simplicity it
> sounds valid that if the user wanted 8 parallel queues for the disk,
> they better be parallel, i.e. worker per queue. The rest of disks that
> don't need high-performance, can have 1 queue specified.
>

If you allow device to specify the worker itself, you can do any kinds 
of mapping bettween work and worker kthread I think. The advantage of 
doing this is that you can keep the vhost-net untouched. This makes 
things a little bit easier and proving two kthreads is better than one 
for -net workload is probably not as easy as it looks. We may get boost 
in some cases but degradation for the rest.


Thanks

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH 0/1] vhost: add vhost_blk driver
From: Jason Wang @ 2018-11-06  2:45 UTC (permalink / raw)
  To: Vitaly Mayatskih
  Cc: kvm, Michael S . Tsirkin, linux-kernel, virtualization, stefanha,
	Paolo Bonzini
In-Reply-To: <CAGF4SLg6+nt+zV4haXys4Wz2dWPDyx7xHXuA+h01NcQ94j0E1A@mail.gmail.com>


On 2018/11/5 上午11:23, Vitaly Mayatskih wrote:
> On Sun, Nov 4, 2018 at 10:00 PM Jason Wang <jasowang@redhat.com> wrote:
>
>>> # fio num-jobs
>>> # A: bare metal over block
>>> # B: bare metal over file
>>> # C: virtio-blk over block
>>> # D: virtio-blk over file
>>> # E: vhost-blk bio over block
>>> # F: vhost-blk kiocb over block
>>> # G: vhost-blk kiocb over file
>>> #
>>> #  A     B     C    D    E     F     G
>>> 16 1480k 1506k 101k 102k 1346k 1202k 566k
>> Hi:
>>
>> Thanks for the patches.
>>
>> This is not the first attempt for having vhost-blk:
>>
>> - Badari's version: https://lwn.net/Articles/379864/
>>
>> - Asias' version: https://lwn.net/Articles/519880/
>>
>> It's better to describe the differences (kiocb vs bio? performance?).
>> E.g if my memory is correct, Asias said it doesn't give much improvement
>> compared with userspace qemu.
>>
>> And what's more important, I believe we tend to use virtio-scsi nowdays.
>> So what's the advantages of vhost-blk over vhost-scsi?
> Hi,
>
> Yes, I saw both. Frankly, my implementation is not that different,
> because the whole thing has only twice more LOC that vhost/test.c.
>
> I posted my numbers (see in quoted text above the 16 queues case),
> IOPS goes from ~100k to 1.2M and almost reaches the physical
> limitation of the backend.
>
> submit_bio() is a bit faster, but can't be used for disk images placed
> on a file system. I have that submit_bio implementation too.
>
> Storage industry is shifting away from SCSI, which has a scaling
> problem.


Know little about storage. For scaling, do you mean SCSI protocol 
itself? If not, it's probably not a real issue for virtio-scsi itself.


>   I can compare vhost-scsi vs vhost-blk if you are curious.


It would be very helpful to see the performance comparison.


Thanks


> Thanks!
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH 0/5] VSOCK: support mergeable rx buffer in vhost-vsock
From: Jason Wang @ 2018-11-06  2:41 UTC (permalink / raw)
  To: jiangyiwen, stefanha; +Cc: netdev, kvm, virtualization
In-Reply-To: <5BE0F9C9.2080003@huawei.com>


On 2018/11/6 上午10:17, jiangyiwen wrote:
> On 2018/11/5 17:21, Jason Wang wrote:
>> On 2018/11/5 下午3:43, jiangyiwen wrote:
>>> Now vsock only support send/receive small packet, it can't achieve
>>> high performance. As previous discussed with Jason Wang, I revisit the
>>> idea of vhost-net about mergeable rx buffer and implement the mergeable
>>> rx buffer in vhost-vsock, it can allow big packet to be scattered in
>>> into different buffers and improve performance obviously.
>>>
>>> I write a tool to test the vhost-vsock performance, mainly send big
>>> packet(64K) included guest->Host and Host->Guest. The result as
>>> follows:
>>>
>>> Before performance:
>>>                 Single socket            Multiple sockets(Max Bandwidth)
>>> Guest->Host   ~400MB/s                 ~480MB/s
>>> Host->Guest   ~1450MB/s                ~1600MB/s
>>>
>>> After performance:
>>>                 Single socket            Multiple sockets(Max Bandwidth)
>>> Guest->Host   ~1700MB/s                ~2900MB/s
>>> Host->Guest   ~1700MB/s                ~2900MB/s
>>>
>>>   From the test results, the performance is improved obviously, and guest
>>> memory will not be wasted.
>> Hi:
>>
>> Thanks for the patches and the numbers are really impressive.
>>
>> But instead of duplicating codes between sock and net. I was considering to use virtio-net as a transport of vsock. Then we may have all existed features likes batching, mergeable rx buffers and multiqueue. Want to consider this idea? Thoughts?
>>
>>
> Hi Jason,
>
> I am not very familiar with virtio-net, so I am afraid I can't give too
> much effective advice. Then I have several problems:
>
> 1. If use virtio-net as a transport, guest should see a virtio-net
> device instead of virtio-vsock device, right? Is vsock only as a
> transport between socket and net_device? User should still use
> AF_VSOCK type to create socket, right?


Well, there're many choices. What you need is just to keep the socket 
API and hide the implementation. For example, you can keep the vosck 
device in guest and switch to use vhost-net in host. We probably need a 
new feature bit or header to let vhost know we are passing vsock packet. 
And vhost-net could forward the packet to vsock core on host.


>
> 2. I want to know if this idea has already started, and how is
> the current progress?


Not yet started.  Just want to listen from the community. If this sounds 
good, do you have interest in implementing this?


>
> 3. And what is stefan's idea?


Talk with Stefan a little on this during KVM Forum. I think he tends to 
agree on this idea. Anyway, let's wait for his reply.


Thanks


>
> Thanks,
> Yiwen.
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH 0/5] VSOCK: support mergeable rx buffer in vhost-vsock
From: jiangyiwen @ 2018-11-06  2:17 UTC (permalink / raw)
  To: Jason Wang, stefanha; +Cc: netdev, kvm, virtualization
In-Reply-To: <b9d535f8-ddc3-a4bc-21c9-ca21e808f0d1@redhat.com>

On 2018/11/5 17:21, Jason Wang wrote:
> 
> On 2018/11/5 下午3:43, jiangyiwen wrote:
>> Now vsock only support send/receive small packet, it can't achieve
>> high performance. As previous discussed with Jason Wang, I revisit the
>> idea of vhost-net about mergeable rx buffer and implement the mergeable
>> rx buffer in vhost-vsock, it can allow big packet to be scattered in
>> into different buffers and improve performance obviously.
>>
>> I write a tool to test the vhost-vsock performance, mainly send big
>> packet(64K) included guest->Host and Host->Guest. The result as
>> follows:
>>
>> Before performance:
>>                Single socket            Multiple sockets(Max Bandwidth)
>> Guest->Host   ~400MB/s                 ~480MB/s
>> Host->Guest   ~1450MB/s                ~1600MB/s
>>
>> After performance:
>>                Single socket            Multiple sockets(Max Bandwidth)
>> Guest->Host   ~1700MB/s                ~2900MB/s
>> Host->Guest   ~1700MB/s                ~2900MB/s
>>
>>  From the test results, the performance is improved obviously, and guest
>> memory will not be wasted.
> 
> 
> Hi:
> 
> Thanks for the patches and the numbers are really impressive.
> 
> But instead of duplicating codes between sock and net. I was considering to use virtio-net as a transport of vsock. Then we may have all existed features likes batching, mergeable rx buffers and multiqueue. Want to consider this idea? Thoughts?
> 
> 

Hi Jason,

I am not very familiar with virtio-net, so I am afraid I can't give too
much effective advice. Then I have several problems:

1. If use virtio-net as a transport, guest should see a virtio-net
device instead of virtio-vsock device, right? Is vsock only as a
transport between socket and net_device? User should still use
AF_VSOCK type to create socket, right?

2. I want to know if this idea has already started, and how is
the current progress?

3. And what is stefan's idea?

Thanks,
Yiwen.

>>
>> ---
>>
>> Yiwen Jiang (5):
>>    VSOCK: support fill mergeable rx buffer in guest
>>    VSOCK: support fill data to mergeable rx buffer in host
>>    VSOCK: support receive mergeable rx buffer in guest
>>    VSOCK: modify default rx buf size to improve performance
>>    VSOCK: batch sending rx buffer to increase bandwidth
>>
>>   drivers/vhost/vsock.c                   | 135 +++++++++++++++++++++++------
>>   include/linux/virtio_vsock.h            |  15 +++-
>>   include/uapi/linux/virtio_vsock.h       |   5 ++
>>   net/vmw_vsock/virtio_transport.c        | 147 ++++++++++++++++++++++++++------
>>   net/vmw_vsock/virtio_transport_common.c |  59 +++++++++++--
>>   5 files changed, 300 insertions(+), 61 deletions(-)
>>
> 
> .
> 


_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH v4 3/4] drm/virtio: add in/out fence support for explicit synchronization
From: Emil Velikov @ 2018-11-05 17:25 UTC (permalink / raw)
  To: Robert Foss
  Cc: Rob Herring, David Airlie, Linux-Kernel@Vger. Kernel. Org,
	ML dri-devel, open list:VIRTIO GPU DRIVER, Gustavo Padovan,
	Emil Velikov
In-Reply-To: <20181105114152.2088-4-robert.foss@collabora.com>

On Mon, 5 Nov 2018 at 11:42, Robert Foss <robert.foss@collabora.com> wrote:
>
> When the execbuf call receives an in-fence it will get the dma_fence
> related to that fence fd and wait on it before submitting the draw call.
>
> On the out-fence side we get fence returned by the submitted draw call
> and attach it to a sync_file and send the sync_file fd to userspace. On
> error -1 is returned to userspace.
>
> Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.com>
> Signed-off-by: Robert Foss <robert.foss@collabora.com>
> Suggested-by: Rob Herring <robh@kernel.org>
> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
> ---
>
> Changes since v3:
>  - Move all in_fence handling to the same VIRTGPU_EXECBUF_FENCE_FD_IN block
Fwiw my suggestion was to explicitly document whether the IOCTL can
support, simultaneously, IN and OUT fence.
Merging the two patches makes things a bit meh. But as before - it's
for Gerd to make the final call.

-Emil

^ permalink raw reply

* Re: [PATCH 0/1] vhost: add vhost_blk driver
From: Christian Borntraeger @ 2018-11-05 15:48 UTC (permalink / raw)
  To: Jason Wang, Vitaly Mayatskikh, Michael S . Tsirkin
  Cc: Paolo Bonzini, virtualization, linux-kernel, Stefan Hajnoczi, kvm
In-Reply-To: <6a7f1668-bf2d-0caa-2efd-c8fab5f1db24@redhat.com>



On 11/05/2018 04:00 AM, Jason Wang wrote:
> 
> On 2018/11/3 上午2:21, Vitaly Mayatskikh wrote:
>> vhost_blk is a host-side kernel mode accelerator for virtio-blk. The
>> driver allows VM to reach a near bare-metal disk performance. See IOPS
>> numbers below (fio --rw=randread --bs=4k).
>>
>> This implementation uses kiocb interface. It is slightly slower than
>> going directly through bio, but is simpler and also works with disk
>> images placed on a file system.
>>
>> # fio num-jobs
>> # A: bare metal over block
>> # B: bare metal over file
>> # C: virtio-blk over block
>> # D: virtio-blk over file
>> # E: vhost-blk bio over block
>> # F: vhost-blk kiocb over block
>> # G: vhost-blk kiocb over file
>> #
>> #  A     B     C    D    E     F     G
>>
>> 1  171k  151k  148k 151k 195k  187k  175k
>> 2  328k  302k  249k 241k 349k  334k  296k
>> 3  479k  437k  179k 174k 501k  464k  404k
>> 4  622k  568k  143k 183k 620k  580k  492k
>> 5  755k  697k  136k 128k 737k  693k  579k
>> 6  887k  808k  131k 120k 830k  782k  640k
>> 7  1004k 926k  126k 131k 926k  863k  693k
>> 8  1099k 1015k 117k 115k 1001k 931k  712k
>> 9  1194k 1119k 115k 111k 1055k 991k  711k
>> 10 1278k 1207k 109k 114k 1130k 1046k 695k
>> 11 1345k 1280k 110k 108k 1119k 1091k 663k
>> 12 1411k 1356k 104k 106k 1201k 1142k 629k
>> 13 1466k 1423k 106k 106k 1260k 1170k 607k
>> 14 1517k 1486k 103k 106k 1296k 1179k 589k
>> 15 1552k 1543k 102k 102k 1322k 1191k 571k
>> 16 1480k 1506k 101k 102k 1346k 1202k 566k
>>
>> Vitaly Mayatskikh (1):
>>    Add vhost_blk driver
>>
>>   drivers/vhost/Kconfig  |  13 ++
>>   drivers/vhost/Makefile |   3 +
>>   drivers/vhost/blk.c    | 510 +++++++++++++++++++++++++++++++++++++++++
>>   3 files changed, 526 insertions(+)
>>   create mode 100644 drivers/vhost/blk.c
>>
> 
> Hi:
> 
> Thanks for the patches.
> 
> This is not the first attempt for having vhost-blk:
> 
> - Badari's version: https://lwn.net/Articles/379864/
> 
> - Asias' version: https://lwn.net/Articles/519880/
> 
> It's better to describe the differences (kiocb vs bio? performance?). E.g if my memory is correct, Asias said it doesn't give much improvement compared with userspace qemu.
> 
> And what's more important, I believe we tend to use virtio-scsi nowdays. So what's the advantages of vhost-blk over vhost-scsi?


For the record, we still do use virtio-blk a lot. As we see new things like discard/write zero 
support it seems that others do as well.

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox