* [virtio-dev] Re: [PATCH 0/6] virtio/vsock: introduce dgrams, sk_buff, and qdisc
       [not found] <cover.1660362668.git.bobby.eshleman@bytedance.com>
@ 2022-08-16  2:29 ` Bobby Eshleman
       [not found] ` <65d117ddc530d12a6d47fcc45b38891465a90d9f.1660362668.git.bobby.eshleman@bytedance.com>
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Bobby Eshleman @ 2022-08-16  2:29 UTC
  To: Bobby Eshleman
  Cc: virtio-dev, Bobby Eshleman, Cong Wang, Jiang Wang,
	Stefan Hajnoczi, Stefano Garzarella, Michael S. Tsirkin,
	Jason Wang, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Wei Liu, Dexuan Cui, kvm, virtualization, netdev, linux-kernel,
	linux-hyperv

CC'ing virtio-dev@lists.oasis-open.org

On Mon, Aug 15, 2022 at 10:56:03AM -0700, Bobby Eshleman wrote:
> Hey everybody,
> 
> This series introduces datagrams, packet scheduling, and sk_buff usage
> to virtio vsock.
> 
> The usage of struct sk_buff benefits users by a) preparing vsock to use
> other related systems that require sk_buff, such as sockmap and qdisc,
> b) supporting basic congestion control via sock_alloc_send_skb, and c)
> reducing copying when delivering packets to TAP.
> 
> The socket layer no longer forces errors to be -ENOMEM, as typically
> userspace expects -EAGAIN when the sk_sndbuf threshold is reached and
> messages are being sent with option MSG_DONTWAIT.
> 
> The datagram work is based off previous patches by Jiang Wang[1].
> 
> The introduction of datagrams creates a transport layer fairness issue
> where datagrams may freely starve streams of queue access. This happens
> because, unlike streams, datagrams lack the transactions necessary for
> calculating credits and throttling.
> 
> Previous proposals introduce changes to the spec to add an additional
> virtqueue pair for datagrams[1]. Although this solution works, using
> Linux's qdisc for packet scheduling leverages already existing systems,
> avoids the need to change the virtio specification, and gives additional
> capabilities. The usage of SFQ or fq_codel, for example, may solve the
> transport layer starvation problem. It is easy to imagine other use
> cases as well. For example, services of varying importance may be
> assigned different priorities, and qdisc will apply appropriate
> priority-based scheduling. By default, the system default pfifo qdisc is
> used. The qdisc may be bypassed, and legacy queuing resumed, by simply
> setting the virtio-vsock%d network device to state DOWN. This technique
> still allows vsock to work with zero configuration.
> 
> In summary, this series introduces these major changes to vsock:
> 
> - virtio vsock supports datagrams
> - virtio vsock uses struct sk_buff instead of virtio_vsock_pkt
>   - Because virtio vsock uses sk_buff, it also uses sock_alloc_send_skb,
>     which applies the throttling threshold sk_sndbuf.
> - The vsock socket layer supports returning errors other than -ENOMEM.
>   - This is used to return -EAGAIN when the sk_sndbuf threshold is
>     reached.
> - virtio vsock uses a net_device, through which qdisc may be used.
>   - qdisc allows scheduling policies to be applied to vsock flows.
>   - Some qdiscs, like SFQ, may allow vsock to avoid transport layer
>     congestion. That is, they may keep datagrams from flooding out
>     stream flows. The benefit of this is that additional virtqueues
>     are not needed for datagrams.
>   - The net_device and qdisc are bypassed by simply setting the
>     net_device state to DOWN.
> 
> [1]: https://lore.kernel.org/all/20210914055440.3121004-1-jiang.wang@bytedance.com/
> 
> Bobby Eshleman (5):
>   vsock: replace virtio_vsock_pkt with sk_buff
>   vsock: return errors other than -ENOMEM to socket
>   vsock: add netdev to vhost/virtio vsock
>   virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
>   virtio/vsock: add support for dgram
> 
> Jiang Wang (1):
>   vsock_test: add tests for vsock dgram
> 
>  drivers/vhost/vsock.c                   | 238 ++++----
>  include/linux/virtio_vsock.h            |  73 ++-
>  include/net/af_vsock.h                  |   2 +
>  include/uapi/linux/virtio_vsock.h       |   2 +
>  net/vmw_vsock/af_vsock.c                |  30 +-
>  net/vmw_vsock/hyperv_transport.c        |   2 +-
>  net/vmw_vsock/virtio_transport.c        | 237 +++++---
>  net/vmw_vsock/virtio_transport_common.c | 771 ++++++++++++++++--------
>  net/vmw_vsock/vmci_transport.c          |   9 +-
>  net/vmw_vsock/vsock_loopback.c          |  51 +-
>  tools/testing/vsock/util.c              | 105 ++++
>  tools/testing/vsock/util.h              |   4 +
>  tools/testing/vsock/vsock_test.c        | 195 ++++++
>  13 files changed, 1176 insertions(+), 543 deletions(-)
> 
> -- 
> 2.35.1
> 
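
To make the -EAGAIN point from the cover letter concrete, here is a
minimal userspace sketch of how a sender might react once the socket
layer stops forcing -ENOMEM. It assumes an already connected socket fd
(AF_VSOCK or otherwise); try_send_nowait() and enum send_status are
hypothetical names used only for illustration, not part of this series.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Outcome of one non-blocking send attempt. */
enum send_status {
	SEND_OK,	/* send() succeeded, *sent holds the byte count */
	SEND_RETRY,	/* send buffer full (EAGAIN/EWOULDBLOCK) or EINTR */
	SEND_FAILED,	/* any other error, errno is left set */
};

/*
 * Hypothetical helper: one send() with MSG_DONTWAIT. The caller is
 * expected to poll() for POLLOUT (or back off) on SEND_RETRY instead
 * of treating it as a fatal error.
 */
static enum send_status
try_send_nowait(int fd, const void *buf, size_t len, size_t *sent)
{
	ssize_t n;

	assert(buf != NULL && sent != NULL);

	n = send(fd, buf, len, MSG_DONTWAIT | MSG_NOSIGNAL);
	if (n >= 0) {
		*sent = (size_t)n;
		return SEND_OK;
	}

	*sent = 0;
	if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
		return SEND_RETRY;

	return SEND_FAILED;
}
```

With the old behaviour such a caller would have seen ENOMEM and landed
in the SEND_FAILED branch, which is why the error passthrough matters
for MSG_DONTWAIT users.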


* [virtio-dev] Re: [PATCH 1/6] vsock: replace virtio_vsock_pkt with sk_buff
       [not found] ` <65d117ddc530d12a6d47fcc45b38891465a90d9f.1660362668.git.bobby.eshleman@bytedance.com>
@ 2022-08-16  2:30   ` Bobby Eshleman
  0 siblings, 0 replies; 16+ messages in thread
From: Bobby Eshleman @ 2022-08-16  2:30 UTC
  To: Bobby Eshleman
  Cc: virtio-dev, Bobby Eshleman, Cong Wang, Jiang Wang,
	Stefan Hajnoczi, Stefano Garzarella, Michael S. Tsirkin,
	Jason Wang, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, kvm, virtualization, netdev, linux-kernel

CC'ing virtio-dev@lists.oasis-open.org

On Mon, Aug 15, 2022 at 10:56:04AM -0700, Bobby Eshleman wrote:
> This patch replaces virtio_vsock_pkt with sk_buff.
> 
> The benefits of this patch include:
> 
> * The bug reported @ https://bugzilla.redhat.com/show_bug.cgi?id=2009935
>   does not present itself when reasonable sk_sndbuf thresholds are set.
> * Using sock_alloc_send_skb() teaches VSOCK to respect
>   sk_sndbuf for tunability.
> * Eliminates copying for vsock_deliver_tap().
> * sk_buff is required for future improvements, such as using socket map.
> 
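
As a rough illustration of the sk_sndbuf tunability mentioned in the
list above: assuming the knob in question is the generic SO_SNDBUF
socket option (which sets sk_sndbuf for any socket family), userspace
could adjust the threshold along these lines. set_sndbuf() is a
hypothetical helper name, not something this patch adds.

```c
#include <assert.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: request a send buffer size and return the value
 * the kernel reports back, or -1 on error. On Linux the reported value
 * is normally larger than the request because the kernel doubles it to
 * account for bookkeeping overhead, and it is capped by
 * net.core.wmem_max.
 */
static int set_sndbuf(int fd, int bytes)
{
	int actual = 0;
	socklen_t optlen = sizeof(actual);

	assert(bytes > 0);

	if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0)
		return -1;
	if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &actual, &optlen) < 0)
		return -1;

	return actual;
}
```

A sender that sets a small value here and then writes with
MSG_DONTWAIT should hit the threshold sooner, which is one way to
exercise the throttling path that sock_alloc_send_skb() provides.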
> Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
> ---
>  drivers/vhost/vsock.c                   | 214 +++++------
>  include/linux/virtio_vsock.h            |  60 ++-
>  net/vmw_vsock/af_vsock.c                |   1 +
>  net/vmw_vsock/virtio_transport.c        | 212 +++++-----
>  net/vmw_vsock/virtio_transport_common.c | 491 ++++++++++++------------
>  net/vmw_vsock/vsock_loopback.c          |  51 +--
>  6 files changed, 517 insertions(+), 512 deletions(-)
> 
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index 368330417bde..f8601d93d94d 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -51,8 +51,7 @@ struct vhost_vsock {
>  	struct hlist_node hash;
>  
>  	struct vhost_work send_pkt_work;
> -	spinlock_t send_pkt_list_lock;
> -	struct list_head send_pkt_list;	/* host->guest pending packets */
> +	struct sk_buff_head send_pkt_queue; /* host->guest pending packets */
>  
>  	atomic_t queued_replies;
>  
> @@ -108,7 +107,8 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
>  	vhost_disable_notify(&vsock->dev, vq);
>  
>  	do {
> -		struct virtio_vsock_pkt *pkt;
> +		struct sk_buff *skb;
> +		struct virtio_vsock_hdr *hdr;
>  		struct iov_iter iov_iter;
>  		unsigned out, in;
>  		size_t nbytes;
> @@ -116,31 +116,22 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
>  		int head;
>  		u32 flags_to_restore = 0;
>  
> -		spin_lock_bh(&vsock->send_pkt_list_lock);
> -		if (list_empty(&vsock->send_pkt_list)) {
> -			spin_unlock_bh(&vsock->send_pkt_list_lock);
> +		skb = skb_dequeue(&vsock->send_pkt_queue);
> +
> +		if (!skb) {
>  			vhost_enable_notify(&vsock->dev, vq);
>  			break;
>  		}
>  
> -		pkt = list_first_entry(&vsock->send_pkt_list,
> -				       struct virtio_vsock_pkt, list);
> -		list_del_init(&pkt->list);
> -		spin_unlock_bh(&vsock->send_pkt_list_lock);
> -
>  		head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
>  					 &out, &in, NULL, NULL);
>  		if (head < 0) {
> -			spin_lock_bh(&vsock->send_pkt_list_lock);
> -			list_add(&pkt->list, &vsock->send_pkt_list);
> -			spin_unlock_bh(&vsock->send_pkt_list_lock);
> +			skb_queue_head(&vsock->send_pkt_queue, skb);
>  			break;
>  		}
>  
>  		if (head == vq->num) {
> -			spin_lock_bh(&vsock->send_pkt_list_lock);
> -			list_add(&pkt->list, &vsock->send_pkt_list);
> -			spin_unlock_bh(&vsock->send_pkt_list_lock);
> +			skb_queue_head(&vsock->send_pkt_queue, skb);
>  
>  			/* We cannot finish yet if more buffers snuck in while
>  			 * re-enabling notify.
> @@ -153,26 +144,27 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
>  		}
>  
>  		if (out) {
> -			virtio_transport_free_pkt(pkt);
> +			kfree_skb(skb);
>  			vq_err(vq, "Expected 0 output buffers, got %u\n", out);
>  			break;
>  		}
>  
>  		iov_len = iov_length(&vq->iov[out], in);
> -		if (iov_len < sizeof(pkt->hdr)) {
> -			virtio_transport_free_pkt(pkt);
> +		if (iov_len < sizeof(*hdr)) {
> +			kfree_skb(skb);
>  			vq_err(vq, "Buffer len [%zu] too small\n", iov_len);
>  			break;
>  		}
>  
>  		iov_iter_init(&iov_iter, READ, &vq->iov[out], in, iov_len);
> -		payload_len = pkt->len - pkt->off;
> +		payload_len = skb->len - vsock_metadata(skb)->off;
> +		hdr = vsock_hdr(skb);
>  
>  		/* If the packet is greater than the space available in the
>  		 * buffer, we split it using multiple buffers.
>  		 */
> -		if (payload_len > iov_len - sizeof(pkt->hdr)) {
> -			payload_len = iov_len - sizeof(pkt->hdr);
> +		if (payload_len > iov_len - sizeof(*hdr)) {
> +			payload_len = iov_len - sizeof(*hdr);
>  
>  			/* As we are copying pieces of large packet's buffer to
>  			 * small rx buffers, headers of packets in rx queue are
> @@ -185,31 +177,31 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
>  			 * bits set. After initialized header will be copied to
>  			 * rx buffer, these required bits will be restored.
>  			 */
> -			if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOM) {
> -				pkt->hdr.flags &= ~cpu_to_le32(VIRTIO_VSOCK_SEQ_EOM);
> +			if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM) {
> +				hdr->flags &= ~cpu_to_le32(VIRTIO_VSOCK_SEQ_EOM);
>  				flags_to_restore |= VIRTIO_VSOCK_SEQ_EOM;
>  
> -				if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOR) {
> -					pkt->hdr.flags &= ~cpu_to_le32(VIRTIO_VSOCK_SEQ_EOR);
> +				if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOR) {
> +					hdr->flags &= ~cpu_to_le32(VIRTIO_VSOCK_SEQ_EOR);
>  					flags_to_restore |= VIRTIO_VSOCK_SEQ_EOR;
>  				}
>  			}
>  		}
>  
>  		/* Set the correct length in the header */
> -		pkt->hdr.len = cpu_to_le32(payload_len);
> +		hdr->len = cpu_to_le32(payload_len);
>  
> -		nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
> -		if (nbytes != sizeof(pkt->hdr)) {
> -			virtio_transport_free_pkt(pkt);
> +		nbytes = copy_to_iter(hdr, sizeof(*hdr), &iov_iter);
> +		if (nbytes != sizeof(*hdr)) {
> +			kfree_skb(skb);
>  			vq_err(vq, "Faulted on copying pkt hdr\n");
>  			break;
>  		}
>  
> -		nbytes = copy_to_iter(pkt->buf + pkt->off, payload_len,
> +		nbytes = copy_to_iter(skb->data + vsock_metadata(skb)->off, payload_len,
>  				      &iov_iter);
>  		if (nbytes != payload_len) {
> -			virtio_transport_free_pkt(pkt);
> +			kfree_skb(skb);
>  			vq_err(vq, "Faulted on copying pkt buf\n");
>  			break;
>  		}
> @@ -217,31 +209,28 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
>  		/* Deliver to monitoring devices all packets that we
>  		 * will transmit.
>  		 */
> -		virtio_transport_deliver_tap_pkt(pkt);
> +		virtio_transport_deliver_tap_pkt(skb);
>  
> -		vhost_add_used(vq, head, sizeof(pkt->hdr) + payload_len);
> +		vhost_add_used(vq, head, sizeof(*hdr) + payload_len);
>  		added = true;
>  
> -		pkt->off += payload_len;
> +		vsock_metadata(skb)->off += payload_len;
>  		total_len += payload_len;
>  
>  		/* If we didn't send all the payload we can requeue the packet
>  		 * to send it with the next available buffer.
>  		 */
> -		if (pkt->off < pkt->len) {
> -			pkt->hdr.flags |= cpu_to_le32(flags_to_restore);
> +		if (vsock_metadata(skb)->off < skb->len) {
> +			hdr->flags |= cpu_to_le32(flags_to_restore);
>  
> -			/* We are queueing the same virtio_vsock_pkt to handle
> +			/* We are queueing the same skb to handle
>  			 * the remaining bytes, and we want to deliver it
>  			 * to monitoring devices in the next iteration.
>  			 */
> -			pkt->tap_delivered = false;
> -
> -			spin_lock_bh(&vsock->send_pkt_list_lock);
> -			list_add(&pkt->list, &vsock->send_pkt_list);
> -			spin_unlock_bh(&vsock->send_pkt_list_lock);
> +			vsock_metadata(skb)->flags &= ~VIRTIO_VSOCK_METADATA_FLAGS_TAP_DELIVERED;
> +			skb_queue_head(&vsock->send_pkt_queue, skb);
>  		} else {
> -			if (pkt->reply) {
> +			if (vsock_metadata(skb)->flags & VIRTIO_VSOCK_METADATA_FLAGS_REPLY) {
>  				int val;
>  
>  				val = atomic_dec_return(&vsock->queued_replies);
> @@ -253,7 +242,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
>  					restart_tx = true;
>  			}
>  
> -			virtio_transport_free_pkt(pkt);
> +			consume_skb(skb);
>  		}
>  	} while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));
>  	if (added)
> @@ -278,28 +267,26 @@ static void vhost_transport_send_pkt_work(struct vhost_work *work)
>  }
>  
>  static int
> -vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)
> +vhost_transport_send_pkt(struct sk_buff *skb)
>  {
>  	struct vhost_vsock *vsock;
> -	int len = pkt->len;
> +	int len = skb->len;
> +	struct virtio_vsock_hdr *hdr = vsock_hdr(skb);
>  
>  	rcu_read_lock();
>  
>  	/* Find the vhost_vsock according to guest context id  */
> -	vsock = vhost_vsock_get(le64_to_cpu(pkt->hdr.dst_cid));
> +	vsock = vhost_vsock_get(le64_to_cpu(hdr->dst_cid));
>  	if (!vsock) {
>  		rcu_read_unlock();
> -		virtio_transport_free_pkt(pkt);
> +		kfree_skb(skb);
>  		return -ENODEV;
>  	}
>  
> -	if (pkt->reply)
> +	if (vsock_metadata(skb)->flags & VIRTIO_VSOCK_METADATA_FLAGS_REPLY)
>  		atomic_inc(&vsock->queued_replies);
>  
> -	spin_lock_bh(&vsock->send_pkt_list_lock);
> -	list_add_tail(&pkt->list, &vsock->send_pkt_list);
> -	spin_unlock_bh(&vsock->send_pkt_list_lock);
> -
> +	skb_queue_tail(&vsock->send_pkt_queue, skb);
>  	vhost_work_queue(&vsock->dev, &vsock->send_pkt_work);
>  
>  	rcu_read_unlock();
> @@ -310,10 +297,8 @@ static int
>  vhost_transport_cancel_pkt(struct vsock_sock *vsk)
>  {
>  	struct vhost_vsock *vsock;
> -	struct virtio_vsock_pkt *pkt, *n;
>  	int cnt = 0;
>  	int ret = -ENODEV;
> -	LIST_HEAD(freeme);
>  
>  	rcu_read_lock();
>  
> @@ -322,20 +307,7 @@ vhost_transport_cancel_pkt(struct vsock_sock *vsk)
>  	if (!vsock)
>  		goto out;
>  
> -	spin_lock_bh(&vsock->send_pkt_list_lock);
> -	list_for_each_entry_safe(pkt, n, &vsock->send_pkt_list, list) {
> -		if (pkt->vsk != vsk)
> -			continue;
> -		list_move(&pkt->list, &freeme);
> -	}
> -	spin_unlock_bh(&vsock->send_pkt_list_lock);
> -
> -	list_for_each_entry_safe(pkt, n, &freeme, list) {
> -		if (pkt->reply)
> -			cnt++;
> -		list_del(&pkt->list);
> -		virtio_transport_free_pkt(pkt);
> -	}
> +	cnt = virtio_transport_purge_skbs(vsk, &vsock->send_pkt_queue);
>  
>  	if (cnt) {
>  		struct vhost_virtqueue *tx_vq = &vsock->vqs[VSOCK_VQ_TX];
> @@ -352,11 +324,12 @@ vhost_transport_cancel_pkt(struct vsock_sock *vsk)
>  	return ret;
>  }
>  
> -static struct virtio_vsock_pkt *
> -vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
> +static struct sk_buff *
> +vhost_vsock_alloc_skb(struct vhost_virtqueue *vq,
>  		      unsigned int out, unsigned int in)
>  {
> -	struct virtio_vsock_pkt *pkt;
> +	struct sk_buff *skb;
> +	struct virtio_vsock_hdr *hdr;
>  	struct iov_iter iov_iter;
>  	size_t nbytes;
>  	size_t len;
> @@ -366,50 +339,49 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
>  		return NULL;
>  	}
>  
> -	pkt = kzalloc(sizeof(*pkt), GFP_KERNEL);
> -	if (!pkt)
> +	len = iov_length(vq->iov, out);
> +
> +	/* len contains both payload and hdr, so only add additional space for metadata */
> +	skb = alloc_skb(len + sizeof(struct virtio_vsock_metadata), GFP_KERNEL);
> +	if (!skb)
>  		return NULL;
>  
> -	len = iov_length(vq->iov, out);
> +	memset(skb->head, 0, sizeof(struct virtio_vsock_metadata));
> +	virtio_vsock_skb_reserve(skb);
>  	iov_iter_init(&iov_iter, WRITE, vq->iov, out, len);
>  
> -	nbytes = copy_from_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
> -	if (nbytes != sizeof(pkt->hdr)) {
> +	hdr = vsock_hdr(skb);
> +	nbytes = copy_from_iter(hdr, sizeof(*hdr), &iov_iter);
> +	if (nbytes != sizeof(*hdr)) {
>  		vq_err(vq, "Expected %zu bytes for pkt->hdr, got %zu bytes\n",
> -		       sizeof(pkt->hdr), nbytes);
> -		kfree(pkt);
> +		       sizeof(*hdr), nbytes);
> +		kfree_skb(skb);
>  		return NULL;
>  	}
>  
> -	pkt->len = le32_to_cpu(pkt->hdr.len);
> +	len = le32_to_cpu(hdr->len);
>  
>  	/* No payload */
> -	if (!pkt->len)
> -		return pkt;
> +	if (!len)
> +		return skb;
>  
>  	/* The pkt is too big */
> -	if (pkt->len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) {
> -		kfree(pkt);
> +	if (len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) {
> +		kfree_skb(skb);
>  		return NULL;
>  	}
>  
> -	pkt->buf = kmalloc(pkt->len, GFP_KERNEL);
> -	if (!pkt->buf) {
> -		kfree(pkt);
> -		return NULL;
> -	}
> +	virtio_vsock_skb_rx_put(skb);
>  
> -	pkt->buf_len = pkt->len;
> -
> -	nbytes = copy_from_iter(pkt->buf, pkt->len, &iov_iter);
> -	if (nbytes != pkt->len) {
> -		vq_err(vq, "Expected %u byte payload, got %zu bytes\n",
> -		       pkt->len, nbytes);
> -		virtio_transport_free_pkt(pkt);
> +	nbytes = copy_from_iter(skb->data, len, &iov_iter);
> +	if (nbytes != len) {
> +		vq_err(vq, "Expected %zu byte payload, got %zu bytes\n",
> +		       len, nbytes);
> +		kfree_skb(skb);
>  		return NULL;
>  	}
>  
> -	return pkt;
> +	return skb;
>  }
>  
>  /* Is there space left for replies to rx packets? */
> @@ -496,7 +468,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
>  						  poll.work);
>  	struct vhost_vsock *vsock = container_of(vq->dev, struct vhost_vsock,
>  						 dev);
> -	struct virtio_vsock_pkt *pkt;
> +	struct sk_buff *skb;
>  	int head, pkts = 0, total_len = 0;
>  	unsigned int out, in;
>  	bool added = false;
> @@ -511,6 +483,9 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
>  
>  	vhost_disable_notify(&vsock->dev, vq);
>  	do {
> +		struct virtio_vsock_hdr *hdr;
> +		u32 len;
> +
>  		if (!vhost_vsock_more_replies(vsock)) {
>  			/* Stop tx until the device processes already
>  			 * pending replies.  Leave tx virtqueue
> @@ -532,26 +507,29 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
>  			break;
>  		}
>  
> -		pkt = vhost_vsock_alloc_pkt(vq, out, in);
> -		if (!pkt) {
> -			vq_err(vq, "Faulted on pkt\n");
> +		skb = vhost_vsock_alloc_skb(vq, out, in);
> +		if (!skb)
>  			continue;
> -		}
>  
> -		total_len += sizeof(pkt->hdr) + pkt->len;
> +		len = skb->len;
>  
>  		/* Deliver to monitoring devices all received packets */
> -		virtio_transport_deliver_tap_pkt(pkt);
> +		virtio_transport_deliver_tap_pkt(skb);
> +
> +		hdr = vsock_hdr(skb);
>  
>  		/* Only accept correctly addressed packets */
> -		if (le64_to_cpu(pkt->hdr.src_cid) == vsock->guest_cid &&
> -		    le64_to_cpu(pkt->hdr.dst_cid) ==
> +		if (le64_to_cpu(hdr->src_cid) == vsock->guest_cid &&
> +		    le64_to_cpu(hdr->dst_cid) ==
>  		    vhost_transport_get_local_cid())
> -			virtio_transport_recv_pkt(&vhost_transport, pkt);
> +			virtio_transport_recv_pkt(&vhost_transport, skb);
>  		else
> -			virtio_transport_free_pkt(pkt);
> +			kfree_skb(skb);
> +
>  
> -		vhost_add_used(vq, head, 0);
> +		len += sizeof(*hdr);
> +		vhost_add_used(vq, head, len);
> +		total_len += len;
>  		added = true;
>  	} while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));
>  
> @@ -693,8 +671,7 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
>  		       VHOST_VSOCK_WEIGHT, true, NULL);
>  
>  	file->private_data = vsock;
> -	spin_lock_init(&vsock->send_pkt_list_lock);
> -	INIT_LIST_HEAD(&vsock->send_pkt_list);
> +	skb_queue_head_init(&vsock->send_pkt_queue);
>  	vhost_work_init(&vsock->send_pkt_work, vhost_transport_send_pkt_work);
>  	return 0;
>  
> @@ -760,16 +737,7 @@ static int vhost_vsock_dev_release(struct inode *inode, struct file *file)
>  	vhost_vsock_flush(vsock);
>  	vhost_dev_stop(&vsock->dev);
>  
> -	spin_lock_bh(&vsock->send_pkt_list_lock);
> -	while (!list_empty(&vsock->send_pkt_list)) {
> -		struct virtio_vsock_pkt *pkt;
> -
> -		pkt = list_first_entry(&vsock->send_pkt_list,
> -				struct virtio_vsock_pkt, list);
> -		list_del_init(&pkt->list);
> -		virtio_transport_free_pkt(pkt);
> -	}
> -	spin_unlock_bh(&vsock->send_pkt_list_lock);
> +	skb_queue_purge(&vsock->send_pkt_queue);
>  
>  	vhost_dev_cleanup(&vsock->dev);
>  	kfree(vsock->dev.vqs);
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index 35d7eedb5e8e..17ed01466875 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -4,9 +4,43 @@
>  
>  #include <uapi/linux/virtio_vsock.h>
>  #include <linux/socket.h>
> +#include <vdso/bits.h>
>  #include <net/sock.h>
>  #include <net/af_vsock.h>
>  
> +enum virtio_vsock_metadata_flags {
> +	VIRTIO_VSOCK_METADATA_FLAGS_REPLY		= BIT(0),
> +	VIRTIO_VSOCK_METADATA_FLAGS_TAP_DELIVERED	= BIT(1),
> +};
> +
> +/* Used only by the virtio/vhost vsock drivers, not related to protocol */
> +struct virtio_vsock_metadata {
> +	size_t off;
> +	enum virtio_vsock_metadata_flags flags;
> +};
> +
> +#define vsock_hdr(skb) \
> +	((struct virtio_vsock_hdr *) \
> +	 ((void *)skb->head + sizeof(struct virtio_vsock_metadata)))
> +
> +#define vsock_metadata(skb) \
> +	((struct virtio_vsock_metadata *)skb->head)
> +
> +#define virtio_vsock_skb_reserve(skb)	\
> +	skb_reserve(skb,	\
> +		sizeof(struct virtio_vsock_metadata) + \
> +		sizeof(struct virtio_vsock_hdr))
> +
> +static inline void virtio_vsock_skb_rx_put(struct sk_buff *skb)
> +{
> +	u32 len;
> +
> +	len = le32_to_cpu(vsock_hdr(skb)->len);
> +
> +	if (len > 0)
> +		skb_put(skb, len);
> +}
> +
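
For readers following the macros above, the layout they imply is
metadata first, then the virtio header, then the payload, all inside
one allocation starting at skb->head. Below is a small userspace model
of that layout, offered only as a sketch: every model_* name is
hypothetical, the header struct copies the field order of
struct virtio_vsock_hdr as I understand it, and the real sk_buff also
keeps skb_shared_info at the end of the buffer, which this model
ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for struct virtio_vsock_metadata. */
struct model_metadata {
	size_t off;
	unsigned int flags;
};

/* Simplified stand-in for struct virtio_vsock_hdr (assumed field order). */
struct model_hdr {
	uint64_t src_cid;
	uint64_t dst_cid;
	uint32_t src_port;
	uint32_t dst_port;
	uint32_t len;
	uint16_t type;
	uint16_t op;
	uint32_t flags;
	uint32_t buf_alloc;
	uint32_t fwd_cnt;
} __attribute__((packed));

/* Minimal model of the sk_buff fields the macros rely on. */
struct model_skb {
	unsigned char *head;	/* start of the allocation */
	unsigned char *data;	/* start of the payload */
	size_t len;		/* payload bytes currently in use */
	size_t size;		/* total bytes allocated at head */
};

/* Counterpart of vsock_metadata(): metadata sits at the very start. */
static struct model_metadata *model_metadata(struct model_skb *skb)
{
	return (struct model_metadata *)skb->head;
}

/* Counterpart of vsock_hdr(): the header follows the metadata. */
static struct model_hdr *model_hdr(struct model_skb *skb)
{
	return (struct model_hdr *)(skb->head + sizeof(struct model_metadata));
}

/* Allocate zeroed room for metadata + header + payload, then reserve
 * the headroom so that data points at the payload area.
 */
static int model_skb_alloc(struct model_skb *skb, size_t payload_max)
{
	size_t headroom = sizeof(struct model_metadata) + sizeof(struct model_hdr);

	skb->head = calloc(1, headroom + payload_max);
	if (!skb->head)
		return -1;
	skb->data = skb->head + headroom;	/* like skb_reserve() */
	skb->len = 0;
	skb->size = headroom + payload_max;
	return 0;
}

/* Like skb_put() followed by a copy: append n payload bytes, or fail
 * with -1 if the remaining tailroom is too small.
 */
static int model_skb_put_data(struct model_skb *skb, const void *src, size_t n)
{
	size_t used = (size_t)(skb->data - skb->head) + skb->len;

	assert(src != NULL);
	if (n > skb->size - used)
		return -1;
	memcpy(skb->data + skb->len, src, n);
	skb->len += n;
	return 0;
}
```

The point of the model is that both accessors are computed from head
rather than data, so they stay valid no matter how much payload has
been added.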
>  #define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE	(1024 * 4)
>  #define VIRTIO_VSOCK_MAX_BUF_SIZE		0xFFFFFFFFUL
>  #define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE		(1024 * 64)
> @@ -35,23 +69,10 @@ struct virtio_vsock_sock {
>  	u32 last_fwd_cnt;
>  	u32 rx_bytes;
>  	u32 buf_alloc;
> -	struct list_head rx_queue;
> +	struct sk_buff_head rx_queue;
>  	u32 msg_count;
>  };
>  
> -struct virtio_vsock_pkt {
> -	struct virtio_vsock_hdr	hdr;
> -	struct list_head list;
> -	/* socket refcnt not held, only use for cancellation */
> -	struct vsock_sock *vsk;
> -	void *buf;
> -	u32 buf_len;
> -	u32 len;
> -	u32 off;
> -	bool reply;
> -	bool tap_delivered;
> -};
> -
>  struct virtio_vsock_pkt_info {
>  	u32 remote_cid, remote_port;
>  	struct vsock_sock *vsk;
> @@ -68,7 +89,7 @@ struct virtio_transport {
>  	struct vsock_transport transport;
>  
>  	/* Takes ownership of the packet */
> -	int (*send_pkt)(struct virtio_vsock_pkt *pkt);
> +	int (*send_pkt)(struct sk_buff *skb);
>  };
>  
>  ssize_t
> @@ -149,11 +170,10 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
>  void virtio_transport_destruct(struct vsock_sock *vsk);
>  
>  void virtio_transport_recv_pkt(struct virtio_transport *t,
> -			       struct virtio_vsock_pkt *pkt);
> -void virtio_transport_free_pkt(struct virtio_vsock_pkt *pkt);
> -void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt);
> +			       struct sk_buff *skb);
> +void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct sk_buff *skb);
>  u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 wanted);
>  void virtio_transport_put_credit(struct virtio_vsock_sock *vvs, u32 credit);
> -void virtio_transport_deliver_tap_pkt(struct virtio_vsock_pkt *pkt);
> -
> +void virtio_transport_deliver_tap_pkt(struct sk_buff *skb);
> +int virtio_transport_purge_skbs(void *vsk, struct sk_buff_head *queue);
>  #endif /* _LINUX_VIRTIO_VSOCK_H */
> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> index f04abf662ec6..e348b2d09eac 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -748,6 +748,7 @@ static struct sock *__vsock_create(struct net *net,
>  	vsock_addr_init(&vsk->local_addr, VMADDR_CID_ANY, VMADDR_PORT_ANY);
>  	vsock_addr_init(&vsk->remote_addr, VMADDR_CID_ANY, VMADDR_PORT_ANY);
>  
> +	sk->sk_allocation = GFP_KERNEL;
>  	sk->sk_destruct = vsock_sk_destruct;
>  	sk->sk_backlog_rcv = vsock_queue_rcv_skb;
>  	sock_reset_flag(sk, SOCK_DONE);
> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> index ad64f403536a..3bb293fd8607 100644
> --- a/net/vmw_vsock/virtio_transport.c
> +++ b/net/vmw_vsock/virtio_transport.c
> @@ -21,6 +21,12 @@
>  #include <linux/mutex.h>
>  #include <net/af_vsock.h>
>  
> +#define VIRTIO_VSOCK_MAX_RX_HDR_PAYLOAD_SIZE	\
> +	(VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE \
> +		 - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) \
> +		 - sizeof(struct virtio_vsock_hdr) \
> +		 - sizeof(struct virtio_vsock_metadata))
> +
>  static struct workqueue_struct *virtio_vsock_workqueue;
>  static struct virtio_vsock __rcu *the_virtio_vsock;
>  static DEFINE_MUTEX(the_virtio_vsock_mutex); /* protects the_virtio_vsock */
> @@ -42,8 +48,7 @@ struct virtio_vsock {
>  	bool tx_run;
>  
>  	struct work_struct send_pkt_work;
> -	spinlock_t send_pkt_list_lock;
> -	struct list_head send_pkt_list;
> +	struct sk_buff_head send_pkt_queue;
>  
>  	atomic_t queued_replies;
>  
> @@ -101,41 +106,32 @@ virtio_transport_send_pkt_work(struct work_struct *work)
>  	vq = vsock->vqs[VSOCK_VQ_TX];
>  
>  	for (;;) {
> -		struct virtio_vsock_pkt *pkt;
> +		struct sk_buff *skb;
>  		struct scatterlist hdr, buf, *sgs[2];
>  		int ret, in_sg = 0, out_sg = 0;
>  		bool reply;
>  
> -		spin_lock_bh(&vsock->send_pkt_list_lock);
> -		if (list_empty(&vsock->send_pkt_list)) {
> -			spin_unlock_bh(&vsock->send_pkt_list_lock);
> -			break;
> -		}
> +		skb = skb_dequeue(&vsock->send_pkt_queue);
>  
> -		pkt = list_first_entry(&vsock->send_pkt_list,
> -				       struct virtio_vsock_pkt, list);
> -		list_del_init(&pkt->list);
> -		spin_unlock_bh(&vsock->send_pkt_list_lock);
> -
> -		virtio_transport_deliver_tap_pkt(pkt);
> +		if (!skb)
> +			break;
>  
> -		reply = pkt->reply;
> +		virtio_transport_deliver_tap_pkt(skb);
> +		reply = vsock_metadata(skb)->flags & VIRTIO_VSOCK_METADATA_FLAGS_REPLY;
>  
> -		sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
> +		sg_init_one(&hdr, vsock_hdr(skb), sizeof(*vsock_hdr(skb)));
>  		sgs[out_sg++] = &hdr;
> -		if (pkt->buf) {
> -			sg_init_one(&buf, pkt->buf, pkt->len);
> +		if (skb->len > 0) {
> +			sg_init_one(&buf, skb->data, skb->len);
>  			sgs[out_sg++] = &buf;
>  		}
>  
> -		ret = virtqueue_add_sgs(vq, sgs, out_sg, in_sg, pkt, GFP_KERNEL);
> +		ret = virtqueue_add_sgs(vq, sgs, out_sg, in_sg, skb, GFP_KERNEL);
>  		/* Usually this means that there is no more space available in
>  		 * the vq
>  		 */
>  		if (ret < 0) {
> -			spin_lock_bh(&vsock->send_pkt_list_lock);
> -			list_add(&pkt->list, &vsock->send_pkt_list);
> -			spin_unlock_bh(&vsock->send_pkt_list_lock);
> +			skb_queue_head(&vsock->send_pkt_queue, skb);
>  			break;
>  		}
>  
> @@ -163,33 +159,84 @@ virtio_transport_send_pkt_work(struct work_struct *work)
>  		queue_work(virtio_vsock_workqueue, &vsock->rx_work);
>  }
>  
> +static inline bool
> +virtio_transport_skbs_can_merge(struct sk_buff *old, struct sk_buff *new)
> +{
> +	return (new->len < GOOD_COPY_LEN &&
> +		skb_tailroom(old) >= new->len &&
> +		vsock_hdr(new)->src_cid == vsock_hdr(old)->src_cid &&
> +		vsock_hdr(new)->dst_cid == vsock_hdr(old)->dst_cid &&
> +		vsock_hdr(new)->src_port == vsock_hdr(old)->src_port &&
> +		vsock_hdr(new)->dst_port == vsock_hdr(old)->dst_port &&
> +		vsock_hdr(new)->type == vsock_hdr(old)->type &&
> +		vsock_hdr(new)->flags == vsock_hdr(old)->flags &&
> +		vsock_hdr(old)->op == VIRTIO_VSOCK_OP_RW &&
> +		vsock_hdr(new)->op == VIRTIO_VSOCK_OP_RW);
> +}
> +
> +/* Append new to the queue, merging it into the most recent skb if
> + * possible.
> + *
> + * Takes the queue lock internally; the caller must not hold it.
> + */
> +static void
> +virtio_transport_add_to_queue(struct sk_buff_head *queue, struct sk_buff *new)
> +{
> +	struct sk_buff *old;
> +
> +	spin_lock_bh(&queue->lock);
> +	/* In order to reduce skb memory overhead, we merge new packets with
> +	 * older packets if they pass virtio_transport_skbs_can_merge().
> +	 */
> +	if (skb_queue_empty_lockless(queue)) {
> +		__skb_queue_tail(queue, new);
> +		goto out;
> +	}
> +
> +	old = skb_peek_tail(queue);
> +
> +	if (!virtio_transport_skbs_can_merge(old, new)) {
> +		__skb_queue_tail(queue, new);
> +		goto out;
> +	}
> +
> +	memcpy(skb_put(old, new->len), new->data, new->len);
> +	vsock_hdr(old)->len = cpu_to_le32(old->len);
> +	vsock_hdr(old)->buf_alloc = vsock_hdr(new)->buf_alloc;
> +	vsock_hdr(old)->fwd_cnt = vsock_hdr(new)->fwd_cnt;
> +	dev_kfree_skb_any(new);
> +
> +out:
> +	spin_unlock_bh(&queue->lock);
> +}
> +
>  static int
> -virtio_transport_send_pkt(struct virtio_vsock_pkt *pkt)
> +virtio_transport_send_pkt(struct sk_buff *skb)
>  {
> +	struct virtio_vsock_hdr *hdr;
>  	struct virtio_vsock *vsock;
> -	int len = pkt->len;
> +	int len = skb->len;
> +
> +	hdr = vsock_hdr(skb);
>  
>  	rcu_read_lock();
>  	vsock = rcu_dereference(the_virtio_vsock);
>  	if (!vsock) {
> -		virtio_transport_free_pkt(pkt);
> +		kfree_skb(skb);
>  		len = -ENODEV;
>  		goto out_rcu;
>  	}
>  
> -	if (le64_to_cpu(pkt->hdr.dst_cid) == vsock->guest_cid) {
> -		virtio_transport_free_pkt(pkt);
> +	if (le64_to_cpu(hdr->dst_cid) == vsock->guest_cid) {
> +		kfree_skb(skb);
>  		len = -ENODEV;
>  		goto out_rcu;
>  	}
>  
> -	if (pkt->reply)
> +	if (vsock_metadata(skb)->flags & VIRTIO_VSOCK_METADATA_FLAGS_REPLY)
>  		atomic_inc(&vsock->queued_replies);
>  
> -	spin_lock_bh(&vsock->send_pkt_list_lock);
> -	list_add_tail(&pkt->list, &vsock->send_pkt_list);
> -	spin_unlock_bh(&vsock->send_pkt_list_lock);
> -
> +	virtio_transport_add_to_queue(&vsock->send_pkt_queue, skb);
>  	queue_work(virtio_vsock_workqueue, &vsock->send_pkt_work);
>  
>  out_rcu:
> @@ -201,9 +248,7 @@ static int
>  virtio_transport_cancel_pkt(struct vsock_sock *vsk)
>  {
>  	struct virtio_vsock *vsock;
> -	struct virtio_vsock_pkt *pkt, *n;
>  	int cnt = 0, ret;
> -	LIST_HEAD(freeme);
>  
>  	rcu_read_lock();
>  	vsock = rcu_dereference(the_virtio_vsock);
> @@ -212,20 +257,7 @@ virtio_transport_cancel_pkt(struct vsock_sock *vsk)
>  		goto out_rcu;
>  	}
>  
> -	spin_lock_bh(&vsock->send_pkt_list_lock);
> -	list_for_each_entry_safe(pkt, n, &vsock->send_pkt_list, list) {
> -		if (pkt->vsk != vsk)
> -			continue;
> -		list_move(&pkt->list, &freeme);
> -	}
> -	spin_unlock_bh(&vsock->send_pkt_list_lock);
> -
> -	list_for_each_entry_safe(pkt, n, &freeme, list) {
> -		if (pkt->reply)
> -			cnt++;
> -		list_del(&pkt->list);
> -		virtio_transport_free_pkt(pkt);
> -	}
> +	cnt = virtio_transport_purge_skbs(vsk, &vsock->send_pkt_queue);
>  
>  	if (cnt) {
>  		struct virtqueue *rx_vq = vsock->vqs[VSOCK_VQ_RX];
> @@ -246,38 +278,34 @@ virtio_transport_cancel_pkt(struct vsock_sock *vsk)
>  
>  static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
>  {
> -	int buf_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
> -	struct virtio_vsock_pkt *pkt;
> -	struct scatterlist hdr, buf, *sgs[2];
> +	struct scatterlist pkt, *sgs[1];
>  	struct virtqueue *vq;
>  	int ret;
>  
>  	vq = vsock->vqs[VSOCK_VQ_RX];
>  
>  	do {
> -		pkt = kzalloc(sizeof(*pkt), GFP_KERNEL);
> -		if (!pkt)
> -			break;
> +		struct sk_buff *skb;
> +		const size_t len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE -
> +				SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>  
> -		pkt->buf = kmalloc(buf_len, GFP_KERNEL);
> -		if (!pkt->buf) {
> -			virtio_transport_free_pkt(pkt);
> +		skb = alloc_skb(len, GFP_KERNEL);
> +		if (!skb)
>  			break;
> -		}
>  
> -		pkt->buf_len = buf_len;
> -		pkt->len = buf_len;
> +		memset(skb->head, 0,
> +		       sizeof(struct virtio_vsock_metadata) + sizeof(struct virtio_vsock_hdr));
>  
> -		sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
> -		sgs[0] = &hdr;
> +		sg_init_one(&pkt, skb->head + sizeof(struct virtio_vsock_metadata),
> +			    VIRTIO_VSOCK_MAX_RX_HDR_PAYLOAD_SIZE);
> +		sgs[0] = &pkt;
>  
> -		sg_init_one(&buf, pkt->buf, buf_len);
> -		sgs[1] = &buf;
> -		ret = virtqueue_add_sgs(vq, sgs, 0, 2, pkt, GFP_KERNEL);
> -		if (ret) {
> -			virtio_transport_free_pkt(pkt);
> +		ret = virtqueue_add_sgs(vq, sgs, 0, 1, skb, GFP_KERNEL);
> +		if (ret < 0) {
> +			kfree_skb(skb);
>  			break;
>  		}
> +
>  		vsock->rx_buf_nr++;
>  	} while (vq->num_free);
>  	if (vsock->rx_buf_nr > vsock->rx_buf_max_nr)
> @@ -299,12 +327,12 @@ static void virtio_transport_tx_work(struct work_struct *work)
>  		goto out;
>  
>  	do {
> -		struct virtio_vsock_pkt *pkt;
> +		struct sk_buff *skb;
>  		unsigned int len;
>  
>  		virtqueue_disable_cb(vq);
> -		while ((pkt = virtqueue_get_buf(vq, &len)) != NULL) {
> -			virtio_transport_free_pkt(pkt);
> +		while ((skb = virtqueue_get_buf(vq, &len)) != NULL) {
> +			consume_skb(skb);
>  			added = true;
>  		}
>  	} while (!virtqueue_enable_cb(vq));
> @@ -529,7 +557,8 @@ static void virtio_transport_rx_work(struct work_struct *work)
>  	do {
>  		virtqueue_disable_cb(vq);
>  		for (;;) {
> -			struct virtio_vsock_pkt *pkt;
> +			struct virtio_vsock_hdr *hdr;
> +			struct sk_buff *skb;
>  			unsigned int len;
>  
>  			if (!virtio_transport_more_replies(vsock)) {
> @@ -540,23 +569,24 @@ static void virtio_transport_rx_work(struct work_struct *work)
>  				goto out;
>  			}
>  
> -			pkt = virtqueue_get_buf(vq, &len);
> -			if (!pkt) {
> +			skb = virtqueue_get_buf(vq, &len);
> +			if (!skb)
>  				break;
> -			}
>  
>  			vsock->rx_buf_nr--;
>  
>  			/* Drop short/long packets */
> -			if (unlikely(len < sizeof(pkt->hdr) ||
> -				     len > sizeof(pkt->hdr) + pkt->len)) {
> -				virtio_transport_free_pkt(pkt);
> +			if (unlikely(len < sizeof(*hdr) ||
> +				     len > VIRTIO_VSOCK_MAX_RX_HDR_PAYLOAD_SIZE)) {
> +				kfree_skb(skb);
>  				continue;
>  			}
>  
> -			pkt->len = len - sizeof(pkt->hdr);
> -			virtio_transport_deliver_tap_pkt(pkt);
> -			virtio_transport_recv_pkt(&virtio_transport, pkt);
> +			hdr = vsock_hdr(skb);
> +			virtio_vsock_skb_reserve(skb);
> +			virtio_vsock_skb_rx_put(skb);
> +			virtio_transport_deliver_tap_pkt(skb);
> +			virtio_transport_recv_pkt(&virtio_transport, skb);
>  		}
>  	} while (!virtqueue_enable_cb(vq));
>  
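
The "Drop short/long packets" check above accepts a used buffer only
if the device-reported length covers at least the header and does not
exceed what was posted to the virtqueue. A minimal userspace sketch of
that predicate follows. The two constants are stand-ins chosen for
illustration (the real bounds are sizeof(struct virtio_vsock_hdr) and
VIRTIO_VSOCK_MAX_RX_HDR_PAYLOAD_SIZE, whose definitions are not shown
in this hunk), and model_rx_len_ok() is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins only; the patch uses sizeof(*hdr) and
 * VIRTIO_VSOCK_MAX_RX_HDR_PAYLOAD_SIZE, defined elsewhere.
 */
#define MODEL_HDR_SIZE                 44u
#define MODEL_MAX_RX_HDR_PAYLOAD_SIZE  4096u

/* Return true when the length reported for a used RX buffer is usable:
 * long enough to hold a header, short enough to fit the posted buffer.
 */
static bool model_rx_len_ok(size_t len)
{
	return len >= MODEL_HDR_SIZE &&
	       len <= MODEL_MAX_RX_HDR_PAYLOAD_SIZE;
}
```

Anything that fails the predicate is freed and skipped, so a
misbehaving device cannot make the driver parse a truncated header.
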
> @@ -610,7 +640,7 @@ static int virtio_vsock_vqs_init(struct virtio_vsock *vsock)
>  static void virtio_vsock_vqs_del(struct virtio_vsock *vsock)
>  {
>  	struct virtio_device *vdev = vsock->vdev;
> -	struct virtio_vsock_pkt *pkt;
> +	struct sk_buff *skb;
>  
>  	/* Reset all connected sockets when the VQs disappear */
>  	vsock_for_each_connected_socket(&virtio_transport.transport,
> @@ -637,23 +667,16 @@ static void virtio_vsock_vqs_del(struct virtio_vsock *vsock)
>  	virtio_reset_device(vdev);
>  
>  	mutex_lock(&vsock->rx_lock);
> -	while ((pkt = virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_RX])))
> -		virtio_transport_free_pkt(pkt);
> +	while ((skb = virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_RX])))
> +		kfree_skb(skb);
>  	mutex_unlock(&vsock->rx_lock);
>  
>  	mutex_lock(&vsock->tx_lock);
> -	while ((pkt = virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_TX])))
> -		virtio_transport_free_pkt(pkt);
> +	while ((skb = virtqueue_detach_unused_buf(vsock->vqs[VSOCK_VQ_TX])))
> +		kfree_skb(skb);
>  	mutex_unlock(&vsock->tx_lock);
>  
> -	spin_lock_bh(&vsock->send_pkt_list_lock);
> -	while (!list_empty(&vsock->send_pkt_list)) {
> -		pkt = list_first_entry(&vsock->send_pkt_list,
> -				       struct virtio_vsock_pkt, list);
> -		list_del(&pkt->list);
> -		virtio_transport_free_pkt(pkt);
> -	}
> -	spin_unlock_bh(&vsock->send_pkt_list_lock);
> +	skb_queue_purge(&vsock->send_pkt_queue);
>  
>  	/* Delete virtqueues and flush outstanding callbacks if any */
>  	vdev->config->del_vqs(vdev);
> @@ -690,8 +713,7 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
>  	mutex_init(&vsock->tx_lock);
>  	mutex_init(&vsock->rx_lock);
>  	mutex_init(&vsock->event_lock);
> -	spin_lock_init(&vsock->send_pkt_list_lock);
> -	INIT_LIST_HEAD(&vsock->send_pkt_list);
> +	skb_queue_head_init(&vsock->send_pkt_queue);
>  	INIT_WORK(&vsock->rx_work, virtio_transport_rx_work);
>  	INIT_WORK(&vsock->tx_work, virtio_transport_tx_work);
>  	INIT_WORK(&vsock->event_work, virtio_transport_event_work);
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index ec2c2afbf0d0..920578597bb9 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -37,53 +37,81 @@ virtio_transport_get_ops(struct vsock_sock *vsk)
>  	return container_of(t, struct virtio_transport, transport);
>  }
>  
> -static struct virtio_vsock_pkt *
> -virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
> +/* Returns a new packet on success, otherwise returns NULL.
> + *
> + * If NULL is returned, errp is set to a negative errno.
> + */
> +static struct sk_buff *
> +virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info,
>  			   size_t len,
>  			   u32 src_cid,
>  			   u32 src_port,
>  			   u32 dst_cid,
> -			   u32 dst_port)
> +			   u32 dst_port,
> +			   int *errp)
>  {
> -	struct virtio_vsock_pkt *pkt;
> +	struct sk_buff *skb;
> +	struct virtio_vsock_hdr *hdr;
> +	void *payload;
> +	const size_t skb_len = sizeof(*hdr) + sizeof(struct virtio_vsock_metadata) + len;
>  	int err;
>  
> -	pkt = kzalloc(sizeof(*pkt), GFP_KERNEL);
> -	if (!pkt)
> -		return NULL;
> +	if (info->vsk) {
> +		unsigned int msg_flags = info->msg ? info->msg->msg_flags : 0;
> +		struct sock *sk;
>  
> -	pkt->hdr.type		= cpu_to_le16(info->type);
> -	pkt->hdr.op		= cpu_to_le16(info->op);
> -	pkt->hdr.src_cid	= cpu_to_le64(src_cid);
> -	pkt->hdr.dst_cid	= cpu_to_le64(dst_cid);
> -	pkt->hdr.src_port	= cpu_to_le32(src_port);
> -	pkt->hdr.dst_port	= cpu_to_le32(dst_port);
> -	pkt->hdr.flags		= cpu_to_le32(info->flags);
> -	pkt->len		= len;
> -	pkt->hdr.len		= cpu_to_le32(len);
> -	pkt->reply		= info->reply;
> -	pkt->vsk		= info->vsk;
> +		sk = sk_vsock(info->vsk);
> +		skb = sock_alloc_send_skb(sk, skb_len,
> +					  msg_flags & MSG_DONTWAIT, errp);
>  
> -	if (info->msg && len > 0) {
> -		pkt->buf = kmalloc(len, GFP_KERNEL);
> -		if (!pkt->buf)
> -			goto out_pkt;
> +		if (skb)
> +			skb->priority = sk->sk_priority;
> +	} else {
> +		skb = alloc_skb(skb_len, GFP_KERNEL);
> +	}
> +
> +	if (!skb) {
> +		/* If using alloc_skb(), the skb is NULL due to lacking memory.
> +		 * Otherwise, errp is set by sock_alloc_send_skb().
> +		 */
> +		if (!info->vsk)
> +			*errp = -ENOMEM;
> +		return NULL;
> +	}
>  
> -		pkt->buf_len = len;
> +	memset(skb->head, 0, sizeof(*hdr) + sizeof(struct virtio_vsock_metadata));
> +	virtio_vsock_skb_reserve(skb);
> +	payload = skb_put(skb, len);
>  
> -		err = memcpy_from_msg(pkt->buf, info->msg, len);
> -		if (err)
> +	hdr = vsock_hdr(skb);
> +	hdr->type	= cpu_to_le16(info->type);
> +	hdr->op		= cpu_to_le16(info->op);
> +	hdr->src_cid	= cpu_to_le64(src_cid);
> +	hdr->dst_cid	= cpu_to_le64(dst_cid);
> +	hdr->src_port	= cpu_to_le32(src_port);
> +	hdr->dst_port	= cpu_to_le32(dst_port);
> +	hdr->flags	= cpu_to_le32(info->flags);
> +	hdr->len	= cpu_to_le32(len);
> +
> +	if (info->msg && len > 0) {
> +		err = memcpy_from_msg(payload, info->msg, len);
> +		if (err) {
> +			*errp = -ENOMEM;
>  			goto out;
> +		}
>  
>  		if (msg_data_left(info->msg) == 0 &&
>  		    info->type == VIRTIO_VSOCK_TYPE_SEQPACKET) {
> -			pkt->hdr.flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOM);
> +			hdr->flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOM);
>  
>  			if (info->msg->msg_flags & MSG_EOR)
> -				pkt->hdr.flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOR);
> +				hdr->flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOR);
>  		}
>  	}
>  
> +	if (info->reply)
> +		vsock_metadata(skb)->flags |= VIRTIO_VSOCK_METADATA_FLAGS_REPLY;
> +
>  	trace_virtio_transport_alloc_pkt(src_cid, src_port,
>  					 dst_cid, dst_port,
>  					 len,
> @@ -91,85 +119,26 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
>  					 info->op,
>  					 info->flags);
>  
> -	return pkt;
> +	return skb;
>  
>  out:
> -	kfree(pkt->buf);
> -out_pkt:
> -	kfree(pkt);
> +	kfree_skb(skb);
>  	return NULL;
>  }
>  
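
Judging from the memset() and sg_init_one() calls in this patch, the
skb head appears to be laid out as [metadata][virtio header][payload],
with only the two leading regions zeroed at allocation time. The
sketch below models that layout in plain userspace C. The struct
definitions are trimmed stand-ins and every model_* name is
hypothetical; the real accessors (vsock_metadata(), vsock_hdr(),
virtio_vsock_skb_reserve()) are defined in a part of the patch not
shown here, so treat this as an assumption about their behavior.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Trimmed, hypothetical stand-ins for struct virtio_vsock_metadata
 * and struct virtio_vsock_hdr.
 */
struct model_metadata {
	uint32_t flags;
	uint32_t off;
};

struct model_hdr {
	uint32_t len;
	uint16_t type;
	uint16_t op;
};

#define MODEL_PREFIX (sizeof(struct model_metadata) + sizeof(struct model_hdr))

/* One allocation holds [metadata][hdr][payload], in that order.
 * Only the metadata and header are zeroed, as in the patch.
 */
static unsigned char *model_alloc(size_t payload_len)
{
	unsigned char *head = malloc(MODEL_PREFIX + payload_len);

	if (!head)
		return NULL;
	memset(head, 0, MODEL_PREFIX);
	return head;
}

static struct model_metadata *model_meta_of(unsigned char *head)
{
	return (struct model_metadata *)head;
}

static struct model_hdr *model_hdr_of(unsigned char *head)
{
	return (struct model_hdr *)(head + sizeof(struct model_metadata));
}

static unsigned char *model_payload_of(unsigned char *head)
{
	return head + MODEL_PREFIX;
}
```

Keeping the metadata in front of the header is what lets the RX path
hand the device a single scatterlist entry that starts at the header.
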
>  /* Packet capture */
>  static struct sk_buff *virtio_transport_build_skb(void *opaque)
>  {
> -	struct virtio_vsock_pkt *pkt = opaque;
> -	struct af_vsockmon_hdr *hdr;
> -	struct sk_buff *skb;
> -	size_t payload_len;
> -	void *payload_buf;
> -
> -	/* A packet could be split to fit the RX buffer, so we can retrieve
> -	 * the payload length from the header and the buffer pointer taking
> -	 * care of the offset in the original packet.
> -	 */
> -	payload_len = le32_to_cpu(pkt->hdr.len);
> -	payload_buf = pkt->buf + pkt->off;
> -
> -	skb = alloc_skb(sizeof(*hdr) + sizeof(pkt->hdr) + payload_len,
> -			GFP_ATOMIC);
> -	if (!skb)
> -		return NULL;
> -
> -	hdr = skb_put(skb, sizeof(*hdr));
> -
> -	/* pkt->hdr is little-endian so no need to byteswap here */
> -	hdr->src_cid = pkt->hdr.src_cid;
> -	hdr->src_port = pkt->hdr.src_port;
> -	hdr->dst_cid = pkt->hdr.dst_cid;
> -	hdr->dst_port = pkt->hdr.dst_port;
> -
> -	hdr->transport = cpu_to_le16(AF_VSOCK_TRANSPORT_VIRTIO);
> -	hdr->len = cpu_to_le16(sizeof(pkt->hdr));
> -	memset(hdr->reserved, 0, sizeof(hdr->reserved));
> -
> -	switch (le16_to_cpu(pkt->hdr.op)) {
> -	case VIRTIO_VSOCK_OP_REQUEST:
> -	case VIRTIO_VSOCK_OP_RESPONSE:
> -		hdr->op = cpu_to_le16(AF_VSOCK_OP_CONNECT);
> -		break;
> -	case VIRTIO_VSOCK_OP_RST:
> -	case VIRTIO_VSOCK_OP_SHUTDOWN:
> -		hdr->op = cpu_to_le16(AF_VSOCK_OP_DISCONNECT);
> -		break;
> -	case VIRTIO_VSOCK_OP_RW:
> -		hdr->op = cpu_to_le16(AF_VSOCK_OP_PAYLOAD);
> -		break;
> -	case VIRTIO_VSOCK_OP_CREDIT_UPDATE:
> -	case VIRTIO_VSOCK_OP_CREDIT_REQUEST:
> -		hdr->op = cpu_to_le16(AF_VSOCK_OP_CONTROL);
> -		break;
> -	default:
> -		hdr->op = cpu_to_le16(AF_VSOCK_OP_UNKNOWN);
> -		break;
> -	}
> -
> -	skb_put_data(skb, &pkt->hdr, sizeof(pkt->hdr));
> -
> -	if (payload_len) {
> -		skb_put_data(skb, payload_buf, payload_len);
> -	}
> -
> -	return skb;
> +	return (struct sk_buff *)opaque;
>  }
>  
> -void virtio_transport_deliver_tap_pkt(struct virtio_vsock_pkt *pkt)
> +void virtio_transport_deliver_tap_pkt(struct sk_buff *skb)
>  {
> -	if (pkt->tap_delivered)
> +	if (vsock_metadata(skb)->flags & VIRTIO_VSOCK_METADATA_FLAGS_TAP_DELIVERED)
>  		return;
>  
> -	vsock_deliver_tap(virtio_transport_build_skb, pkt);
> -	pkt->tap_delivered = true;
> +	vsock_deliver_tap(virtio_transport_build_skb, skb);
> +	vsock_metadata(skb)->flags |= VIRTIO_VSOCK_METADATA_FLAGS_TAP_DELIVERED;
>  }
>  EXPORT_SYMBOL_GPL(virtio_transport_deliver_tap_pkt);
>  
> @@ -192,8 +161,9 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>  	u32 src_cid, src_port, dst_cid, dst_port;
>  	const struct virtio_transport *t_ops;
>  	struct virtio_vsock_sock *vvs;
> -	struct virtio_vsock_pkt *pkt;
> +	struct sk_buff *skb;
>  	u32 pkt_len = info->pkt_len;
> +	int err;
>  
>  	info->type = virtio_transport_get_type(sk_vsock(vsk));
>  
> @@ -224,42 +194,47 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>  	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
>  		return pkt_len;
>  
> -	pkt = virtio_transport_alloc_pkt(info, pkt_len,
> +	skb = virtio_transport_alloc_skb(info, pkt_len,
>  					 src_cid, src_port,
> -					 dst_cid, dst_port);
> -	if (!pkt) {
> +					 dst_cid, dst_port,
> +					 &err);
> +	if (!skb) {
>  		virtio_transport_put_credit(vvs, pkt_len);
> -		return -ENOMEM;
> +		return err;
>  	}
>  
> -	virtio_transport_inc_tx_pkt(vvs, pkt);
> +	virtio_transport_inc_tx_pkt(vvs, skb);
> +
> +	err = t_ops->send_pkt(skb);
>  
> -	return t_ops->send_pkt(pkt);
> +	return err < 0 ? -ENOMEM : err;
>  }
>  
>  static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs,
> -					struct virtio_vsock_pkt *pkt)
> +					struct sk_buff *skb)
>  {
> -	if (vvs->rx_bytes + pkt->len > vvs->buf_alloc)
> +	if (vvs->rx_bytes + skb->len > vvs->buf_alloc)
>  		return false;
>  
> -	vvs->rx_bytes += pkt->len;
> +	vvs->rx_bytes += skb->len;
>  	return true;
>  }
>  
>  static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs,
> -					struct virtio_vsock_pkt *pkt)
> +					struct sk_buff *skb)
>  {
> -	vvs->rx_bytes -= pkt->len;
> -	vvs->fwd_cnt += pkt->len;
> +	vvs->rx_bytes -= skb->len;
> +	vvs->fwd_cnt += skb->len;
>  }
>  
> -void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt)
> +void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct sk_buff *skb)
>  {
> +	struct virtio_vsock_hdr *hdr = vsock_hdr(skb);
> +
>  	spin_lock_bh(&vvs->rx_lock);
>  	vvs->last_fwd_cnt = vvs->fwd_cnt;
> -	pkt->hdr.fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
> -	pkt->hdr.buf_alloc = cpu_to_le32(vvs->buf_alloc);
> +	hdr->fwd_cnt = cpu_to_le32(vvs->fwd_cnt);
> +	hdr->buf_alloc = cpu_to_le32(vvs->buf_alloc);
>  	spin_unlock_bh(&vvs->rx_lock);
>  }
>  EXPORT_SYMBOL_GPL(virtio_transport_inc_tx_pkt);
> @@ -303,29 +278,29 @@ virtio_transport_stream_do_peek(struct vsock_sock *vsk,
>  				size_t len)
>  {
>  	struct virtio_vsock_sock *vvs = vsk->trans;
> -	struct virtio_vsock_pkt *pkt;
> +	struct sk_buff *skb, *tmp;
>  	size_t bytes, total = 0, off;
>  	int err = -EFAULT;
>  
>  	spin_lock_bh(&vvs->rx_lock);
>  
> -	list_for_each_entry(pkt, &vvs->rx_queue, list) {
> -		off = pkt->off;
> +	skb_queue_walk_safe(&vvs->rx_queue, skb, tmp) {
> +		off = vsock_metadata(skb)->off;
>  
>  		if (total == len)
>  			break;
>  
> -		while (total < len && off < pkt->len) {
> +		while (total < len && off < skb->len) {
>  			bytes = len - total;
> -			if (bytes > pkt->len - off)
> -				bytes = pkt->len - off;
> +			if (bytes > skb->len - off)
> +				bytes = skb->len - off;
>  
>  			/* sk_lock is held by caller so no one else can dequeue.
>  			 * Unlock rx_lock since memcpy_to_msg() may sleep.
>  			 */
>  			spin_unlock_bh(&vvs->rx_lock);
>  
> -			err = memcpy_to_msg(msg, pkt->buf + off, bytes);
> +			err = memcpy_to_msg(msg, skb->data + off, bytes);
>  			if (err)
>  				goto out;
>  
> @@ -352,37 +327,40 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
>  				   size_t len)
>  {
>  	struct virtio_vsock_sock *vvs = vsk->trans;
> -	struct virtio_vsock_pkt *pkt;
> +	struct sk_buff *skb;
>  	size_t bytes, total = 0;
>  	u32 free_space;
>  	int err = -EFAULT;
>  
>  	spin_lock_bh(&vvs->rx_lock);
> -	while (total < len && !list_empty(&vvs->rx_queue)) {
> -		pkt = list_first_entry(&vvs->rx_queue,
> -				       struct virtio_vsock_pkt, list);
> +	while (total < len && !skb_queue_empty_lockless(&vvs->rx_queue)) {
> +		skb = __skb_dequeue(&vvs->rx_queue);
>  
>  		bytes = len - total;
> -		if (bytes > pkt->len - pkt->off)
> -			bytes = pkt->len - pkt->off;
> +		if (bytes > skb->len - vsock_metadata(skb)->off)
> +			bytes = skb->len - vsock_metadata(skb)->off;
>  
>  		/* sk_lock is held by caller so no one else can dequeue.
>  		 * Unlock rx_lock since memcpy_to_msg() may sleep.
>  		 */
>  		spin_unlock_bh(&vvs->rx_lock);
>  
> -		err = memcpy_to_msg(msg, pkt->buf + pkt->off, bytes);
> +		err = memcpy_to_msg(msg, skb->data + vsock_metadata(skb)->off, bytes);
>  		if (err)
>  			goto out;
>  
>  		spin_lock_bh(&vvs->rx_lock);
>  
>  		total += bytes;
> -		pkt->off += bytes;
> -		if (pkt->off == pkt->len) {
> -			virtio_transport_dec_rx_pkt(vvs, pkt);
> -			list_del(&pkt->list);
> -			virtio_transport_free_pkt(pkt);
> +		vsock_metadata(skb)->off += bytes;
> +
> +		WARN_ON(vsock_metadata(skb)->off > skb->len);
> +
> +		if (vsock_metadata(skb)->off == skb->len) {
> +			virtio_transport_dec_rx_pkt(vvs, skb);
> +			consume_skb(skb);
> +		} else {
> +			__skb_queue_head(&vvs->rx_queue, skb);
>  		}
>  	}
>  
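
The stream dequeue loop above copies at most (len - off) bytes from
the front packet, advances the per-packet offset, and keeps the packet
at the front of the queue when it is only partly consumed. A minimal
userspace sketch of that bookkeeping follows. It is a model, not the
kernel code: the fixed-size ring and all model_* names are
hypothetical, and the packet is inspected in place rather than
dequeued and put back at the head as the patch does.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_QUEUE_CAP 8

/* Hypothetical stand-in for an skb plus vsock_metadata(skb)->off. */
struct model_pkt {
	const unsigned char *data;
	size_t len;
	size_t off;	/* bytes already handed to the reader */
};

/* Fixed-capacity FIFO; pkts[head] is the oldest packet. */
struct model_queue {
	struct model_pkt pkts[MODEL_QUEUE_CAP];
	size_t head;
	size_t count;
};

static int model_enqueue(struct model_queue *q, const void *data, size_t len)
{
	if (q->count == MODEL_QUEUE_CAP)
		return -1;
	q->pkts[(q->head + q->count) % MODEL_QUEUE_CAP] =
		(struct model_pkt){ .data = data, .len = len, .off = 0 };
	q->count++;
	return 0;
}

/* Copy up to len bytes into out, consuming packets front to back. */
static size_t model_stream_dequeue(struct model_queue *q,
				   unsigned char *out, size_t len)
{
	size_t total = 0;

	while (total < len && q->count > 0) {
		struct model_pkt *pkt = &q->pkts[q->head];
		size_t bytes = len - total;

		if (bytes > pkt->len - pkt->off)
			bytes = pkt->len - pkt->off;

		memcpy(out + total, pkt->data + pkt->off, bytes);
		total += bytes;
		pkt->off += bytes;

		if (pkt->off == pkt->len) {
			/* Fully consumed: drop it from the front. */
			q->head = (q->head + 1) % MODEL_QUEUE_CAP;
			q->count--;
		}
		/* Otherwise the packet stays in front with its offset
		 * advanced; the loop ends because total == len.
		 */
	}
	return total;
}
```

Reading 6 bytes from two queued 4-byte packets consumes the first
packet entirely and leaves the second at the front with off == 2.
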
> @@ -414,7 +392,7 @@ static int virtio_transport_seqpacket_do_dequeue(struct vsock_sock *vsk,
>  						 int flags)
>  {
>  	struct virtio_vsock_sock *vvs = vsk->trans;
> -	struct virtio_vsock_pkt *pkt;
> +	struct sk_buff *skb;
>  	int dequeued_len = 0;
>  	size_t user_buf_len = msg_data_left(msg);
>  	bool msg_ready = false;
> @@ -427,13 +405,16 @@ static int virtio_transport_seqpacket_do_dequeue(struct vsock_sock *vsk,
>  	}
>  
>  	while (!msg_ready) {
> -		pkt = list_first_entry(&vvs->rx_queue, struct virtio_vsock_pkt, list);
> +		struct virtio_vsock_hdr *hdr;
> +
> +		skb = __skb_dequeue(&vvs->rx_queue);
> +		hdr = vsock_hdr(skb);
>  
>  		if (dequeued_len >= 0) {
>  			size_t pkt_len;
>  			size_t bytes_to_copy;
>  
> -			pkt_len = (size_t)le32_to_cpu(pkt->hdr.len);
> +			pkt_len = (size_t)le32_to_cpu(hdr->len);
>  			bytes_to_copy = min(user_buf_len, pkt_len);
>  
>  			if (bytes_to_copy) {
> @@ -444,7 +425,7 @@ static int virtio_transport_seqpacket_do_dequeue(struct vsock_sock *vsk,
>  				 */
>  				spin_unlock_bh(&vvs->rx_lock);
>  
> -				err = memcpy_to_msg(msg, pkt->buf, bytes_to_copy);
> +				err = memcpy_to_msg(msg, skb->data, bytes_to_copy);
>  				if (err) {
>  					/* Copy of message failed. Rest of
>  					 * fragments will be freed without copy.
> @@ -461,17 +442,16 @@ static int virtio_transport_seqpacket_do_dequeue(struct vsock_sock *vsk,
>  				dequeued_len += pkt_len;
>  		}
>  
> -		if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOM) {
> +		if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM) {
>  			msg_ready = true;
>  			vvs->msg_count--;
>  
> -			if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOR)
> +			if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOR)
>  				msg->msg_flags |= MSG_EOR;
>  		}
>  
> -		virtio_transport_dec_rx_pkt(vvs, pkt);
> -		list_del(&pkt->list);
> -		virtio_transport_free_pkt(pkt);
> +		virtio_transport_dec_rx_pkt(vvs, skb);
> +		kfree_skb(skb);
>  	}
>  
>  	spin_unlock_bh(&vvs->rx_lock);
> @@ -609,7 +589,7 @@ int virtio_transport_do_socket_init(struct vsock_sock *vsk,
>  
>  	spin_lock_init(&vvs->rx_lock);
>  	spin_lock_init(&vvs->tx_lock);
> -	INIT_LIST_HEAD(&vvs->rx_queue);
> +	skb_queue_head_init(&vvs->rx_queue);
>  
>  	return 0;
>  }
> @@ -809,16 +789,16 @@ void virtio_transport_destruct(struct vsock_sock *vsk)
>  EXPORT_SYMBOL_GPL(virtio_transport_destruct);
>  
>  static int virtio_transport_reset(struct vsock_sock *vsk,
> -				  struct virtio_vsock_pkt *pkt)
> +				  struct sk_buff *skb)
>  {
>  	struct virtio_vsock_pkt_info info = {
>  		.op = VIRTIO_VSOCK_OP_RST,
> -		.reply = !!pkt,
> +		.reply = !!skb,
>  		.vsk = vsk,
>  	};
>  
>  	/* Send RST only if the original pkt is not a RST pkt */
> -	if (pkt && le16_to_cpu(pkt->hdr.op) == VIRTIO_VSOCK_OP_RST)
> +	if (skb && le16_to_cpu(vsock_hdr(skb)->op) == VIRTIO_VSOCK_OP_RST)
>  		return 0;
>  
>  	return virtio_transport_send_pkt_info(vsk, &info);
> @@ -828,29 +808,32 @@ static int virtio_transport_reset(struct vsock_sock *vsk,
>   * attempt was made to connect to a socket that does not exist.
>   */
>  static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
> -					  struct virtio_vsock_pkt *pkt)
> +					  struct sk_buff *skb)
>  {
> -	struct virtio_vsock_pkt *reply;
> +	struct sk_buff *reply;
> +	struct virtio_vsock_hdr *hdr = vsock_hdr(skb);
>  	struct virtio_vsock_pkt_info info = {
>  		.op = VIRTIO_VSOCK_OP_RST,
> -		.type = le16_to_cpu(pkt->hdr.type),
> +		.type = le16_to_cpu(hdr->type),
>  		.reply = true,
>  	};
> +	int err;
>  
>  	/* Send RST only if the original pkt is not a RST pkt */
> -	if (le16_to_cpu(pkt->hdr.op) == VIRTIO_VSOCK_OP_RST)
> +	if (le16_to_cpu(hdr->op) == VIRTIO_VSOCK_OP_RST)
>  		return 0;
>  
> -	reply = virtio_transport_alloc_pkt(&info, 0,
> -					   le64_to_cpu(pkt->hdr.dst_cid),
> -					   le32_to_cpu(pkt->hdr.dst_port),
> -					   le64_to_cpu(pkt->hdr.src_cid),
> -					   le32_to_cpu(pkt->hdr.src_port));
> +	reply = virtio_transport_alloc_skb(&info, 0,
> +					   le64_to_cpu(hdr->dst_cid),
> +					   le32_to_cpu(hdr->dst_port),
> +					   le64_to_cpu(hdr->src_cid),
> +					   le32_to_cpu(hdr->src_port),
> +					   &err);
>  	if (!reply)
> -		return -ENOMEM;
> +		return err;
>  
>  	if (!t) {
> -		virtio_transport_free_pkt(reply);
> +		kfree_skb(reply);
>  		return -ENOTCONN;
>  	}
>  
> @@ -861,16 +844,11 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
>  static void virtio_transport_remove_sock(struct vsock_sock *vsk)
>  {
>  	struct virtio_vsock_sock *vvs = vsk->trans;
> -	struct virtio_vsock_pkt *pkt, *tmp;
>  
>  	/* We don't need to take rx_lock, as the socket is closing and we are
>  	 * removing it.
>  	 */
> -	list_for_each_entry_safe(pkt, tmp, &vvs->rx_queue, list) {
> -		list_del(&pkt->list);
> -		virtio_transport_free_pkt(pkt);
> -	}
> -
> +	__skb_queue_purge(&vvs->rx_queue);
>  	vsock_remove_sock(vsk);
>  }
>  
> @@ -984,13 +962,14 @@ EXPORT_SYMBOL_GPL(virtio_transport_release);
>  
>  static int
>  virtio_transport_recv_connecting(struct sock *sk,
> -				 struct virtio_vsock_pkt *pkt)
> +				 struct sk_buff *skb)
>  {
>  	struct vsock_sock *vsk = vsock_sk(sk);
> +	struct virtio_vsock_hdr *hdr = vsock_hdr(skb);
>  	int err;
>  	int skerr;
>  
> -	switch (le16_to_cpu(pkt->hdr.op)) {
> +	switch (le16_to_cpu(hdr->op)) {
>  	case VIRTIO_VSOCK_OP_RESPONSE:
>  		sk->sk_state = TCP_ESTABLISHED;
>  		sk->sk_socket->state = SS_CONNECTED;
> @@ -1011,7 +990,7 @@ virtio_transport_recv_connecting(struct sock *sk,
>  	return 0;
>  
>  destroy:
> -	virtio_transport_reset(vsk, pkt);
> +	virtio_transport_reset(vsk, skb);
>  	sk->sk_state = TCP_CLOSE;
>  	sk->sk_err = skerr;
>  	sk_error_report(sk);
> @@ -1020,34 +999,38 @@ virtio_transport_recv_connecting(struct sock *sk,
>  
>  static void
>  virtio_transport_recv_enqueue(struct vsock_sock *vsk,
> -			      struct virtio_vsock_pkt *pkt)
> +			      struct sk_buff *skb)
>  {
>  	struct virtio_vsock_sock *vvs = vsk->trans;
> +	struct virtio_vsock_hdr *hdr;
>  	bool can_enqueue, free_pkt = false;
> +	u32 len;
>  
> -	pkt->len = le32_to_cpu(pkt->hdr.len);
> -	pkt->off = 0;
> +	hdr = vsock_hdr(skb);
> +	len = le32_to_cpu(hdr->len);
> +	vsock_metadata(skb)->off = 0;
>  
>  	spin_lock_bh(&vvs->rx_lock);
>  
> -	can_enqueue = virtio_transport_inc_rx_pkt(vvs, pkt);
> +	can_enqueue = virtio_transport_inc_rx_pkt(vvs, skb);
>  	if (!can_enqueue) {
>  		free_pkt = true;
>  		goto out;
>  	}
>  
> -	if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOM)
> +	if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM)
>  		vvs->msg_count++;
>  
>  	/* Try to copy small packets into the buffer of last packet queued,
>  	 * to avoid wasting memory queueing the entire buffer with a small
>  	 * payload.
>  	 */
> -	if (pkt->len <= GOOD_COPY_LEN && !list_empty(&vvs->rx_queue)) {
> -		struct virtio_vsock_pkt *last_pkt;
> +	if (len <= GOOD_COPY_LEN && !skb_queue_empty_lockless(&vvs->rx_queue)) {
> +		struct virtio_vsock_hdr *last_hdr;
> +		struct sk_buff *last_skb;
>  
> -		last_pkt = list_last_entry(&vvs->rx_queue,
> -					   struct virtio_vsock_pkt, list);
> +		last_skb = skb_peek_tail(&vvs->rx_queue);
> +		last_hdr = vsock_hdr(last_skb);
>  
>  		/* If there is space in the last packet queued, we copy the
>  		 * new packet in its buffer. We avoid this if the last packet
> @@ -1055,35 +1038,35 @@ virtio_transport_recv_enqueue(struct vsock_sock *vsk,
>  		 * delimiter of SEQPACKET message, so 'pkt' is the first packet
>  		 * of a new message.
>  		 */
> -		if ((pkt->len <= last_pkt->buf_len - last_pkt->len) &&
> -		    !(le32_to_cpu(last_pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOM)) {
> -			memcpy(last_pkt->buf + last_pkt->len, pkt->buf,
> -			       pkt->len);
> -			last_pkt->len += pkt->len;
> +		if (skb->len < skb_tailroom(last_skb) &&
> +		    !(le32_to_cpu(last_hdr->flags) & VIRTIO_VSOCK_SEQ_EOM) &&
> +		    (vsock_hdr(skb)->type != VIRTIO_VSOCK_TYPE_DGRAM)) {
> +			memcpy(skb_put(last_skb, skb->len), skb->data, skb->len);
>  			free_pkt = true;
> -			last_pkt->hdr.flags |= pkt->hdr.flags;
> +			last_hdr->flags |= hdr->flags;
>  			goto out;
>  		}
>  	}
>  
> -	list_add_tail(&pkt->list, &vvs->rx_queue);
> +	__skb_queue_tail(&vvs->rx_queue, skb);
>  
>  out:
>  	spin_unlock_bh(&vvs->rx_lock);
>  	if (free_pkt)
> -		virtio_transport_free_pkt(pkt);
> +		kfree_skb(skb);
>  }
>  
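
The small-packet path in virtio_transport_recv_enqueue() merges a new
packet into the tail of the last queued buffer when it is short
enough, fits in the remaining room, and the last buffer does not close
a SEQPACKET message. A minimal userspace sketch of that decision
follows. It tests the EOM flag, as the code comment above the check
describes, and it admits an exact fit, as the pre-patch comparison
did. The constants are illustrative values and all model_* names are
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_GOOD_COPY_LEN 128u	/* stands in for GOOD_COPY_LEN */
#define MODEL_SEQ_EOM       0x1u	/* stands in for VIRTIO_VSOCK_SEQ_EOM */

/* Hypothetical stand-in for the last skb on the rx queue. */
struct model_buf {
	unsigned char data[256];
	size_t len;		/* bytes in use */
	uint32_t flags;		/* header flags of the queued packet */
};

static size_t model_tailroom(const struct model_buf *b)
{
	return sizeof(b->data) - b->len;
}

/* Append a small packet to the last queued buffer when it fits and the
 * last buffer does not end a message. Returns true if it was merged,
 * in which case the caller frees the new packet instead of queueing it.
 */
static bool model_try_coalesce(struct model_buf *last, const void *data,
			       size_t len, uint32_t flags)
{
	if (len > MODEL_GOOD_COPY_LEN)
		return false;
	if (len > model_tailroom(last))
		return false;
	if (last->flags & MODEL_SEQ_EOM)
		return false;

	memcpy(last->data + last->len, data, len);
	last->len += len;
	last->flags |= flags;	/* merged buffer inherits the new flags */
	return true;
}
```

Once a merged packet carries EOM, later packets are no longer
appended, which keeps message boundaries intact.
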
>  static int
>  virtio_transport_recv_connected(struct sock *sk,
> -				struct virtio_vsock_pkt *pkt)
> +				struct sk_buff *skb)
>  {
>  	struct vsock_sock *vsk = vsock_sk(sk);
> +	struct virtio_vsock_hdr *hdr = vsock_hdr(skb);
>  	int err = 0;
>  
> -	switch (le16_to_cpu(pkt->hdr.op)) {
> +	switch (le16_to_cpu(hdr->op)) {
>  	case VIRTIO_VSOCK_OP_RW:
> -		virtio_transport_recv_enqueue(vsk, pkt);
> +		virtio_transport_recv_enqueue(vsk, skb);
>  		sk->sk_data_ready(sk);
>  		return err;
>  	case VIRTIO_VSOCK_OP_CREDIT_REQUEST:
> @@ -1093,18 +1076,17 @@ virtio_transport_recv_connected(struct sock *sk,
>  		sk->sk_write_space(sk);
>  		break;
>  	case VIRTIO_VSOCK_OP_SHUTDOWN:
> -		if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SHUTDOWN_RCV)
> +		if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SHUTDOWN_RCV)
>  			vsk->peer_shutdown |= RCV_SHUTDOWN;
> -		if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SHUTDOWN_SEND)
> +		if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SHUTDOWN_SEND)
>  			vsk->peer_shutdown |= SEND_SHUTDOWN;
>  		if (vsk->peer_shutdown == SHUTDOWN_MASK &&
>  		    vsock_stream_has_data(vsk) <= 0 &&
>  		    !sock_flag(sk, SOCK_DONE)) {
>  			(void)virtio_transport_reset(vsk, NULL);
> -
>  			virtio_transport_do_close(vsk, true);
>  		}
> -		if (le32_to_cpu(pkt->hdr.flags))
> +		if (le32_to_cpu(hdr->flags))
>  			sk->sk_state_change(sk);
>  		break;
>  	case VIRTIO_VSOCK_OP_RST:
> @@ -1115,28 +1097,30 @@ virtio_transport_recv_connected(struct sock *sk,
>  		break;
>  	}
>  
> -	virtio_transport_free_pkt(pkt);
> +	kfree_skb(skb);
>  	return err;
>  }
>  
>  static void
>  virtio_transport_recv_disconnecting(struct sock *sk,
> -				    struct virtio_vsock_pkt *pkt)
> +				    struct sk_buff *skb)
>  {
>  	struct vsock_sock *vsk = vsock_sk(sk);
> +	struct virtio_vsock_hdr *hdr = vsock_hdr(skb);
>  
> -	if (le16_to_cpu(pkt->hdr.op) == VIRTIO_VSOCK_OP_RST)
> +	if (le16_to_cpu(hdr->op) == VIRTIO_VSOCK_OP_RST)
>  		virtio_transport_do_close(vsk, true);
>  }
>  
>  static int
>  virtio_transport_send_response(struct vsock_sock *vsk,
> -			       struct virtio_vsock_pkt *pkt)
> +			       struct sk_buff *skb)
>  {
> +	struct virtio_vsock_hdr *hdr = vsock_hdr(skb);
>  	struct virtio_vsock_pkt_info info = {
>  		.op = VIRTIO_VSOCK_OP_RESPONSE,
> -		.remote_cid = le64_to_cpu(pkt->hdr.src_cid),
> -		.remote_port = le32_to_cpu(pkt->hdr.src_port),
> +		.remote_cid = le64_to_cpu(hdr->src_cid),
> +		.remote_port = le32_to_cpu(hdr->src_port),
>  		.reply = true,
>  		.vsk = vsk,
>  	};
> @@ -1145,10 +1129,11 @@ virtio_transport_send_response(struct vsock_sock *vsk,
>  }
>  
>  static bool virtio_transport_space_update(struct sock *sk,
> -					  struct virtio_vsock_pkt *pkt)
> +					  struct sk_buff *skb)
>  {
>  	struct vsock_sock *vsk = vsock_sk(sk);
>  	struct virtio_vsock_sock *vvs = vsk->trans;
> +	struct virtio_vsock_hdr *hdr = vsock_hdr(skb);
>  	bool space_available;
>  
>  	/* Listener sockets are not associated with any transport, so we are
> @@ -1161,8 +1146,8 @@ static bool virtio_transport_space_update(struct sock *sk,
>  
>  	/* buf_alloc and fwd_cnt is always included in the hdr */
>  	spin_lock_bh(&vvs->tx_lock);
> -	vvs->peer_buf_alloc = le32_to_cpu(pkt->hdr.buf_alloc);
> -	vvs->peer_fwd_cnt = le32_to_cpu(pkt->hdr.fwd_cnt);
> +	vvs->peer_buf_alloc = le32_to_cpu(hdr->buf_alloc);
> +	vvs->peer_fwd_cnt = le32_to_cpu(hdr->fwd_cnt);
>  	space_available = virtio_transport_has_space(vsk);
>  	spin_unlock_bh(&vvs->tx_lock);
>  	return space_available;
> @@ -1170,27 +1155,28 @@ static bool virtio_transport_space_update(struct sock *sk,
>  
>  /* Handle server socket */
>  static int
> -virtio_transport_recv_listen(struct sock *sk, struct virtio_vsock_pkt *pkt,
> +virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
>  			     struct virtio_transport *t)
>  {
>  	struct vsock_sock *vsk = vsock_sk(sk);
> +	struct virtio_vsock_hdr *hdr = vsock_hdr(skb);
>  	struct vsock_sock *vchild;
>  	struct sock *child;
>  	int ret;
>  
> -	if (le16_to_cpu(pkt->hdr.op) != VIRTIO_VSOCK_OP_REQUEST) {
> -		virtio_transport_reset_no_sock(t, pkt);
> +	if (le16_to_cpu(hdr->op) != VIRTIO_VSOCK_OP_REQUEST) {
> +		virtio_transport_reset_no_sock(t, skb);
>  		return -EINVAL;
>  	}
>  
>  	if (sk_acceptq_is_full(sk)) {
> -		virtio_transport_reset_no_sock(t, pkt);
> +		virtio_transport_reset_no_sock(t, skb);
>  		return -ENOMEM;
>  	}
>  
>  	child = vsock_create_connected(sk);
>  	if (!child) {
> -		virtio_transport_reset_no_sock(t, pkt);
> +		virtio_transport_reset_no_sock(t, skb);
>  		return -ENOMEM;
>  	}
>  
> @@ -1201,10 +1187,10 @@ virtio_transport_recv_listen(struct sock *sk, struct virtio_vsock_pkt *pkt,
>  	child->sk_state = TCP_ESTABLISHED;
>  
>  	vchild = vsock_sk(child);
> -	vsock_addr_init(&vchild->local_addr, le64_to_cpu(pkt->hdr.dst_cid),
> -			le32_to_cpu(pkt->hdr.dst_port));
> -	vsock_addr_init(&vchild->remote_addr, le64_to_cpu(pkt->hdr.src_cid),
> -			le32_to_cpu(pkt->hdr.src_port));
> +	vsock_addr_init(&vchild->local_addr, le64_to_cpu(hdr->dst_cid),
> +			le32_to_cpu(hdr->dst_port));
> +	vsock_addr_init(&vchild->remote_addr, le64_to_cpu(hdr->src_cid),
> +			le32_to_cpu(hdr->src_port));
>  
>  	ret = vsock_assign_transport(vchild, vsk);
>  	/* Transport assigned (looking at remote_addr) must be the same
> @@ -1212,17 +1198,17 @@ virtio_transport_recv_listen(struct sock *sk, struct virtio_vsock_pkt *pkt,
>  	 */
>  	if (ret || vchild->transport != &t->transport) {
>  		release_sock(child);
> -		virtio_transport_reset_no_sock(t, pkt);
> +		virtio_transport_reset_no_sock(t, skb);
>  		sock_put(child);
>  		return ret;
>  	}
>  
> -	if (virtio_transport_space_update(child, pkt))
> +	if (virtio_transport_space_update(child, skb))
>  		child->sk_write_space(child);
>  
>  	vsock_insert_connected(vchild);
>  	vsock_enqueue_accept(sk, child);
> -	virtio_transport_send_response(vchild, pkt);
> +	virtio_transport_send_response(vchild, skb);
>  
>  	release_sock(child);
>  
> @@ -1240,29 +1226,30 @@ static bool virtio_transport_valid_type(u16 type)
>   * lock.
>   */
>  void virtio_transport_recv_pkt(struct virtio_transport *t,
> -			       struct virtio_vsock_pkt *pkt)
> +			       struct sk_buff *skb)
>  {
>  	struct sockaddr_vm src, dst;
>  	struct vsock_sock *vsk;
>  	struct sock *sk;
>  	bool space_available;
> +	struct virtio_vsock_hdr *hdr = vsock_hdr(skb);
>  
> -	vsock_addr_init(&src, le64_to_cpu(pkt->hdr.src_cid),
> -			le32_to_cpu(pkt->hdr.src_port));
> -	vsock_addr_init(&dst, le64_to_cpu(pkt->hdr.dst_cid),
> -			le32_to_cpu(pkt->hdr.dst_port));
> +	vsock_addr_init(&src, le64_to_cpu(hdr->src_cid),
> +			le32_to_cpu(hdr->src_port));
> +	vsock_addr_init(&dst, le64_to_cpu(hdr->dst_cid),
> +			le32_to_cpu(hdr->dst_port));
>  
>  	trace_virtio_transport_recv_pkt(src.svm_cid, src.svm_port,
>  					dst.svm_cid, dst.svm_port,
> -					le32_to_cpu(pkt->hdr.len),
> -					le16_to_cpu(pkt->hdr.type),
> -					le16_to_cpu(pkt->hdr.op),
> -					le32_to_cpu(pkt->hdr.flags),
> -					le32_to_cpu(pkt->hdr.buf_alloc),
> -					le32_to_cpu(pkt->hdr.fwd_cnt));
> -
> -	if (!virtio_transport_valid_type(le16_to_cpu(pkt->hdr.type))) {
> -		(void)virtio_transport_reset_no_sock(t, pkt);
> +					le32_to_cpu(hdr->len),
> +					le16_to_cpu(hdr->type),
> +					le16_to_cpu(hdr->op),
> +					le32_to_cpu(hdr->flags),
> +					le32_to_cpu(hdr->buf_alloc),
> +					le32_to_cpu(hdr->fwd_cnt));
> +
> +	if (!virtio_transport_valid_type(le16_to_cpu(hdr->type))) {
> +		(void)virtio_transport_reset_no_sock(t, skb);
>  		goto free_pkt;
>  	}
>  
> @@ -1273,13 +1260,13 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
>  	if (!sk) {
>  		sk = vsock_find_bound_socket(&dst);
>  		if (!sk) {
> -			(void)virtio_transport_reset_no_sock(t, pkt);
> +			(void)virtio_transport_reset_no_sock(t, skb);
>  			goto free_pkt;
>  		}
>  	}
>  
> -	if (virtio_transport_get_type(sk) != le16_to_cpu(pkt->hdr.type)) {
> -		(void)virtio_transport_reset_no_sock(t, pkt);
> +	if (virtio_transport_get_type(sk) != le16_to_cpu(hdr->type)) {
> +		(void)virtio_transport_reset_no_sock(t, skb);
>  		sock_put(sk);
>  		goto free_pkt;
>  	}
> @@ -1290,13 +1277,13 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
>  
>  	/* Check if sk has been closed before lock_sock */
>  	if (sock_flag(sk, SOCK_DONE)) {
> -		(void)virtio_transport_reset_no_sock(t, pkt);
> +		(void)virtio_transport_reset_no_sock(t, skb);
>  		release_sock(sk);
>  		sock_put(sk);
>  		goto free_pkt;
>  	}
>  
> -	space_available = virtio_transport_space_update(sk, pkt);
> +	space_available = virtio_transport_space_update(sk, skb);
>  
>  	/* Update CID in case it has changed after a transport reset event */
>  	if (vsk->local_addr.svm_cid != VMADDR_CID_ANY)
> @@ -1307,23 +1294,23 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
>  
>  	switch (sk->sk_state) {
>  	case TCP_LISTEN:
> -		virtio_transport_recv_listen(sk, pkt, t);
> -		virtio_transport_free_pkt(pkt);
> +		virtio_transport_recv_listen(sk, skb, t);
> +		kfree_skb(skb);
>  		break;
>  	case TCP_SYN_SENT:
> -		virtio_transport_recv_connecting(sk, pkt);
> -		virtio_transport_free_pkt(pkt);
> +		virtio_transport_recv_connecting(sk, skb);
> +		kfree_skb(skb);
>  		break;
>  	case TCP_ESTABLISHED:
> -		virtio_transport_recv_connected(sk, pkt);
> +		virtio_transport_recv_connected(sk, skb);
>  		break;
>  	case TCP_CLOSING:
> -		virtio_transport_recv_disconnecting(sk, pkt);
> -		virtio_transport_free_pkt(pkt);
> +		virtio_transport_recv_disconnecting(sk, skb);
> +		kfree_skb(skb);
>  		break;
>  	default:
> -		(void)virtio_transport_reset_no_sock(t, pkt);
> -		virtio_transport_free_pkt(pkt);
> +		(void)virtio_transport_reset_no_sock(t, skb);
> +		kfree_skb(skb);
>  		break;
>  	}
>  
> @@ -1336,16 +1323,42 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
>  	return;
>  
>  free_pkt:
> -	virtio_transport_free_pkt(pkt);
> +	kfree_skb(skb);
>  }
>  EXPORT_SYMBOL_GPL(virtio_transport_recv_pkt);
>  
> -void virtio_transport_free_pkt(struct virtio_vsock_pkt *pkt)
> +/* Remove skbs found in a queue that have a vsk that matches.
> + *
> + * Each skb is freed.
> + *
> + * Returns the count of skbs that were reply packets.
> + */
> +int virtio_transport_purge_skbs(void *vsk, struct sk_buff_head *queue)
>  {
> -	kfree(pkt->buf);
> -	kfree(pkt);
> +	int cnt = 0;
> +	struct sk_buff *skb, *tmp;
> +	struct sk_buff_head freeme;
> +
> +	skb_queue_head_init(&freeme);
> +
> +	spin_lock_bh(&queue->lock);
> +	skb_queue_walk_safe(queue, skb, tmp) {
> +		if (vsock_sk(skb->sk) != vsk)
> +			continue;
> +
> +		__skb_unlink(skb, queue);
> +		skb_queue_tail(&freeme, skb);
> +
> +		if (vsock_metadata(skb)->flags & VIRTIO_VSOCK_METADATA_FLAGS_REPLY)
> +			cnt++;
> +	}
> +	spin_unlock_bh(&queue->lock);
> +
> +	skb_queue_purge(&freeme);
> +
> +	return cnt;
>  }
> -EXPORT_SYMBOL_GPL(virtio_transport_free_pkt);
> +EXPORT_SYMBOL_GPL(virtio_transport_purge_skbs);
>  
>  MODULE_LICENSE("GPL v2");
>  MODULE_AUTHOR("Asias He");
> diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
> index 169a8cf65b39..906f7cdff65e 100644
> --- a/net/vmw_vsock/vsock_loopback.c
> +++ b/net/vmw_vsock/vsock_loopback.c
> @@ -16,7 +16,7 @@ struct vsock_loopback {
>  	struct workqueue_struct *workqueue;
>  
>  	spinlock_t pkt_list_lock; /* protects pkt_list */
> -	struct list_head pkt_list;
> +	struct sk_buff_head pkt_queue;
>  	struct work_struct pkt_work;
>  };
>  
> @@ -27,13 +27,13 @@ static u32 vsock_loopback_get_local_cid(void)
>  	return VMADDR_CID_LOCAL;
>  }
>  
> -static int vsock_loopback_send_pkt(struct virtio_vsock_pkt *pkt)
> +static int vsock_loopback_send_pkt(struct sk_buff *skb)
>  {
>  	struct vsock_loopback *vsock = &the_vsock_loopback;
> -	int len = pkt->len;
> +	int len = skb->len;
>  
>  	spin_lock_bh(&vsock->pkt_list_lock);
> -	list_add_tail(&pkt->list, &vsock->pkt_list);
> +	skb_queue_tail(&vsock->pkt_queue, skb);
>  	spin_unlock_bh(&vsock->pkt_list_lock);
>  
>  	queue_work(vsock->workqueue, &vsock->pkt_work);
> @@ -44,21 +44,8 @@ static int vsock_loopback_send_pkt(struct virtio_vsock_pkt *pkt)
>  static int vsock_loopback_cancel_pkt(struct vsock_sock *vsk)
>  {
>  	struct vsock_loopback *vsock = &the_vsock_loopback;
> -	struct virtio_vsock_pkt *pkt, *n;
> -	LIST_HEAD(freeme);
>  
> -	spin_lock_bh(&vsock->pkt_list_lock);
> -	list_for_each_entry_safe(pkt, n, &vsock->pkt_list, list) {
> -		if (pkt->vsk != vsk)
> -			continue;
> -		list_move(&pkt->list, &freeme);
> -	}
> -	spin_unlock_bh(&vsock->pkt_list_lock);
> -
> -	list_for_each_entry_safe(pkt, n, &freeme, list) {
> -		list_del(&pkt->list);
> -		virtio_transport_free_pkt(pkt);
> -	}
> +	virtio_transport_purge_skbs(vsk, &vsock->pkt_queue);
>  
>  	return 0;
>  }
> @@ -121,20 +108,20 @@ static void vsock_loopback_work(struct work_struct *work)
>  {
>  	struct vsock_loopback *vsock =
>  		container_of(work, struct vsock_loopback, pkt_work);
> -	LIST_HEAD(pkts);
> +	struct sk_buff_head pkts;
> +
> +	skb_queue_head_init(&pkts);
>  
>  	spin_lock_bh(&vsock->pkt_list_lock);
> -	list_splice_init(&vsock->pkt_list, &pkts);
> +	skb_queue_splice_init(&vsock->pkt_queue, &pkts);
>  	spin_unlock_bh(&vsock->pkt_list_lock);
>  
> -	while (!list_empty(&pkts)) {
> -		struct virtio_vsock_pkt *pkt;
> +	while (!skb_queue_empty(&pkts)) {
> +		struct sk_buff *skb;
>  
> -		pkt = list_first_entry(&pkts, struct virtio_vsock_pkt, list);
> -		list_del_init(&pkt->list);
> -
> -		virtio_transport_deliver_tap_pkt(pkt);
> -		virtio_transport_recv_pkt(&loopback_transport, pkt);
> +		skb = skb_dequeue(&pkts);
> +		virtio_transport_deliver_tap_pkt(skb);
> +		virtio_transport_recv_pkt(&loopback_transport, skb);
>  	}
>  }
>  
> @@ -148,7 +135,7 @@ static int __init vsock_loopback_init(void)
>  		return -ENOMEM;
>  
>  	spin_lock_init(&vsock->pkt_list_lock);
> -	INIT_LIST_HEAD(&vsock->pkt_list);
> +	skb_queue_head_init(&vsock->pkt_queue);
>  	INIT_WORK(&vsock->pkt_work, vsock_loopback_work);
>  
>  	ret = vsock_core_register(&loopback_transport.transport,
> @@ -166,19 +153,13 @@ static int __init vsock_loopback_init(void)
>  static void __exit vsock_loopback_exit(void)
>  {
>  	struct vsock_loopback *vsock = &the_vsock_loopback;
> -	struct virtio_vsock_pkt *pkt;
>  
>  	vsock_core_unregister(&loopback_transport.transport);
>  
>  	flush_work(&vsock->pkt_work);
>  
>  	spin_lock_bh(&vsock->pkt_list_lock);
> -	while (!list_empty(&vsock->pkt_list)) {
> -		pkt = list_first_entry(&vsock->pkt_list,
> -				       struct virtio_vsock_pkt, list);
> -		list_del(&pkt->list);
> -		virtio_transport_free_pkt(pkt);
> -	}
> +	skb_queue_purge(&vsock->pkt_queue);
>  	spin_unlock_bh(&vsock->pkt_list_lock);
>  
>  	destroy_workqueue(vsock->workqueue);
> -- 
> 2.35.1
> 
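
For readers following the thread, here is a rough userspace sketch of the two-phase pattern that virtio_transport_purge_skbs() uses above: unlink matching entries onto a private list while the queue lock is held, then free them after the lock is dropped. It is not part of the patch. The struct pkt, struct pkt_queue, PKT_FLAG_REPLY, pkt_new(), queue_tail() and purge_owner() names are all made up for illustration, and the locking is only marked in comments so the example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for VIRTIO_VSOCK_METADATA_FLAGS_REPLY. */
#define PKT_FLAG_REPLY 0x1u

struct pkt {
	struct pkt *next;
	const void *owner;	/* stand-in for the socket that queued the packet */
	unsigned int flags;
};

struct pkt_queue {
	struct pkt *head;
	struct pkt *tail;
};

static struct pkt *pkt_new(const void *owner, unsigned int flags)
{
	struct pkt *p = malloc(sizeof(*p));

	if (!p)
		abort();
	p->next = NULL;
	p->owner = owner;
	p->flags = flags;
	return p;
}

static void queue_tail(struct pkt_queue *q, struct pkt *p)
{
	assert(q && p);
	p->next = NULL;
	if (q->tail)
		q->tail->next = p;
	else
		q->head = p;
	q->tail = p;
}

/* Unlink and free every packet owned by @owner.
 * Returns how many of the removed packets carried the reply flag.
 */
static int purge_owner(struct pkt_queue *q, const void *owner)
{
	struct pkt_queue freeme = { NULL, NULL };
	struct pkt **link = &q->head;
	struct pkt *p, *last = NULL;
	int replies = 0;

	/* Phase 1: in the kernel, this walk runs with the queue lock held. */
	while ((p = *link) != NULL) {
		if (p->owner != owner) {
			last = p;
			link = &p->next;
			continue;
		}
		*link = p->next;		/* unlink from the shared queue */
		if (p->flags & PKT_FLAG_REPLY)
			replies++;
		queue_tail(&freeme, p);		/* park on the private list */
	}
	q->tail = last;

	/* Phase 2: the lock would be dropped here; only the private list is touched. */
	while ((p = freeme.head) != NULL) {
		freeme.head = p->next;
		free(p);
	}

	return replies;
}
```

Keeping the free outside the locked region keeps the critical section short, which seems to be the point of the freeme queue in the patch as well.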

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


* [virtio-dev] Re: [PATCH 2/6] vsock: return errors other than -ENOMEM to socket
       [not found] ` <d81818b868216c774613dd03641fcfe63cc55a45.1660362668.git.bobby.eshleman@bytedance.com>
@ 2022-08-16  2:30   ` Bobby Eshleman
  2022-08-17  5:28     ` Arseniy Krasnov
  0 siblings, 1 reply; 16+ messages in thread
From: Bobby Eshleman @ 2022-08-16  2:30 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: virtio-dev, Bobby Eshleman, Cong Wang, Jiang Wang,
	Stefan Hajnoczi, Stefano Garzarella, Michael S. Tsirkin,
	Jason Wang, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Wei Liu, Dexuan Cui, kvm, virtualization, netdev, linux-kernel,
	linux-hyperv

CC'ing virtio-dev@lists.oasis-open.org

On Mon, Aug 15, 2022 at 10:56:05AM -0700, Bobby Eshleman wrote:
> This commit allows vsock implementations to return errors other than
> -ENOMEM to the socket layer. One immediate effect is that -EAGAIN is
> returned once the sk_sndbuf threshold is reached, so userspace can
> throttle appropriately.
> 
> As a result, a known issue with uperf is resolved[1].
> 
> To preserve legacy behavior for the non-virtio implementations, the
> hyperv and vmci transports still force errors to -ENOMEM.
> 
> [1]: https://gitlab.com/vsock/vsock/-/issues/1
> 
> Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
> ---
>  include/linux/virtio_vsock.h            | 3 +++
>  net/vmw_vsock/af_vsock.c                | 3 ++-
>  net/vmw_vsock/hyperv_transport.c        | 2 +-
>  net/vmw_vsock/virtio_transport_common.c | 3 ---
>  net/vmw_vsock/vmci_transport.c          | 9 ++++++++-
>  5 files changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index 17ed01466875..9a37eddbb87a 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -8,6 +8,9 @@
>  #include <net/sock.h>
>  #include <net/af_vsock.h>
>  
> +/* Threshold for detecting small packets to copy */
> +#define GOOD_COPY_LEN  128
> +
>  enum virtio_vsock_metadata_flags {
>  	VIRTIO_VSOCK_METADATA_FLAGS_REPLY		= BIT(0),
>  	VIRTIO_VSOCK_METADATA_FLAGS_TAP_DELIVERED	= BIT(1),
> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> index e348b2d09eac..1893f8aafa48 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -1844,8 +1844,9 @@ static int vsock_connectible_sendmsg(struct socket *sock, struct msghdr *msg,
>  			written = transport->stream_enqueue(vsk,
>  					msg, len - total_written);
>  		}
> +
>  		if (written < 0) {
> -			err = -ENOMEM;
> +			err = written;
>  			goto out_err;
>  		}
>  
> diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
> index fd98229e3db3..e99aea571f6f 100644
> --- a/net/vmw_vsock/hyperv_transport.c
> +++ b/net/vmw_vsock/hyperv_transport.c
> @@ -687,7 +687,7 @@ static ssize_t hvs_stream_enqueue(struct vsock_sock *vsk, struct msghdr *msg,
>  	if (bytes_written)
>  		ret = bytes_written;
>  	kfree(send_buf);
> -	return ret;
> +	return ret < 0 ? -ENOMEM : ret;
>  }
>  
>  static s64 hvs_stream_has_data(struct vsock_sock *vsk)
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 920578597bb9..d5780599fe93 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -23,9 +23,6 @@
>  /* How long to wait for graceful shutdown of a connection */
>  #define VSOCK_CLOSE_TIMEOUT (8 * HZ)
>  
> -/* Threshold for detecting small packets to copy */
> -#define GOOD_COPY_LEN  128
> -
>  static const struct virtio_transport *
>  virtio_transport_get_ops(struct vsock_sock *vsk)
>  {
> diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
> index b14f0ed7427b..c927a90dc859 100644
> --- a/net/vmw_vsock/vmci_transport.c
> +++ b/net/vmw_vsock/vmci_transport.c
> @@ -1838,7 +1838,14 @@ static ssize_t vmci_transport_stream_enqueue(
>  	struct msghdr *msg,
>  	size_t len)
>  {
> -	return vmci_qpair_enquev(vmci_trans(vsk)->qpair, msg, len, 0);
> +	int err;
> +
> +	err = vmci_qpair_enquev(vmci_trans(vsk)->qpair, msg, len, 0);
> +
> +	if (err < 0)
> +		err = -ENOMEM;
> +
> +	return err;
>  }
>  
>  static s64 vmci_transport_stream_has_data(struct vsock_sock *vsk)
> -- 
> 2.35.1
> 
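
To make the userspace side of this concrete, here is a minimal sketch of how a sender might throttle once -EAGAIN is propagated for MSG_DONTWAIT sends. It is not taken from the patch or from uperf. The helper name send_all_nonblock() and its timeout parameter are assumptions for illustration; only send(2), poll(2) and the errno values are real interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Send the whole buffer using non-blocking send() calls.
 *
 * When the send buffer is full (EAGAIN/EWOULDBLOCK), wait up to
 * @timeout_ms for the socket to become writable and retry.
 *
 * Returns 0 on success, otherwise a negative errno.
 */
static int send_all_nonblock(int fd, const void *buf, size_t len, int timeout_ms)
{
	const char *p = buf;

	assert(buf != NULL || len == 0);

	while (len > 0) {
		ssize_t n = send(fd, p, len, MSG_DONTWAIT);

		if (n >= 0) {
			/* Partial writes are normal; advance and keep going. */
			p += n;
			len -= (size_t)n;
			continue;
		}

		if (errno == EINTR)
			continue;

		if (errno == EAGAIN || errno == EWOULDBLOCK) {
			struct pollfd pfd = { .fd = fd, .events = POLLOUT };
			int ret = poll(&pfd, 1, timeout_ms);

			if (ret == 0)
				return -ETIMEDOUT;
			if (ret < 0 && errno != EINTR)
				return -errno;
			continue;
		}

		/* Any other error (including ENOMEM) is treated as fatal here. */
		return -errno;
	}

	return 0;
}
```

A caller that only ever sees -ENOMEM cannot tell "buffer full, try again later" apart from a real allocation failure, which is why a loop like this depends on the distinction the patch introduces.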


* [virtio-dev] Re: [PATCH 3/6] vsock: add netdev to vhost/virtio vsock
       [not found] ` <5a93c5aad99d79f028d349cb7e3c128c65d5d7e2.1660362668.git.bobby.eshleman@bytedance.com>
@ 2022-08-16  2:31   ` Bobby Eshleman
  0 siblings, 0 replies; 16+ messages in thread
From: Bobby Eshleman @ 2022-08-16  2:31 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: virtio-dev, Bobby Eshleman, Cong Wang, Jiang Wang,
	Stefan Hajnoczi, Stefano Garzarella, Michael S. Tsirkin,
	Jason Wang, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, kvm, virtualization, netdev, linux-kernel

CC'ing virtio-dev@lists.oasis-open.org

On Mon, Aug 15, 2022 at 10:56:06AM -0700, Bobby Eshleman wrote:
> In order to support usage of qdisc on vsock traffic, this commit
> introduces a struct net_device to vhost and virtio vsock.
> 
> Two new devices are created, vhost-vsock for vhost and virtio-vsock
> for virtio. The devices are attached to the respective transports.
> 
> To bypass the usage of the device, the user may "down" the associated
> network interface using common tools. For example, "ip link set dev
> virtio-vsock down" lets vsock bypass the net_device and qdisc entirely,
> simply using the FIFO logic of the prior implementation.
> 
> For both hosts and guests, there is one device for all G2H vsock sockets
> and one device for all H2G vsock sockets. This makes sense for guests
> because the driver only supports a single vsock channel (one pair of
> TX/RX virtqueues), so a single device and qdisc is a natural fit. For
> hosts, this may not seem ideal for some workloads. However, it is
> possible to use a multi-queue qdisc, where a given queue is responsible
> for a range of sockets. This seems to be a better solution than having
> one device per socket, which may yield a very large number of devices
> and qdiscs, all of which would be created and destroyed dynamically.
> That dynamism would also require a complex policy management daemon to
> keep up as sockets come and go. To avoid this, a single device and
> qdisc also serves all H2G sockets.
> 
> Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
> ---
>  drivers/vhost/vsock.c                   |  19 +++-
>  include/linux/virtio_vsock.h            |  10 +++
>  net/vmw_vsock/virtio_transport.c        |  19 +++-
>  net/vmw_vsock/virtio_transport_common.c | 112 +++++++++++++++++++++++-
>  4 files changed, 152 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index f8601d93d94d..b20ddec2664b 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -927,13 +927,30 @@ static int __init vhost_vsock_init(void)
>  				  VSOCK_TRANSPORT_F_H2G);
>  	if (ret < 0)
>  		return ret;
> -	return misc_register(&vhost_vsock_misc);
> +
> +	ret = virtio_transport_init(&vhost_transport, "vhost-vsock");
> +	if (ret < 0)
> +		goto out_unregister;
> +
> +	ret = misc_register(&vhost_vsock_misc);
> +	if (ret < 0)
> +		goto out_transport_exit;
> +	return ret;
> +
> +out_transport_exit:
> +	virtio_transport_exit(&vhost_transport);
> +
> +out_unregister:
> +	vsock_core_unregister(&vhost_transport.transport);
> +	return ret;
> +
>  };
>  
>  static void __exit vhost_vsock_exit(void)
>  {
>  	misc_deregister(&vhost_vsock_misc);
>  	vsock_core_unregister(&vhost_transport.transport);
> +	virtio_transport_exit(&vhost_transport);
>  };
>  
>  module_init(vhost_vsock_init);
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index 9a37eddbb87a..5d7e7fbd75f8 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -91,10 +91,20 @@ struct virtio_transport {
>  	/* This must be the first field */
>  	struct vsock_transport transport;
>  
> +	/* Used almost exclusively for qdisc */
> +	struct net_device *dev;
> +
>  	/* Takes ownership of the packet */
>  	int (*send_pkt)(struct sk_buff *skb);
>  };
>  
> +int
> +virtio_transport_init(struct virtio_transport *t,
> +		      const char *name);
> +
> +void
> +virtio_transport_exit(struct virtio_transport *t);
> +
>  ssize_t
>  virtio_transport_stream_dequeue(struct vsock_sock *vsk,
>  				struct msghdr *msg,
> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> index 3bb293fd8607..c6212eb38d3c 100644
> --- a/net/vmw_vsock/virtio_transport.c
> +++ b/net/vmw_vsock/virtio_transport.c
> @@ -131,7 +131,9 @@ virtio_transport_send_pkt_work(struct work_struct *work)
>  		 * the vq
>  		 */
>  		if (ret < 0) {
> -			skb_queue_head(&vsock->send_pkt_queue, skb);
> +			spin_lock_bh(&vsock->send_pkt_queue.lock);
> +			__skb_queue_head(&vsock->send_pkt_queue, skb);
> +			spin_unlock_bh(&vsock->send_pkt_queue.lock);
>  			break;
>  		}
>  
> @@ -676,7 +678,9 @@ static void virtio_vsock_vqs_del(struct virtio_vsock *vsock)
>  		kfree_skb(skb);
>  	mutex_unlock(&vsock->tx_lock);
>  
> -	skb_queue_purge(&vsock->send_pkt_queue);
> +	spin_lock_bh(&vsock->send_pkt_queue.lock);
> +	__skb_queue_purge(&vsock->send_pkt_queue);
> +	spin_unlock_bh(&vsock->send_pkt_queue.lock);
>  
>  	/* Delete virtqueues and flush outstanding callbacks if any */
>  	vdev->config->del_vqs(vdev);
> @@ -760,6 +764,8 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
>  	flush_work(&vsock->event_work);
>  	flush_work(&vsock->send_pkt_work);
>  
> +	virtio_transport_exit(&virtio_transport);
> +
>  	mutex_unlock(&the_virtio_vsock_mutex);
>  
>  	kfree(vsock);
> @@ -844,12 +850,18 @@ static int __init virtio_vsock_init(void)
>  	if (ret)
>  		goto out_wq;
>  
> -	ret = register_virtio_driver(&virtio_vsock_driver);
> +	ret = virtio_transport_init(&virtio_transport, "virtio-vsock");
>  	if (ret)
>  		goto out_vci;
>  
> +	ret = register_virtio_driver(&virtio_vsock_driver);
> +	if (ret)
> +		goto out_transport;
> +
>  	return 0;
>  
> +out_transport:
> +	virtio_transport_exit(&virtio_transport);
>  out_vci:
>  	vsock_core_unregister(&virtio_transport.transport);
>  out_wq:
> @@ -861,6 +873,7 @@ static void __exit virtio_vsock_exit(void)
>  {
>  	unregister_virtio_driver(&virtio_vsock_driver);
>  	vsock_core_unregister(&virtio_transport.transport);
> +	virtio_transport_exit(&virtio_transport);
>  	destroy_workqueue(virtio_vsock_workqueue);
>  }
>  
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index d5780599fe93..bdf16fff054f 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -16,6 +16,7 @@
>  
>  #include <net/sock.h>
>  #include <net/af_vsock.h>
> +#include <net/pkt_sched.h>
>  
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/vsock_virtio_transport_common.h>
> @@ -23,6 +24,93 @@
>  /* How long to wait for graceful shutdown of a connection */
>  #define VSOCK_CLOSE_TIMEOUT (8 * HZ)
>  
> +struct virtio_transport_priv {
> +	struct virtio_transport *trans;
> +};
> +
> +static netdev_tx_t virtio_transport_start_xmit(struct sk_buff *skb, struct net_device *dev)
> +{
> +	struct virtio_transport *t =
> +		((struct virtio_transport_priv *)netdev_priv(dev))->trans;
> +	int ret;
> +
> +	ret = t->send_pkt(skb);
> +	if (unlikely(ret == -ENODEV))
> +		return NETDEV_TX_BUSY;
> +
> +	return NETDEV_TX_OK;
> +}
> +
> +const struct net_device_ops virtio_transport_netdev_ops = {
> +	.ndo_start_xmit = virtio_transport_start_xmit,
> +};
> +
> +static void virtio_transport_setup(struct net_device *dev)
> +{
> +	dev->netdev_ops = &virtio_transport_netdev_ops;
> +	dev->needs_free_netdev = true;
> +	dev->flags = IFF_NOARP;
> +	dev->mtu = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
> +	dev->tx_queue_len = DEFAULT_TX_QUEUE_LEN;
> +}
> +
> +static int ifup(struct net_device *dev)
> +{
> +	int ret;
> +
> +	rtnl_lock();
> +	ret = dev_open(dev, NULL) ? -ENOMEM : 0;
> +	rtnl_unlock();
> +
> +	return ret;
> +}
> +
> +/* virtio_transport_init - initialize a virtio vsock transport layer
> + *
> + * @t: ptr to the virtio transport struct to initialize
> + * @name: the name of the net_device to be created.
> + *
> + * Return 0 on success, otherwise negative errno.
> + */
> +int virtio_transport_init(struct virtio_transport *t, const char *name)
> +{
> +	struct virtio_transport_priv *priv;
> +	int ret;
> +
> +	t->dev = alloc_netdev(sizeof(*priv), name, NET_NAME_UNKNOWN, virtio_transport_setup);
> +	if (!t->dev)
> +		return -ENOMEM;
> +
> +	priv = netdev_priv(t->dev);
> +	priv->trans = t;
> +
> +	ret = register_netdev(t->dev);
> +	if (ret < 0)
> +		goto out_free_netdev;
> +
> +	ret = ifup(t->dev);
> +	if (ret < 0)
> +		goto out_unregister_netdev;
> +
> +	return 0;
> +
> +out_unregister_netdev:
> +	unregister_netdev(t->dev);
> +
> +out_free_netdev:
> +	free_netdev(t->dev);
> +
> +	return ret;
> +}
> +
> +void virtio_transport_exit(struct virtio_transport *t)
> +{
> +	if (t->dev) {
> +		unregister_netdev(t->dev);
> +		free_netdev(t->dev);
> +	}
> +}
> +
>  static const struct virtio_transport *
>  virtio_transport_get_ops(struct vsock_sock *vsk)
>  {
> @@ -147,6 +235,24 @@ static u16 virtio_transport_get_type(struct sock *sk)
>  		return VIRTIO_VSOCK_TYPE_SEQPACKET;
>  }
>  
> +/* Return skb->len on success, otherwise negative errno */
> +static int virtio_transport_send_pkt(const struct virtio_transport *t, struct sk_buff *skb)
> +{
> +	int ret;
> +	int len = skb->len;
> +
> +	if (unlikely(!t->dev || !(t->dev->flags & IFF_UP)))
> +		return t->send_pkt(skb);
> +
> +	skb->dev = t->dev;
> +	ret = dev_queue_xmit(skb);
> +
> +	if (likely(ret == NET_XMIT_SUCCESS || ret == NET_XMIT_CN))
> +		return len;
> +
> +	return -ENOMEM;
> +}
> +
>  /* This function can only be used on connecting/connected sockets,
>   * since a socket assigned to a transport is required.
>   *
> @@ -202,9 +308,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>  
>  	virtio_transport_inc_tx_pkt(vvs, skb);
>  
> -	err = t_ops->send_pkt(skb);
> -
> -	return err < 0 ? -ENOMEM : err;
> +	return virtio_transport_send_pkt(t_ops, skb);
>  }
>  
>  static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs,
> @@ -834,7 +938,7 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
>  		return -ENOTCONN;
>  	}
>  
> -	return t->send_pkt(reply);
> +	return virtio_transport_send_pkt(t, reply);
>  }
>  
>  /* This function should be called with sk_lock held and SOCK_DONE set */
> -- 
> 2.35.1
> 
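
As a reading aid, the dispatch logic of virtio_transport_send_pkt() above can be modelled in plain C. This is only a sketch under assumptions: struct model_transport, model_send() and the four callback helpers are invented here, and the XMIT_* constants are copies of the kernel's NET_XMIT_SUCCESS, NET_XMIT_DROP and NET_XMIT_CN values rather than the real definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Local copies of the kernel's NET_XMIT_* return codes. */
enum { XMIT_SUCCESS = 0x00, XMIT_DROP = 0x01, XMIT_CN = 0x02 };

/* Hypothetical model of the fields the send path looks at. */
struct model_transport {
	bool has_dev;			/* a net_device was allocated */
	bool dev_up;			/* the interface has IFF_UP set */
	int (*send_pkt)(size_t len);	/* direct FIFO path */
	int (*queue_xmit)(size_t len);	/* qdisc path, returns XMIT_* */
};

/* Stand-ins for the two possible lower layers. */
static int fifo_send(size_t len)	{ return (int)len; }
static int qdisc_ok(size_t len)		{ (void)len; return XMIT_SUCCESS; }
static int qdisc_congested(size_t len)	{ (void)len; return XMIT_CN; }
static int qdisc_drop(size_t len)	{ (void)len; return XMIT_DROP; }

/* Return the packet length on success, otherwise negative errno. */
static int model_send(const struct model_transport *t, size_t len)
{
	int ret;

	assert(t && t->send_pkt);

	/* Interface missing or "down": bypass the qdisc entirely. */
	if (!t->has_dev || !t->dev_up)
		return t->send_pkt(len);

	ret = t->queue_xmit(len);

	/* XMIT_CN still means the packet was queued, only with congestion. */
	if (ret == XMIT_SUCCESS || ret == XMIT_CN)
		return (int)len;

	return -ENOMEM;
}
```

In this model, taking the interface down (as with "ip link set dev virtio-vsock down") flips dev_up and sends every packet through the FIFO callback, so a drop in the qdisc can no longer surface as -ENOMEM.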


* [virtio-dev] Re: [PATCH 4/6] virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
       [not found] ` <3d1f32c4da81f8a0870e126369ba12bc8c4ad048.1660362668.git.bobby.eshleman@bytedance.com>
@ 2022-08-16  2:31   ` Bobby Eshleman
  0 siblings, 0 replies; 16+ messages in thread
From: Bobby Eshleman @ 2022-08-16  2:31 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: virtio-dev, Bobby Eshleman, Cong Wang, Jiang Wang,
	Stefan Hajnoczi, Stefano Garzarella, Michael S. Tsirkin,
	Jason Wang, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, kvm, virtualization, netdev, linux-kernel

CC'ing virtio-dev@lists.oasis-open.org

On Mon, Aug 15, 2022 at 10:56:07AM -0700, Bobby Eshleman wrote:
> This commit adds a feature bit for virtio vsock to support datagrams.
> 
> Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
> Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
> ---
>  drivers/vhost/vsock.c             | 3 ++-
>  include/uapi/linux/virtio_vsock.h | 1 +
>  net/vmw_vsock/virtio_transport.c  | 8 ++++++--
>  3 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index b20ddec2664b..a5d1bdb786fe 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -32,7 +32,8 @@
>  enum {
>  	VHOST_VSOCK_FEATURES = VHOST_FEATURES |
>  			       (1ULL << VIRTIO_F_ACCESS_PLATFORM) |
> -			       (1ULL << VIRTIO_VSOCK_F_SEQPACKET)
> +			       (1ULL << VIRTIO_VSOCK_F_SEQPACKET) |
> +			       (1ULL << VIRTIO_VSOCK_F_DGRAM)
>  };
>  
>  enum {
> diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
> index 64738838bee5..857df3a3a70d 100644
> --- a/include/uapi/linux/virtio_vsock.h
> +++ b/include/uapi/linux/virtio_vsock.h
> @@ -40,6 +40,7 @@
>  
>  /* The feature bitmap for virtio vsock */
>  #define VIRTIO_VSOCK_F_SEQPACKET	1	/* SOCK_SEQPACKET supported */
> +#define VIRTIO_VSOCK_F_DGRAM		2	/* Host supports dgram vsock */
>  
>  struct virtio_vsock_config {
>  	__le64 guest_cid;
> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> index c6212eb38d3c..073314312683 100644
> --- a/net/vmw_vsock/virtio_transport.c
> +++ b/net/vmw_vsock/virtio_transport.c
> @@ -35,6 +35,7 @@ static struct virtio_transport virtio_transport; /* forward declaration */
>  struct virtio_vsock {
>  	struct virtio_device *vdev;
>  	struct virtqueue *vqs[VSOCK_VQ_MAX];
> +	bool has_dgram;
>  
>  	/* Virtqueue processing is deferred to a workqueue */
>  	struct work_struct tx_work;
> @@ -709,7 +710,6 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
>  	}
>  
>  	vsock->vdev = vdev;
> -
>  	vsock->rx_buf_nr = 0;
>  	vsock->rx_buf_max_nr = 0;
>  	atomic_set(&vsock->queued_replies, 0);
> @@ -726,6 +726,9 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
>  	if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_SEQPACKET))
>  		vsock->seqpacket_allow = true;
>  
> +	if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_DGRAM))
> +		vsock->has_dgram = true;
> +
>  	vdev->priv = vsock;
>  
>  	ret = virtio_vsock_vqs_init(vsock);
> @@ -820,7 +823,8 @@ static struct virtio_device_id id_table[] = {
>  };
>  
>  static unsigned int features[] = {
> -	VIRTIO_VSOCK_F_SEQPACKET
> +	VIRTIO_VSOCK_F_SEQPACKET,
> +	VIRTIO_VSOCK_F_DGRAM
>  };
>  
>  static struct virtio_driver virtio_vsock_driver = {
> -- 
> 2.35.1
> 
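
For anyone unfamiliar with how such a bit is consumed, here is a small standalone sketch of feature-bit testing. The two bit numbers are the ones from the header above. has_feature() and negotiate() are hypothetical helpers, and modelling negotiation as a bitwise AND of the offered and supported sets is a simplification of what the virtio core does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers as defined in include/uapi/linux/virtio_vsock.h by this series. */
#define VIRTIO_VSOCK_F_SEQPACKET	1
#define VIRTIO_VSOCK_F_DGRAM		2

/* Test a single feature bit in a 64-bit feature word. */
static bool has_feature(uint64_t features, unsigned int bit)
{
	assert(bit < 64);
	return (features >> bit) & 1;
}

/* Simplified negotiation: only features that the device offers AND the
 * driver understands survive.
 */
static uint64_t negotiate(uint64_t device, uint64_t driver)
{
	return device & driver;
}
```

With this model, a driver that lists VIRTIO_VSOCK_F_DGRAM talking to an older device that only offers SEQPACKET ends up without the DGRAM bit, which corresponds to has_dgram staying false in virtio_vsock_probe().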


* [virtio-dev] Re: [PATCH 5/6] virtio/vsock: add support for dgram
       [not found] ` <3cb082f1c88f3f2ef1fc250dbc0745fb79c745c7.1660362668.git.bobby.eshleman@bytedance.com>
@ 2022-08-16  2:32   ` Bobby Eshleman
  2022-08-17  5:01     ` Arseniy Krasnov
  0 siblings, 1 reply; 16+ messages in thread
From: Bobby Eshleman @ 2022-08-16  2:32 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: virtio-dev, Bobby Eshleman, Cong Wang, Jiang Wang,
	Stefan Hajnoczi, Stefano Garzarella, Michael S. Tsirkin,
	Jason Wang, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, kvm, virtualization, netdev, linux-kernel

CC'ing virtio-dev@lists.oasis-open.org

On Mon, Aug 15, 2022 at 10:56:08AM -0700, Bobby Eshleman wrote:
> This patch adds dgram support to virtio and to the vhost side.
> 
> Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
> Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
> ---
>  drivers/vhost/vsock.c                   |   2 +-
>  include/net/af_vsock.h                  |   2 +
>  include/uapi/linux/virtio_vsock.h       |   1 +
>  net/vmw_vsock/af_vsock.c                |  26 +++-
>  net/vmw_vsock/virtio_transport.c        |   2 +-
>  net/vmw_vsock/virtio_transport_common.c | 173 ++++++++++++++++++++++--
>  6 files changed, 186 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index a5d1bdb786fe..3dc72a5647ca 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -925,7 +925,7 @@ static int __init vhost_vsock_init(void)
>  	int ret;
>  
>  	ret = vsock_core_register(&vhost_transport.transport,
> -				  VSOCK_TRANSPORT_F_H2G);
> +				  VSOCK_TRANSPORT_F_H2G | VSOCK_TRANSPORT_F_DGRAM);
>  	if (ret < 0)
>  		return ret;
>  
> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> index 1c53c4c4d88f..37e55c81e4df 100644
> --- a/include/net/af_vsock.h
> +++ b/include/net/af_vsock.h
> @@ -78,6 +78,8 @@ struct vsock_sock {
>  s64 vsock_stream_has_data(struct vsock_sock *vsk);
>  s64 vsock_stream_has_space(struct vsock_sock *vsk);
>  struct sock *vsock_create_connected(struct sock *parent);
> +int vsock_bind_stream(struct vsock_sock *vsk,
> +		      struct sockaddr_vm *addr);
>  
>  /**** TRANSPORT ****/
>  
> diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
> index 857df3a3a70d..0975b9c88292 100644
> --- a/include/uapi/linux/virtio_vsock.h
> +++ b/include/uapi/linux/virtio_vsock.h
> @@ -70,6 +70,7 @@ struct virtio_vsock_hdr {
>  enum virtio_vsock_type {
>  	VIRTIO_VSOCK_TYPE_STREAM = 1,
>  	VIRTIO_VSOCK_TYPE_SEQPACKET = 2,
> +	VIRTIO_VSOCK_TYPE_DGRAM = 3,
>  };
>  
>  enum virtio_vsock_op {
> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> index 1893f8aafa48..87e4ae1866d3 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -675,6 +675,19 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
>  	return 0;
>  }
>  
> +int vsock_bind_stream(struct vsock_sock *vsk,
> +		      struct sockaddr_vm *addr)
> +{
> +	int retval;
> +
> +	spin_lock_bh(&vsock_table_lock);
> +	retval = __vsock_bind_connectible(vsk, addr);
> +	spin_unlock_bh(&vsock_table_lock);
> +
> +	return retval;
> +}
> +EXPORT_SYMBOL(vsock_bind_stream);
> +
>  static int __vsock_bind_dgram(struct vsock_sock *vsk,
>  			      struct sockaddr_vm *addr)
>  {
> @@ -2363,11 +2376,16 @@ int vsock_core_register(const struct vsock_transport *t, int features)
>  	}
>  
>  	if (features & VSOCK_TRANSPORT_F_DGRAM) {
> -		if (t_dgram) {
> -			err = -EBUSY;
> -			goto err_busy;
> +		/* TODO: always choose the G2H variant over others, support nesting later */
> +		if (features & VSOCK_TRANSPORT_F_G2H) {
> +			if (t_dgram)
> +				pr_warn("virtio_vsock: t_dgram already set\n");
> +			t_dgram = t;
> +		}
> +
> +		if (!t_dgram) {
> +			t_dgram = t;
>  		}
> -		t_dgram = t;
>  	}
>  
>  	if (features & VSOCK_TRANSPORT_F_LOCAL) {
> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> index 073314312683..d4526ca462d2 100644
> --- a/net/vmw_vsock/virtio_transport.c
> +++ b/net/vmw_vsock/virtio_transport.c
> @@ -850,7 +850,7 @@ static int __init virtio_vsock_init(void)
>  		return -ENOMEM;
>  
>  	ret = vsock_core_register(&virtio_transport.transport,
> -				  VSOCK_TRANSPORT_F_G2H);
> +				  VSOCK_TRANSPORT_F_G2H | VSOCK_TRANSPORT_F_DGRAM);
>  	if (ret)
>  		goto out_wq;
>  
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index bdf16fff054f..aedb48728677 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -229,7 +229,9 @@ EXPORT_SYMBOL_GPL(virtio_transport_deliver_tap_pkt);
>  
>  static u16 virtio_transport_get_type(struct sock *sk)
>  {
> -	if (sk->sk_type == SOCK_STREAM)
> +	if (sk->sk_type == SOCK_DGRAM)
> +		return VIRTIO_VSOCK_TYPE_DGRAM;
> +	else if (sk->sk_type == SOCK_STREAM)
>  		return VIRTIO_VSOCK_TYPE_STREAM;
>  	else
>  		return VIRTIO_VSOCK_TYPE_SEQPACKET;
> @@ -287,22 +289,29 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>  	vvs = vsk->trans;
>  
>  	/* we can send less than pkt_len bytes */
> -	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
> -		pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
> +	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) {
> +		if (info->type != VIRTIO_VSOCK_TYPE_DGRAM)
> +			pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
> +		else
> +			return 0;
> +	}
>  
> -	/* virtio_transport_get_credit might return less than pkt_len credit */
> -	pkt_len = virtio_transport_get_credit(vvs, pkt_len);
> +	if (info->type != VIRTIO_VSOCK_TYPE_DGRAM) {
> +		/* virtio_transport_get_credit might return less than pkt_len credit */
> +		pkt_len = virtio_transport_get_credit(vvs, pkt_len);
>  
> -	/* Do not send zero length OP_RW pkt */
> -	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
> -		return pkt_len;
> +		/* Do not send zero length OP_RW pkt */
> +		if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
> +			return pkt_len;
> +	}
>  
>  	skb = virtio_transport_alloc_skb(info, pkt_len,
>  					 src_cid, src_port,
>  					 dst_cid, dst_port,
>  					 &err);
>  	if (!skb) {
> -		virtio_transport_put_credit(vvs, pkt_len);
> +		if (info->type != VIRTIO_VSOCK_TYPE_DGRAM)
> +			virtio_transport_put_credit(vvs, pkt_len);
>  		return err;
>  	}
>  
> @@ -586,6 +595,61 @@ virtio_transport_seqpacket_dequeue(struct vsock_sock *vsk,
>  }
>  EXPORT_SYMBOL_GPL(virtio_transport_seqpacket_dequeue);
>  
> +static ssize_t
> +virtio_transport_dgram_do_dequeue(struct vsock_sock *vsk,
> +				  struct msghdr *msg, size_t len)
> +{
> +	struct virtio_vsock_sock *vvs = vsk->trans;
> +	struct sk_buff *skb;
> +	size_t total = 0;
> +	u32 free_space;
> +	int err = -EFAULT;
> +
> +	spin_lock_bh(&vvs->rx_lock);
> +	if (total < len && !skb_queue_empty_lockless(&vvs->rx_queue)) {
> +		skb = __skb_dequeue(&vvs->rx_queue);
> +
> +		total = len;
> +		if (total > skb->len - vsock_metadata(skb)->off)
> +			total = skb->len - vsock_metadata(skb)->off;
> +		else if (total < skb->len - vsock_metadata(skb)->off)
> +			msg->msg_flags |= MSG_TRUNC;
> +
> +		/* sk_lock is held by caller so no one else can dequeue.
> +		 * Unlock rx_lock since memcpy_to_msg() may sleep.
> +		 */
> +		spin_unlock_bh(&vvs->rx_lock);
> +
> +		err = memcpy_to_msg(msg, skb->data + vsock_metadata(skb)->off, total);
> +		if (err)
> +			return err;
> +
> +		spin_lock_bh(&vvs->rx_lock);
> +
> +		virtio_transport_dec_rx_pkt(vvs, skb);
> +		consume_skb(skb);
> +	}
> +
> +	free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
> +
> +	spin_unlock_bh(&vvs->rx_lock);
> +
> +	if (total > 0 && msg->msg_name) {
> +		/* Provide the address of the sender. */
> +		DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
> +
> +		vsock_addr_init(vm_addr, le64_to_cpu(vsock_hdr(skb)->src_cid),
> +				le32_to_cpu(vsock_hdr(skb)->src_port));
> +		msg->msg_namelen = sizeof(*vm_addr);
> +	}
> +	return total;
> +}
> +
> +static s64 virtio_transport_dgram_has_data(struct vsock_sock *vsk)
> +{
> +	return virtio_transport_stream_has_data(vsk);
> +}
> +
>  int
>  virtio_transport_seqpacket_enqueue(struct vsock_sock *vsk,
>  				   struct msghdr *msg,
> @@ -611,7 +675,66 @@ virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
>  			       struct msghdr *msg,
>  			       size_t len, int flags)
>  {
> -	return -EOPNOTSUPP;
> +	struct sock *sk;
> +	size_t err = 0;
> +	long timeout;
> +
> +	DEFINE_WAIT(wait);
> +
> +	sk = &vsk->sk;
> +	err = 0;
> +
> +	if (flags & MSG_OOB || flags & MSG_ERRQUEUE || flags & MSG_PEEK)
> +		return -EOPNOTSUPP;
> +
> +	lock_sock(sk);
> +
> +	if (!len)
> +		goto out;
> +
> +	timeout = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
> +
> +	while (1) {
> +		s64 ready;
> +
> +		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
> +		ready = virtio_transport_dgram_has_data(vsk);
> +
> +		if (ready == 0) {
> +			if (timeout == 0) {
> +				err = -EAGAIN;
> +				finish_wait(sk_sleep(sk), &wait);
> +				break;
> +			}
> +
> +			release_sock(sk);
> +			timeout = schedule_timeout(timeout);
> +			lock_sock(sk);
> +
> +			if (signal_pending(current)) {
> +				err = sock_intr_errno(timeout);
> +				finish_wait(sk_sleep(sk), &wait);
> +				break;
> +			} else if (timeout == 0) {
> +				err = -EAGAIN;
> +				finish_wait(sk_sleep(sk), &wait);
> +				break;
> +			}
> +		} else {
> +			finish_wait(sk_sleep(sk), &wait);
> +
> +			if (ready < 0) {
> +				err = -ENOMEM;
> +				goto out;
> +			}
> +
> +			err = virtio_transport_dgram_do_dequeue(vsk, msg, len);
> +			break;
> +		}
> +	}
> +out:
> +	release_sock(sk);
> +	return err;
>  }
>  EXPORT_SYMBOL_GPL(virtio_transport_dgram_dequeue);
>  
> @@ -819,13 +942,13 @@ EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
>  int virtio_transport_dgram_bind(struct vsock_sock *vsk,
>  				struct sockaddr_vm *addr)
>  {
> -	return -EOPNOTSUPP;
> +	return vsock_bind_stream(vsk, addr);
>  }
>  EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
>  
>  bool virtio_transport_dgram_allow(u32 cid, u32 port)
>  {
> -	return false;
> +	return true;
>  }
>  EXPORT_SYMBOL_GPL(virtio_transport_dgram_allow);
>  
> @@ -861,7 +984,16 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
>  			       struct msghdr *msg,
>  			       size_t dgram_len)
>  {
> -	return -EOPNOTSUPP;
> +	struct virtio_vsock_pkt_info info = {
> +		.op = VIRTIO_VSOCK_OP_RW,
> +		.msg = msg,
> +		.pkt_len = dgram_len,
> +		.vsk = vsk,
> +		.remote_cid = remote_addr->svm_cid,
> +		.remote_port = remote_addr->svm_port,
> +	};
> +
> +	return virtio_transport_send_pkt_info(vsk, &info);
>  }
>  EXPORT_SYMBOL_GPL(virtio_transport_dgram_enqueue);
>  
> @@ -1165,6 +1297,12 @@ virtio_transport_recv_connected(struct sock *sk,
>  	struct virtio_vsock_hdr *hdr = vsock_hdr(skb);
>  	int err = 0;
>  
> +	if (le16_to_cpu(vsock_hdr(skb)->type) == VIRTIO_VSOCK_TYPE_DGRAM) {
> +		virtio_transport_recv_enqueue(vsk, skb);
> +		sk->sk_data_ready(sk);
> +		return err;
> +	}
> +
>  	switch (le16_to_cpu(hdr->op)) {
>  	case VIRTIO_VSOCK_OP_RW:
>  		virtio_transport_recv_enqueue(vsk, skb);
> @@ -1320,7 +1458,8 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
>  static bool virtio_transport_valid_type(u16 type)
>  {
>  	return (type == VIRTIO_VSOCK_TYPE_STREAM) ||
> -	       (type == VIRTIO_VSOCK_TYPE_SEQPACKET);
> +	       (type == VIRTIO_VSOCK_TYPE_SEQPACKET) ||
> +	       (type == VIRTIO_VSOCK_TYPE_DGRAM);
>  }
>  
>  /* We are under the virtio-vsock's vsock->rx_lock or vhost-vsock's vq->mutex
> @@ -1384,6 +1523,11 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
>  		goto free_pkt;
>  	}
>  
> +	if (sk->sk_type == SOCK_DGRAM) {
> +		virtio_transport_recv_connected(sk, skb);
> +		goto out;
> +	}
> +
>  	space_available = virtio_transport_space_update(sk, skb);
>  
>  	/* Update CID in case it has changed after a transport reset event */
> @@ -1415,6 +1559,7 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
>  		break;
>  	}
>  
> +out:
>  	release_sock(sk);
>  
>  	/* Release refcnt obtained when we fetched this socket out of the
> -- 
> 2.35.1
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org



* [virtio-dev] Re: [PATCH 6/6] vsock_test: add tests for vsock dgram
       [not found] ` <db2e6c0ffa559ae6b8572b1981a6ad566aa73178.1660362669.git.bobby.eshleman@bytedance.com>
@ 2022-08-16  2:32   ` Bobby Eshleman
  0 siblings, 0 replies; 16+ messages in thread
From: Bobby Eshleman @ 2022-08-16  2:32 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: virtio-dev, Bobby Eshleman, Cong Wang, Jiang Wang,
	Stefano Garzarella, virtualization, netdev, linux-kernel

CC'ing virtio-dev@lists.oasis-open.org

On Mon, Aug 15, 2022 at 10:56:09AM -0700, Bobby Eshleman wrote:
> From: Jiang Wang <jiang.wang@bytedance.com>
> 
> Added test cases for vsock dgram types.
> 
> Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
> ---
>  tools/testing/vsock/util.c       | 105 +++++++++++++++++
>  tools/testing/vsock/util.h       |   4 +
>  tools/testing/vsock/vsock_test.c | 195 +++++++++++++++++++++++++++++++
>  3 files changed, 304 insertions(+)
> 
> diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c
> index 2acbb7703c6a..d2f5b223bf85 100644
> --- a/tools/testing/vsock/util.c
> +++ b/tools/testing/vsock/util.c
> @@ -260,6 +260,57 @@ void send_byte(int fd, int expected_ret, int flags)
>  	}
>  }
>  
> +/* Transmit one byte and check the return value.
> + *
> + * expected_ret:
> + *  <0 Negative errno (for testing errors)
> + *   0 End-of-file
> + *   1 Success
> + */
> +void sendto_byte(int fd, const struct sockaddr *dest_addr, int len, int expected_ret,
> +				int flags)
> +{
> +	const uint8_t byte = 'A';
> +	ssize_t nwritten;
> +
> +	timeout_begin(TIMEOUT);
> +	do {
> +		nwritten = sendto(fd, &byte, sizeof(byte), flags, dest_addr,
> +						len);
> +		timeout_check("write");
> +	} while (nwritten < 0 && errno == EINTR);
> +	timeout_end();
> +
> +	if (expected_ret < 0) {
> +		if (nwritten != -1) {
> +			fprintf(stderr, "bogus sendto(2) return value %zd\n",
> +				nwritten);
> +			exit(EXIT_FAILURE);
> +		}
> +		if (errno != -expected_ret) {
> +			perror("write");
> +			exit(EXIT_FAILURE);
> +		}
> +		return;
> +	}
> +
> +	if (nwritten < 0) {
> +		perror("write");
> +		exit(EXIT_FAILURE);
> +	}
> +	if (nwritten == 0) {
> +		if (expected_ret == 0)
> +			return;
> +
> +		fprintf(stderr, "unexpected EOF while sending byte\n");
> +		exit(EXIT_FAILURE);
> +	}
> +	if (nwritten != sizeof(byte)) {
> +		fprintf(stderr, "bogus sendto(2) return value %zd\n", nwritten);
> +		exit(EXIT_FAILURE);
> +	}
> +}
> +
>  /* Receive one byte and check the return value.
>   *
>   * expected_ret:
> @@ -313,6 +364,60 @@ void recv_byte(int fd, int expected_ret, int flags)
>  	}
>  }
>  
> +/* Receive one byte and check the return value.
> + *
> + * expected_ret:
> + *  <0 Negative errno (for testing errors)
> + *   0 End-of-file
> + *   1 Success
> + */
> +void recvfrom_byte(int fd, struct sockaddr *src_addr, socklen_t *addrlen,
> +				int expected_ret, int flags)
> +{
> +	uint8_t byte;
> +	ssize_t nread;
> +
> +	timeout_begin(TIMEOUT);
> +	do {
> +		nread = recvfrom(fd, &byte, sizeof(byte), flags, src_addr, addrlen);
> +		timeout_check("read");
> +	} while (nread < 0 && errno == EINTR);
> +	timeout_end();
> +
> +	if (expected_ret < 0) {
> +		if (nread != -1) {
> +			fprintf(stderr, "bogus recvfrom(2) return value %zd\n",
> +				nread);
> +			exit(EXIT_FAILURE);
> +		}
> +		if (errno != -expected_ret) {
> +			perror("read");
> +			exit(EXIT_FAILURE);
> +		}
> +		return;
> +	}
> +
> +	if (nread < 0) {
> +		perror("read");
> +		exit(EXIT_FAILURE);
> +	}
> +	if (nread == 0) {
> +		if (expected_ret == 0)
> +			return;
> +
> +		fprintf(stderr, "unexpected EOF while receiving byte\n");
> +		exit(EXIT_FAILURE);
> +	}
> +	if (nread != sizeof(byte)) {
> +		fprintf(stderr, "bogus recvfrom(2) return value %zd\n", nread);
> +		exit(EXIT_FAILURE);
> +	}
> +	if (byte != 'A') {
> +		fprintf(stderr, "unexpected byte read %c\n", byte);
> +		exit(EXIT_FAILURE);
> +	}
> +}
> +
>  /* Run test cases.  The program terminates if a failure occurs. */
>  void run_tests(const struct test_case *test_cases,
>  	       const struct test_opts *opts)
> diff --git a/tools/testing/vsock/util.h b/tools/testing/vsock/util.h
> index a3375ad2fb7f..7213f2a51c1e 100644
> --- a/tools/testing/vsock/util.h
> +++ b/tools/testing/vsock/util.h
> @@ -43,7 +43,11 @@ int vsock_seqpacket_accept(unsigned int cid, unsigned int port,
>  			   struct sockaddr_vm *clientaddrp);
>  void vsock_wait_remote_close(int fd);
>  void send_byte(int fd, int expected_ret, int flags);
> +void sendto_byte(int fd, const struct sockaddr *dest_addr, int len, int expected_ret,
> +				int flags);
>  void recv_byte(int fd, int expected_ret, int flags);
> +void recvfrom_byte(int fd, struct sockaddr *src_addr, socklen_t *addrlen,
> +				int expected_ret, int flags);
>  void run_tests(const struct test_case *test_cases,
>  	       const struct test_opts *opts);
>  void list_tests(const struct test_case *test_cases);
> diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
> index dc577461afc2..640379f1b462 100644
> --- a/tools/testing/vsock/vsock_test.c
> +++ b/tools/testing/vsock/vsock_test.c
> @@ -201,6 +201,115 @@ static void test_stream_server_close_server(const struct test_opts *opts)
>  	close(fd);
>  }
>  
> +static void test_dgram_sendto_client(const struct test_opts *opts)
> +{
> +	union {
> +		struct sockaddr sa;
> +		struct sockaddr_vm svm;
> +	} addr = {
> +		.svm = {
> +			.svm_family = AF_VSOCK,
> +			.svm_port = 1234,
> +			.svm_cid = opts->peer_cid,
> +		},
> +	};
> +	int fd;
> +
> +	/* Wait for the server to be ready */
> +	control_expectln("BIND");
> +
> +	fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
> +	if (fd < 0) {
> +		perror("socket");
> +		exit(EXIT_FAILURE);
> +	}
> +
> +	sendto_byte(fd, &addr.sa, sizeof(addr.svm), 1, 0);
> +
> +	/* Notify the server that the client has finished */
> +	control_writeln("DONE");
> +
> +	close(fd);
> +}
> +
> +static void test_dgram_sendto_server(const struct test_opts *opts)
> +{
> +	union {
> +		struct sockaddr sa;
> +		struct sockaddr_vm svm;
> +	} addr = {
> +		.svm = {
> +			.svm_family = AF_VSOCK,
> +			.svm_port = 1234,
> +			.svm_cid = VMADDR_CID_ANY,
> +		},
> +	};
> +	int fd;
> +	int len = sizeof(addr.sa);
> +
> +	fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
> +
> +	if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
> +		perror("bind");
> +		exit(EXIT_FAILURE);
> +	}
> +
> +	/* Notify the client that the server is ready */
> +	control_writeln("BIND");
> +
> +	recvfrom_byte(fd, &addr.sa, &len, 1, 0);
> +	printf("got message from cid:%d, port %u ", addr.svm.svm_cid,
> +			addr.svm.svm_port);
> +
> +	/* Wait for the client to finish */
> +	control_expectln("DONE");
> +
> +	close(fd);
> +}
> +
> +static void test_dgram_connect_client(const struct test_opts *opts)
> +{
> +	union {
> +		struct sockaddr sa;
> +		struct sockaddr_vm svm;
> +	} addr = {
> +		.svm = {
> +			.svm_family = AF_VSOCK,
> +			.svm_port = 1234,
> +			.svm_cid = opts->peer_cid,
> +		},
> +	};
> +	int fd;
> +	int ret;
> +
> +	/* Wait for the server to be ready */
> +	control_expectln("BIND");
> +
> +	fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
> +	if (fd < 0) {
> +		perror("socket");
> +		exit(EXIT_FAILURE);
> +	}
> +
> +	ret = connect(fd, &addr.sa, sizeof(addr.svm));
> +	if (ret < 0) {
> +		perror("connect");
> +		exit(EXIT_FAILURE);
> +	}
> +
> +	send_byte(fd, 1, 0);
> +
> +	/* Notify the server that the client has finished */
> +	control_writeln("DONE");
> +
> +	close(fd);
> +}
> +
> +static void test_dgram_connect_server(const struct test_opts *opts)
> +{
> +	test_dgram_sendto_server(opts);
> +}
> +
>  /* With the standard socket sizes, VMCI is able to support about 100
>   * concurrent stream connections.
>   */
> @@ -254,6 +363,77 @@ static void test_stream_multiconn_server(const struct test_opts *opts)
>  		close(fds[i]);
>  }
>  
> +static void test_dgram_multiconn_client(const struct test_opts *opts)
> +{
> +	int fds[MULTICONN_NFDS];
> +	int i;
> +	union {
> +		struct sockaddr sa;
> +		struct sockaddr_vm svm;
> +	} addr = {
> +		.svm = {
> +			.svm_family = AF_VSOCK,
> +			.svm_port = 1234,
> +			.svm_cid = opts->peer_cid,
> +		},
> +	};
> +
> +	/* Wait for the server to be ready */
> +	control_expectln("BIND");
> +
> +	for (i = 0; i < MULTICONN_NFDS; i++) {
> +		fds[i] = socket(AF_VSOCK, SOCK_DGRAM, 0);
> +		if (fds[i] < 0) {
> +			perror("socket");
> +			exit(EXIT_FAILURE);
> +		}
> +	}
> +
> +	for (i = 0; i < MULTICONN_NFDS; i++)
> +		sendto_byte(fds[i], &addr.sa, sizeof(addr.svm), 1, 0);
> +
> +	/* Notify the server that the client has finished */
> +	control_writeln("DONE");
> +
> +	for (i = 0; i < MULTICONN_NFDS; i++)
> +		close(fds[i]);
> +}
> +
> +static void test_dgram_multiconn_server(const struct test_opts *opts)
> +{
> +	union {
> +		struct sockaddr sa;
> +		struct sockaddr_vm svm;
> +	} addr = {
> +		.svm = {
> +			.svm_family = AF_VSOCK,
> +			.svm_port = 1234,
> +			.svm_cid = VMADDR_CID_ANY,
> +		},
> +	};
> +	int fd;
> +	int len = sizeof(addr.sa);
> +	int i;
> +
> +	fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
> +
> +	if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
> +		perror("bind");
> +		exit(EXIT_FAILURE);
> +	}
> +
> +	/* Notify the client that the server is ready */
> +	control_writeln("BIND");
> +
> +	for (i = 0; i < MULTICONN_NFDS; i++)
> +		recvfrom_byte(fd, &addr.sa, &len, 1, 0);
> +
> +	/* Wait for the client to finish */
> +	control_expectln("DONE");
> +
> +	close(fd);
> +}
> +
>  static void test_stream_msg_peek_client(const struct test_opts *opts)
>  {
>  	int fd;
> @@ -646,6 +826,21 @@ static struct test_case test_cases[] = {
>  		.run_client = test_seqpacket_invalid_rec_buffer_client,
>  		.run_server = test_seqpacket_invalid_rec_buffer_server,
>  	},
> +	{
> +		.name = "SOCK_DGRAM client close",
> +		.run_client = test_dgram_sendto_client,
> +		.run_server = test_dgram_sendto_server,
> +	},
> +	{
> +		.name = "SOCK_DGRAM client connect",
> +		.run_client = test_dgram_connect_client,
> +		.run_server = test_dgram_connect_server,
> +	},
> +	{
> +		.name = "SOCK_DGRAM multiple connections",
> +		.run_client = test_dgram_multiconn_client,
> +		.run_server = test_dgram_multiconn_server,
> +	},
>  	{},
>  };
>  
> -- 
> 2.35.1
> 




* Re: [virtio-dev] Re: [PATCH 5/6] virtio/vsock: add support for dgram
  2022-08-17  5:01     ` Arseniy Krasnov
@ 2022-08-16  9:57       ` Bobby Eshleman
  2022-08-18  8:24         ` Arseniy Krasnov
  2022-08-17  5:42       ` Arseniy Krasnov
  1 sibling, 1 reply; 16+ messages in thread
From: Bobby Eshleman @ 2022-08-16  9:57 UTC (permalink / raw)
  To: Arseniy Krasnov
  Cc: Bobby Eshleman, virtio-dev@lists.oasis-open.org, Bobby Eshleman,
	Cong Wang, Jiang Wang, Stefan Hajnoczi, Stefano Garzarella,
	Michael S. Tsirkin, Jason Wang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, kvm@vger.kernel.org,
	virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Wed, Aug 17, 2022 at 05:01:00AM +0000, Arseniy Krasnov wrote:
> On 16.08.2022 05:32, Bobby Eshleman wrote:
> > CC'ing virtio-dev@lists.oasis-open.org
> > 
> > On Mon, Aug 15, 2022 at 10:56:08AM -0700, Bobby Eshleman wrote:
> >> This patch supports dgram in virtio and on the vhost side.
> Hello,
> 
> Sorry, I don't understand how this maintains message boundaries. Or is
> it unnecessary for SOCK_DGRAM?
> 
> Thanks

If I understand your question correctly: the payload length is carried in
the packet header, so a receiver always knows that the message ends at
header start + header length + payload length.
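
To make the boundary arithmetic concrete, here is a minimal userspace
sketch. It is not code from this series: struct demo_hdr, demo_put_msg()
and demo_msg_end() are hypothetical names, the header is a simplified
stand-in rather than the layout of struct virtio_vsock_hdr, and fields are
kept in host byte order instead of little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified header for illustration only. It is NOT the
 * layout of struct virtio_vsock_hdr, and it uses host byte order.
 */
struct demo_hdr {
	uint32_t len;	/* payload length in bytes */
	uint16_t type;	/* 3 mirrors VIRTIO_VSOCK_TYPE_DGRAM from the patch */
	uint16_t op;
};

/* Append one message (header + payload) at offset off and return the
 * offset just past it. The caller must ensure the buffer is large enough.
 */
size_t demo_put_msg(uint8_t *buf, size_t off, const void *payload,
		    uint32_t len)
{
	struct demo_hdr hdr = { .len = len, .type = 3, .op = 0 };

	assert(buf != NULL);
	memcpy(buf + off, &hdr, sizeof(hdr));
	memcpy(buf + off + sizeof(hdr), payload, len);
	return off + sizeof(hdr) + len;
}

/* Return the offset one past the end of the message starting at start,
 * i.e. start + header length + payload length, or 0 if the buffer does
 * not hold a complete message at that position.
 */
size_t demo_msg_end(const uint8_t *buf, size_t buf_len, size_t start)
{
	struct demo_hdr hdr;

	assert(buf != NULL);
	if (start > buf_len || buf_len - start < sizeof(hdr))
		return 0;

	memcpy(&hdr, buf + start, sizeof(hdr));
	if (hdr.len > buf_len - start - sizeof(hdr))
		return 0;

	return start + sizeof(hdr) + hdr.len;
}
```

Packing two payloads back to back with demo_put_msg() and then calling
demo_msg_end() at each start offset recovers each message's end without
any delimiter in the payload, which is the property the header length
provides.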

> >>
> >> Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
> >> Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
> >> ---
> >>  drivers/vhost/vsock.c                   |   2 +-
> >>  include/net/af_vsock.h                  |   2 +
> >>  include/uapi/linux/virtio_vsock.h       |   1 +
> >>  net/vmw_vsock/af_vsock.c                |  26 +++-
> >>  net/vmw_vsock/virtio_transport.c        |   2 +-
> >>  net/vmw_vsock/virtio_transport_common.c | 173 ++++++++++++++++++++++--
> >>  6 files changed, 186 insertions(+), 20 deletions(-)
> >>
> >> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> >> index a5d1bdb786fe..3dc72a5647ca 100644
> >> --- a/drivers/vhost/vsock.c
> >> +++ b/drivers/vhost/vsock.c
> >> @@ -925,7 +925,7 @@ static int __init vhost_vsock_init(void)
> >>  	int ret;
> >>  
> >>  	ret = vsock_core_register(&vhost_transport.transport,
> >> -				  VSOCK_TRANSPORT_F_H2G);
> >> +				  VSOCK_TRANSPORT_F_H2G | VSOCK_TRANSPORT_F_DGRAM);
> >>  	if (ret < 0)
> >>  		return ret;
> >>  
> >> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> >> index 1c53c4c4d88f..37e55c81e4df 100644
> >> --- a/include/net/af_vsock.h
> >> +++ b/include/net/af_vsock.h
> >> @@ -78,6 +78,8 @@ struct vsock_sock {
> >>  s64 vsock_stream_has_data(struct vsock_sock *vsk);
> >>  s64 vsock_stream_has_space(struct vsock_sock *vsk);
> >>  struct sock *vsock_create_connected(struct sock *parent);
> >> +int vsock_bind_stream(struct vsock_sock *vsk,
> >> +		      struct sockaddr_vm *addr);
> >>  
> >>  /**** TRANSPORT ****/
> >>  
> >> diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
> >> index 857df3a3a70d..0975b9c88292 100644
> >> --- a/include/uapi/linux/virtio_vsock.h
> >> +++ b/include/uapi/linux/virtio_vsock.h
> >> @@ -70,6 +70,7 @@ struct virtio_vsock_hdr {
> >>  enum virtio_vsock_type {
> >>  	VIRTIO_VSOCK_TYPE_STREAM = 1,
> >>  	VIRTIO_VSOCK_TYPE_SEQPACKET = 2,
> >> +	VIRTIO_VSOCK_TYPE_DGRAM = 3,
> >>  };
> >>  
> >>  enum virtio_vsock_op {
> >> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> >> index 1893f8aafa48..87e4ae1866d3 100644
> >> --- a/net/vmw_vsock/af_vsock.c
> >> +++ b/net/vmw_vsock/af_vsock.c
> >> @@ -675,6 +675,19 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> >>  	return 0;
> >>  }
> >>  
> >> +int vsock_bind_stream(struct vsock_sock *vsk,
> >> +		      struct sockaddr_vm *addr)
> >> +{
> >> +	int retval;
> >> +
> >> +	spin_lock_bh(&vsock_table_lock);
> >> +	retval = __vsock_bind_connectible(vsk, addr);
> >> +	spin_unlock_bh(&vsock_table_lock);
> >> +
> >> +	return retval;
> >> +}
> >> +EXPORT_SYMBOL(vsock_bind_stream);
> >> +
> >>  static int __vsock_bind_dgram(struct vsock_sock *vsk,
> >>  			      struct sockaddr_vm *addr)
> >>  {
> >> @@ -2363,11 +2376,16 @@ int vsock_core_register(const struct vsock_transport *t, int features)
> >>  	}
> >>  
> >>  	if (features & VSOCK_TRANSPORT_F_DGRAM) {
> >> -		if (t_dgram) {
> >> -			err = -EBUSY;
> >> -			goto err_busy;
> >> +		/* TODO: always choose the G2H variant over others, support nesting later */
> >> +		if (features & VSOCK_TRANSPORT_F_G2H) {
> >> +			if (t_dgram)
> >> +				pr_warn("virtio_vsock: t_dgram already set\n");
> >> +			t_dgram = t;
> >> +		}
> >> +
> >> +		if (!t_dgram) {
> >> +			t_dgram = t;
> >>  		}
> >> -		t_dgram = t;
> >>  	}
> >>  
> >>  	if (features & VSOCK_TRANSPORT_F_LOCAL) {
> >> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> >> index 073314312683..d4526ca462d2 100644
> >> --- a/net/vmw_vsock/virtio_transport.c
> >> +++ b/net/vmw_vsock/virtio_transport.c
> >> @@ -850,7 +850,7 @@ static int __init virtio_vsock_init(void)
> >>  		return -ENOMEM;
> >>  
> >>  	ret = vsock_core_register(&virtio_transport.transport,
> >> -				  VSOCK_TRANSPORT_F_G2H);
> >> +				  VSOCK_TRANSPORT_F_G2H | VSOCK_TRANSPORT_F_DGRAM);
> >>  	if (ret)
> >>  		goto out_wq;
> >>  
> >> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> >> index bdf16fff054f..aedb48728677 100644
> >> --- a/net/vmw_vsock/virtio_transport_common.c
> >> +++ b/net/vmw_vsock/virtio_transport_common.c
> >> @@ -229,7 +229,9 @@ EXPORT_SYMBOL_GPL(virtio_transport_deliver_tap_pkt);
> >>  
> >>  static u16 virtio_transport_get_type(struct sock *sk)
> >>  {
> >> -	if (sk->sk_type == SOCK_STREAM)
> >> +	if (sk->sk_type == SOCK_DGRAM)
> >> +		return VIRTIO_VSOCK_TYPE_DGRAM;
> >> +	else if (sk->sk_type == SOCK_STREAM)
> >>  		return VIRTIO_VSOCK_TYPE_STREAM;
> >>  	else
> >>  		return VIRTIO_VSOCK_TYPE_SEQPACKET;
> >> @@ -287,22 +289,29 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> >>  	vvs = vsk->trans;
> >>  
> >>  	/* we can send less than pkt_len bytes */
> >> -	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
> >> -		pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
> >> +	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) {
> >> +		if (info->type != VIRTIO_VSOCK_TYPE_DGRAM)
> >> +			pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
> >> +		else
> >> +			return 0;
> >> +	}
> >>  
> >> -	/* virtio_transport_get_credit might return less than pkt_len credit */
> >> -	pkt_len = virtio_transport_get_credit(vvs, pkt_len);
> >> +	if (info->type != VIRTIO_VSOCK_TYPE_DGRAM) {
> >> +		/* virtio_transport_get_credit might return less than pkt_len credit */
> >> +		pkt_len = virtio_transport_get_credit(vvs, pkt_len);
> >>  
> >> -	/* Do not send zero length OP_RW pkt */
> >> -	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
> >> -		return pkt_len;
> >> +		/* Do not send zero length OP_RW pkt */
> >> +		if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
> >> +			return pkt_len;
> >> +	}
> >>  
> >>  	skb = virtio_transport_alloc_skb(info, pkt_len,
> >>  					 src_cid, src_port,
> >>  					 dst_cid, dst_port,
> >>  					 &err);
> >>  	if (!skb) {
> >> -		virtio_transport_put_credit(vvs, pkt_len);
> >> +		if (info->type != VIRTIO_VSOCK_TYPE_DGRAM)
> >> +			virtio_transport_put_credit(vvs, pkt_len);
> >>  		return err;
> >>  	}
> >>  
> >> @@ -586,6 +595,61 @@ virtio_transport_seqpacket_dequeue(struct vsock_sock *vsk,
> >>  }
> >>  EXPORT_SYMBOL_GPL(virtio_transport_seqpacket_dequeue);
> >>  
> >> +static ssize_t
> >> +virtio_transport_dgram_do_dequeue(struct vsock_sock *vsk,
> >> +				  struct msghdr *msg, size_t len)
> >> +{
> >> +	struct virtio_vsock_sock *vvs = vsk->trans;
> >> +	struct sk_buff *skb;
> >> +	size_t total = 0;
> >> +	u32 free_space;
> >> +	int err = -EFAULT;
> >> +
> >> +	spin_lock_bh(&vvs->rx_lock);
> >> +	if (total < len && !skb_queue_empty_lockless(&vvs->rx_queue)) {
> >> +		skb = __skb_dequeue(&vvs->rx_queue);
> >> +
> >> +		total = len;
> >> +		if (total > skb->len - vsock_metadata(skb)->off)
> >> +			total = skb->len - vsock_metadata(skb)->off;
> >> +		else if (total < skb->len - vsock_metadata(skb)->off)
> >> +			msg->msg_flags |= MSG_TRUNC;
> >> +
> >> +		/* sk_lock is held by caller so no one else can dequeue.
> >> +		 * Unlock rx_lock since memcpy_to_msg() may sleep.
> >> +		 */
> >> +		spin_unlock_bh(&vvs->rx_lock);
> >> +
> >> +		err = memcpy_to_msg(msg, skb->data + vsock_metadata(skb)->off, total);
> >> +		if (err)
> >> +			return err;
> >> +
> >> +		spin_lock_bh(&vvs->rx_lock);
> >> +
> >> +		virtio_transport_dec_rx_pkt(vvs, skb);
> >> +		consume_skb(skb);
> >> +	}
> >> +
> >> +	free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
> >> +
> >> +	spin_unlock_bh(&vvs->rx_lock);
> >> +
> >> +	if (total > 0 && msg->msg_name) {
> >> +		/* Provide the address of the sender. */
> >> +		DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
> >> +
> >> +		vsock_addr_init(vm_addr, le64_to_cpu(vsock_hdr(skb)->src_cid),
> >> +				le32_to_cpu(vsock_hdr(skb)->src_port));
> >> +		msg->msg_namelen = sizeof(*vm_addr);
> >> +	}
> >> +	return total;
> >> +}
> >> +
> >> +static s64 virtio_transport_dgram_has_data(struct vsock_sock *vsk)
> >> +{
> >> +	return virtio_transport_stream_has_data(vsk);
> >> +}
> >> +
> >>  int
> >>  virtio_transport_seqpacket_enqueue(struct vsock_sock *vsk,
> >>  				   struct msghdr *msg,
> >> @@ -611,7 +675,66 @@ virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
> >>  			       struct msghdr *msg,
> >>  			       size_t len, int flags)
> >>  {
> >> -	return -EOPNOTSUPP;
> >> +	struct sock *sk;
> >> +	size_t err = 0;
> >> +	long timeout;
> >> +
> >> +	DEFINE_WAIT(wait);
> >> +
> >> +	sk = &vsk->sk;
> >> +	err = 0;
> >> +
> >> +	if (flags & MSG_OOB || flags & MSG_ERRQUEUE || flags & MSG_PEEK)
> >> +		return -EOPNOTSUPP;
> >> +
> >> +	lock_sock(sk);
> >> +
> >> +	if (!len)
> >> +		goto out;
> >> +
> >> +	timeout = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
> >> +
> >> +	while (1) {
> >> +		s64 ready;
> >> +
> >> +		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
> >> +		ready = virtio_transport_dgram_has_data(vsk);
> >> +
> >> +		if (ready == 0) {
> >> +			if (timeout == 0) {
> >> +				err = -EAGAIN;
> >> +				finish_wait(sk_sleep(sk), &wait);
> >> +				break;
> >> +			}
> >> +
> >> +			release_sock(sk);
> >> +			timeout = schedule_timeout(timeout);
> >> +			lock_sock(sk);
> >> +
> >> +			if (signal_pending(current)) {
> >> +				err = sock_intr_errno(timeout);
> >> +				finish_wait(sk_sleep(sk), &wait);
> >> +				break;
> >> +			} else if (timeout == 0) {
> >> +				err = -EAGAIN;
> >> +				finish_wait(sk_sleep(sk), &wait);
> >> +				break;
> >> +			}
> >> +		} else {
> >> +			finish_wait(sk_sleep(sk), &wait);
> >> +
> >> +			if (ready < 0) {
> >> +				err = -ENOMEM;
> >> +				goto out;
> >> +			}
> >> +
> >> +			err = virtio_transport_dgram_do_dequeue(vsk, msg, len);
> >> +			break;
> >> +		}
> >> +	}
> >> +out:
> >> +	release_sock(sk);
> >> +	return err;
> >>  }
> >>  EXPORT_SYMBOL_GPL(virtio_transport_dgram_dequeue);
> >>  
> >> @@ -819,13 +942,13 @@ EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
> >>  int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> >>  				struct sockaddr_vm *addr)
> >>  {
> >> -	return -EOPNOTSUPP;
> >> +	return vsock_bind_stream(vsk, addr);
> >>  }
> >>  EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
> >>  
> >>  bool virtio_transport_dgram_allow(u32 cid, u32 port)
> >>  {
> >> -	return false;
> >> +	return true;
> >>  }
> >>  EXPORT_SYMBOL_GPL(virtio_transport_dgram_allow);
> >>  
> >> @@ -861,7 +984,16 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
> >>  			       struct msghdr *msg,
> >>  			       size_t dgram_len)
> >>  {
> >> -	return -EOPNOTSUPP;
> >> +	struct virtio_vsock_pkt_info info = {
> >> +		.op = VIRTIO_VSOCK_OP_RW,
> >> +		.msg = msg,
> >> +		.pkt_len = dgram_len,
> >> +		.vsk = vsk,
> >> +		.remote_cid = remote_addr->svm_cid,
> >> +		.remote_port = remote_addr->svm_port,
> >> +	};
> >> +
> >> +	return virtio_transport_send_pkt_info(vsk, &info);
> >>  }
> >>  EXPORT_SYMBOL_GPL(virtio_transport_dgram_enqueue);
> >>  
> >> @@ -1165,6 +1297,12 @@ virtio_transport_recv_connected(struct sock *sk,
> >>  	struct virtio_vsock_hdr *hdr = vsock_hdr(skb);
> >>  	int err = 0;
> >>  
> >> +	if (le16_to_cpu(vsock_hdr(skb)->type) == VIRTIO_VSOCK_TYPE_DGRAM) {
> >> +		virtio_transport_recv_enqueue(vsk, skb);
> >> +		sk->sk_data_ready(sk);
> >> +		return err;
> >> +	}
> >> +
> >>  	switch (le16_to_cpu(hdr->op)) {
> >>  	case VIRTIO_VSOCK_OP_RW:
> >>  		virtio_transport_recv_enqueue(vsk, skb);
> >> @@ -1320,7 +1458,8 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
> >>  static bool virtio_transport_valid_type(u16 type)
> >>  {
> >>  	return (type == VIRTIO_VSOCK_TYPE_STREAM) ||
> >> -	       (type == VIRTIO_VSOCK_TYPE_SEQPACKET);
> >> +	       (type == VIRTIO_VSOCK_TYPE_SEQPACKET) ||
> >> +	       (type == VIRTIO_VSOCK_TYPE_DGRAM);
> >>  }
> >>  
> >>  /* We are under the virtio-vsock's vsock->rx_lock or vhost-vsock's vq->mutex
> >> @@ -1384,6 +1523,11 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
> >>  		goto free_pkt;
> >>  	}
> >>  
> >> +	if (sk->sk_type == SOCK_DGRAM) {
> >> +		virtio_transport_recv_connected(sk, skb);
> >> +		goto out;
> >> +	}
> >> +
> >>  	space_available = virtio_transport_space_update(sk, skb);
> >>  
> >>  	/* Update CID in case it has changed after a transport reset event */
> >> @@ -1415,6 +1559,7 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
> >>  		break;
> >>  	}
> >>  
> >> +out:
> >>  	release_sock(sk);
> >>  
> >>  	/* Release refcnt obtained when we fetched this socket out of the
> >> -- 
> >> 2.35.1
> >>
> > 
> > 
> 




* Re: [virtio-dev] Re: [PATCH 5/6] virtio/vsock: add support for dgram
  2022-08-17  5:42       ` Arseniy Krasnov
@ 2022-08-16  9:58         ` Bobby Eshleman
  2022-08-18  8:35           ` Arseniy Krasnov
  0 siblings, 1 reply; 16+ messages in thread
From: Bobby Eshleman @ 2022-08-16  9:58 UTC (permalink / raw)
  To: Arseniy Krasnov
  Cc: Bobby Eshleman, virtio-dev@lists.oasis-open.org, Bobby Eshleman,
	Cong Wang, Jiang Wang, Stefan Hajnoczi, Stefano Garzarella,
	Michael S. Tsirkin, Jason Wang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, kvm@vger.kernel.org,
	virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Wed, Aug 17, 2022 at 05:42:08AM +0000, Arseniy Krasnov wrote:
> On 17.08.2022 08:01, Arseniy Krasnov wrote:
> > On 16.08.2022 05:32, Bobby Eshleman wrote:
> >> CC'ing virtio-dev@lists.oasis-open.org
> >>
> >> On Mon, Aug 15, 2022 at 10:56:08AM -0700, Bobby Eshleman wrote:
> >>> This patch supports dgram in virtio and on the vhost side.
> > Hello,
> > 
> > > Sorry, I don't understand: how does this maintain message boundaries? Or
> > > is that unnecessary for SOCK_DGRAM?
> > 
> > Thanks
> >>>
> >>> Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
> >>> Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
> >>> ---
> >>>  drivers/vhost/vsock.c                   |   2 +-
> >>>  include/net/af_vsock.h                  |   2 +
> >>>  include/uapi/linux/virtio_vsock.h       |   1 +
> >>>  net/vmw_vsock/af_vsock.c                |  26 +++-
> >>>  net/vmw_vsock/virtio_transport.c        |   2 +-
> >>>  net/vmw_vsock/virtio_transport_common.c | 173 ++++++++++++++++++++++--
> >>>  6 files changed, 186 insertions(+), 20 deletions(-)
> >>>
> >>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> >>> index a5d1bdb786fe..3dc72a5647ca 100644
> >>> --- a/drivers/vhost/vsock.c
> >>> +++ b/drivers/vhost/vsock.c
> >>> @@ -925,7 +925,7 @@ static int __init vhost_vsock_init(void)
> >>>  	int ret;
> >>>  
> >>>  	ret = vsock_core_register(&vhost_transport.transport,
> >>> -				  VSOCK_TRANSPORT_F_H2G);
> >>> +				  VSOCK_TRANSPORT_F_H2G | VSOCK_TRANSPORT_F_DGRAM);
> >>>  	if (ret < 0)
> >>>  		return ret;
> >>>  
> >>> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> >>> index 1c53c4c4d88f..37e55c81e4df 100644
> >>> --- a/include/net/af_vsock.h
> >>> +++ b/include/net/af_vsock.h
> >>> @@ -78,6 +78,8 @@ struct vsock_sock {
> >>>  s64 vsock_stream_has_data(struct vsock_sock *vsk);
> >>>  s64 vsock_stream_has_space(struct vsock_sock *vsk);
> >>>  struct sock *vsock_create_connected(struct sock *parent);
> >>> +int vsock_bind_stream(struct vsock_sock *vsk,
> >>> +		      struct sockaddr_vm *addr);
> >>>  
> >>>  /**** TRANSPORT ****/
> >>>  
> >>> diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
> >>> index 857df3a3a70d..0975b9c88292 100644
> >>> --- a/include/uapi/linux/virtio_vsock.h
> >>> +++ b/include/uapi/linux/virtio_vsock.h
> >>> @@ -70,6 +70,7 @@ struct virtio_vsock_hdr {
> >>>  enum virtio_vsock_type {
> >>>  	VIRTIO_VSOCK_TYPE_STREAM = 1,
> >>>  	VIRTIO_VSOCK_TYPE_SEQPACKET = 2,
> >>> +	VIRTIO_VSOCK_TYPE_DGRAM = 3,
> >>>  };
> >>>  
> >>>  enum virtio_vsock_op {
> >>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> >>> index 1893f8aafa48..87e4ae1866d3 100644
> >>> --- a/net/vmw_vsock/af_vsock.c
> >>> +++ b/net/vmw_vsock/af_vsock.c
> >>> @@ -675,6 +675,19 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
> >>>  	return 0;
> >>>  }
> >>>  
> >>> +int vsock_bind_stream(struct vsock_sock *vsk,
> >>> +		      struct sockaddr_vm *addr)
> >>> +{
> >>> +	int retval;
> >>> +
> >>> +	spin_lock_bh(&vsock_table_lock);
> >>> +	retval = __vsock_bind_connectible(vsk, addr);
> >>> +	spin_unlock_bh(&vsock_table_lock);
> >>> +
> >>> +	return retval;
> >>> +}
> >>> +EXPORT_SYMBOL(vsock_bind_stream);
> >>> +
> >>>  static int __vsock_bind_dgram(struct vsock_sock *vsk,
> >>>  			      struct sockaddr_vm *addr)
> >>>  {
> >>> @@ -2363,11 +2376,16 @@ int vsock_core_register(const struct vsock_transport *t, int features)
> >>>  	}
> >>>  
> >>>  	if (features & VSOCK_TRANSPORT_F_DGRAM) {
> >>> -		if (t_dgram) {
> >>> -			err = -EBUSY;
> >>> -			goto err_busy;
> >>> +		/* TODO: always chose the G2H variant over others, support nesting later */
> >>> +		if (features & VSOCK_TRANSPORT_F_G2H) {
> >>> +			if (t_dgram)
> >>> +				pr_warn("virtio_vsock: t_dgram already set\n");
> >>> +			t_dgram = t;
> >>> +		}
> >>> +
> >>> +		if (!t_dgram) {
> >>> +			t_dgram = t;
> >>>  		}
> >>> -		t_dgram = t;
> >>>  	}
> >>>  
> >>>  	if (features & VSOCK_TRANSPORT_F_LOCAL) {
> >>> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> >>> index 073314312683..d4526ca462d2 100644
> >>> --- a/net/vmw_vsock/virtio_transport.c
> >>> +++ b/net/vmw_vsock/virtio_transport.c
> >>> @@ -850,7 +850,7 @@ static int __init virtio_vsock_init(void)
> >>>  		return -ENOMEM;
> >>>  
> >>>  	ret = vsock_core_register(&virtio_transport.transport,
> >>> -				  VSOCK_TRANSPORT_F_G2H);
> >>> +				  VSOCK_TRANSPORT_F_G2H | VSOCK_TRANSPORT_F_DGRAM);
> >>>  	if (ret)
> >>>  		goto out_wq;
> >>>  
> >>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> >>> index bdf16fff054f..aedb48728677 100644
> >>> --- a/net/vmw_vsock/virtio_transport_common.c
> >>> +++ b/net/vmw_vsock/virtio_transport_common.c
> >>> @@ -229,7 +229,9 @@ EXPORT_SYMBOL_GPL(virtio_transport_deliver_tap_pkt);
> >>>  
> >>>  static u16 virtio_transport_get_type(struct sock *sk)
> >>>  {
> >>> -	if (sk->sk_type == SOCK_STREAM)
> >>> +	if (sk->sk_type == SOCK_DGRAM)
> >>> +		return VIRTIO_VSOCK_TYPE_DGRAM;
> >>> +	else if (sk->sk_type == SOCK_STREAM)
> >>>  		return VIRTIO_VSOCK_TYPE_STREAM;
> >>>  	else
> >>>  		return VIRTIO_VSOCK_TYPE_SEQPACKET;
> >>> @@ -287,22 +289,29 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> >>>  	vvs = vsk->trans;
> >>>  
> >>>  	/* we can send less than pkt_len bytes */
> >>> -	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
> >>> -		pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
> >>> +	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) {
> >>> +		if (info->type != VIRTIO_VSOCK_TYPE_DGRAM)
> >>> +			pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
> >>> +		else
> >>> +			return 0;
> >>> +	}
> >>>  
> >>> -	/* virtio_transport_get_credit might return less than pkt_len credit */
> >>> -	pkt_len = virtio_transport_get_credit(vvs, pkt_len);
> >>> +	if (info->type != VIRTIO_VSOCK_TYPE_DGRAM) {
> >>> +		/* virtio_transport_get_credit might return less than pkt_len credit */
> >>> +		pkt_len = virtio_transport_get_credit(vvs, pkt_len);
> >>>  
> >>> -	/* Do not send zero length OP_RW pkt */
> >>> -	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
> >>> -		return pkt_len;
> >>> +		/* Do not send zero length OP_RW pkt */
> >>> +		if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
> >>> +			return pkt_len;
> >>> +	}
> >>>  
> >>>  	skb = virtio_transport_alloc_skb(info, pkt_len,
> >>>  					 src_cid, src_port,
> >>>  					 dst_cid, dst_port,
> >>>  					 &err);
> >>>  	if (!skb) {
> >>> -		virtio_transport_put_credit(vvs, pkt_len);
> >>> +		if (info->type != VIRTIO_VSOCK_TYPE_DGRAM)
> >>> +			virtio_transport_put_credit(vvs, pkt_len);
> >>>  		return err;
> >>>  	}
> >>>  
> >>> @@ -586,6 +595,61 @@ virtio_transport_seqpacket_dequeue(struct vsock_sock *vsk,
> >>>  }
> >>>  EXPORT_SYMBOL_GPL(virtio_transport_seqpacket_dequeue);
> >>>  
> >>> +static ssize_t
> >>> +virtio_transport_dgram_do_dequeue(struct vsock_sock *vsk,
> >>> +				  struct msghdr *msg, size_t len)
> >>> +{
> >>> +	struct virtio_vsock_sock *vvs = vsk->trans;
> >>> +	struct sk_buff *skb;
> >>> +	size_t total = 0;
> >>> +	u32 free_space;
> >>> +	int err = -EFAULT;
> >>> +
> >>> +	spin_lock_bh(&vvs->rx_lock);
> >>> +	if (total < len && !skb_queue_empty_lockless(&vvs->rx_queue)) {
> >>> +		skb = __skb_dequeue(&vvs->rx_queue);
> >>> +
> >>> +		total = len;
> >>> +		if (total > skb->len - vsock_metadata(skb)->off)
> >>> +			total = skb->len - vsock_metadata(skb)->off;
> >>> +		else if (total < skb->len - vsock_metadata(skb)->off)
> >>> +			msg->msg_flags |= MSG_TRUNC;
> >>> +
> >>> +		/* sk_lock is held by caller so no one else can dequeue.
> >>> +		 * Unlock rx_lock since memcpy_to_msg() may sleep.
> >>> +		 */
> >>> +		spin_unlock_bh(&vvs->rx_lock);
> >>> +
> >>> +		err = memcpy_to_msg(msg, skb->data + vsock_metadata(skb)->off, total);
> >>> +		if (err)
> >>> +			return err;
> >>> +
> >>> +		spin_lock_bh(&vvs->rx_lock);
> >>> +
> >>> +		virtio_transport_dec_rx_pkt(vvs, skb);
> >>> +		consume_skb(skb);
> >>> +	}
> >>> +
> >>> +	free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
> >>> +
> >>> +	spin_unlock_bh(&vvs->rx_lock);
> >>> +
> >>> +	if (total > 0 && msg->msg_name) {
> >>> +		/* Provide the address of the sender. */
> >>> +		DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
> >>> +
> >>> +		vsock_addr_init(vm_addr, le64_to_cpu(vsock_hdr(skb)->src_cid),
> >>> +				le32_to_cpu(vsock_hdr(skb)->src_port));
> >>> +		msg->msg_namelen = sizeof(*vm_addr);
> >>> +	}
> >>> +	return total;
> >>> +}
> >>> +
> >>> +static s64 virtio_transport_dgram_has_data(struct vsock_sock *vsk)
> >>> +{
> >>> +	return virtio_transport_stream_has_data(vsk);
> >>> +}
> >>> +
> >>>  int
> >>>  virtio_transport_seqpacket_enqueue(struct vsock_sock *vsk,
> >>>  				   struct msghdr *msg,
> >>> @@ -611,7 +675,66 @@ virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
> >>>  			       struct msghdr *msg,
> >>>  			       size_t len, int flags)
> >>>  {
> >>> -	return -EOPNOTSUPP;
> >>> +	struct sock *sk;
> >>> +	size_t err = 0;
> >>> +	long timeout;
> >>> +
> >>> +	DEFINE_WAIT(wait);
> >>> +
> >>> +	sk = &vsk->sk;
> >>> +	err = 0;
> >>> +
> >>> +	if (flags & MSG_OOB || flags & MSG_ERRQUEUE || flags & MSG_PEEK)
> >>> +		return -EOPNOTSUPP;
> >>> +
> >>> +	lock_sock(sk);
> >>> +
> >>> +	if (!len)
> >>> +		goto out;
> >>> +
> >>> +	timeout = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
> >>> +
> >>> +	while (1) {
> >>> +		s64 ready;
> >>> +
> >>> +		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
> >>> +		ready = virtio_transport_dgram_has_data(vsk);
> >>> +
> >>> +		if (ready == 0) {
> >>> +			if (timeout == 0) {
> >>> +				err = -EAGAIN;
> >>> +				finish_wait(sk_sleep(sk), &wait);
> >>> +				break;
> >>> +			}
> >>> +
> >>> +			release_sock(sk);
> >>> +			timeout = schedule_timeout(timeout);
> >>> +			lock_sock(sk);
> >>> +
> >>> +			if (signal_pending(current)) {
> >>> +				err = sock_intr_errno(timeout);
> >>> +				finish_wait(sk_sleep(sk), &wait);
> >>> +				break;
> >>> +			} else if (timeout == 0) {
> >>> +				err = -EAGAIN;
> >>> +				finish_wait(sk_sleep(sk), &wait);
> >>> +				break;
> >>> +			}
> >>> +		} else {
> >>> +			finish_wait(sk_sleep(sk), &wait);
> >>> +
> >>> +			if (ready < 0) {
> >>> +				err = -ENOMEM;
> >>> +				goto out;
> >>> +			}
> >>> +
> >>> +			err = virtio_transport_dgram_do_dequeue(vsk, msg, len);
> >>> +			break;
> >>> +		}
> >>> +	}
> >>> +out:
> >>> +	release_sock(sk);
> >>> +	return err;
> >>>  }
> >>>  EXPORT_SYMBOL_GPL(virtio_transport_dgram_dequeue);
> ^^^
> Maybe this generic data-waiting logic should live in af_vsock.c, as it does for stream/seqpacket?
> That way, another transport that supports SOCK_DGRAM could reuse it.

I think that is a great idea. I'll test that change for v2.

Thanks.
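
To make the suggestion concrete, here is a minimal userspace sketch of how
the wait loop could be separated from the transport. It is not the v2 code:
`struct dgram_ops`, `generic_dgram_recv()` and the toy transport are
hypothetical names, and `max_polls` only stands in for the receive timeout
(the kernel version would sleep on the socket wait queue instead of polling).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical per-transport hooks; illustrative only, not kernel API. */
struct dgram_ops {
	long (*has_data)(void *priv);	/* <0 error, 0 empty, >0 ready */
	ssize_t (*do_dequeue)(void *priv, void *buf, size_t len);
};

/* Transport-independent receive path. Polls has_data() at most
 * max_polls + 1 times, then hands off to the transport's dequeue hook.
 */
static ssize_t generic_dgram_recv(const struct dgram_ops *ops, void *priv,
				  void *buf, size_t len,
				  unsigned int max_polls)
{
	unsigned int i;

	assert(ops && ops->has_data && ops->do_dequeue);

	if (len == 0)
		return 0;

	for (i = 0; i <= max_polls; i++) {
		long ready = ops->has_data(priv);

		if (ready < 0)
			return -ENOMEM;
		if (ready > 0)
			return ops->do_dequeue(priv, buf, len);
	}

	return -EAGAIN;	/* nothing arrived before the budget ran out */
}

/* Toy transport used only to exercise the loop. */
struct toy_transport {
	int polls_until_ready;	/* has_data() reports empty this many times */
	char payload[16];
	size_t payload_len;
};

static long toy_has_data(void *priv)
{
	struct toy_transport *t = priv;

	if (t->polls_until_ready > 0) {
		t->polls_until_ready--;
		return 0;
	}
	return (long)t->payload_len;
}

static ssize_t toy_do_dequeue(void *priv, void *buf, size_t len)
{
	struct toy_transport *t = priv;
	size_t n = len < t->payload_len ? len : t->payload_len;

	memcpy(buf, t->payload, n);
	t->payload_len = 0;	/* the datagram is consumed whole */
	return (ssize_t)n;
}

static const struct dgram_ops toy_ops = {
	.has_data = toy_has_data,
	.do_dequeue = toy_do_dequeue,
};
```

With this split, a second transport would only need to supply its own
`has_data` and `do_dequeue` hooks; the timeout and error handling stay in one
place.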

> >>>  
> >>> @@ -819,13 +942,13 @@ EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
> >>>  int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> >>>  				struct sockaddr_vm *addr)
> >>>  {
> >>> -	return -EOPNOTSUPP;
> >>> +	return vsock_bind_stream(vsk, addr);
> >>>  }
> >>>  EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
> >>>  
> >>>  bool virtio_transport_dgram_allow(u32 cid, u32 port)
> >>>  {
> >>> -	return false;
> >>> +	return true;
> >>>  }
> >>>  EXPORT_SYMBOL_GPL(virtio_transport_dgram_allow);
> >>>  
> >>> @@ -861,7 +984,16 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
> >>>  			       struct msghdr *msg,
> >>>  			       size_t dgram_len)
> >>>  {
> >>> -	return -EOPNOTSUPP;
> >>> +	struct virtio_vsock_pkt_info info = {
> >>> +		.op = VIRTIO_VSOCK_OP_RW,
> >>> +		.msg = msg,
> >>> +		.pkt_len = dgram_len,
> >>> +		.vsk = vsk,
> >>> +		.remote_cid = remote_addr->svm_cid,
> >>> +		.remote_port = remote_addr->svm_port,
> >>> +	};
> >>> +
> >>> +	return virtio_transport_send_pkt_info(vsk, &info);
> >>>  }
> >>>  EXPORT_SYMBOL_GPL(virtio_transport_dgram_enqueue);
> >>>  
> >>> @@ -1165,6 +1297,12 @@ virtio_transport_recv_connected(struct sock *sk,
> >>>  	struct virtio_vsock_hdr *hdr = vsock_hdr(skb);
> >>>  	int err = 0;
> >>>  
> >>> +	if (le16_to_cpu(vsock_hdr(skb)->type) == VIRTIO_VSOCK_TYPE_DGRAM) {
> >>> +		virtio_transport_recv_enqueue(vsk, skb);
> >>> +		sk->sk_data_ready(sk);
> >>> +		return err;
> >>> +	}
> >>> +
> >>>  	switch (le16_to_cpu(hdr->op)) {
> >>>  	case VIRTIO_VSOCK_OP_RW:
> >>>  		virtio_transport_recv_enqueue(vsk, skb);
> >>> @@ -1320,7 +1458,8 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
> >>>  static bool virtio_transport_valid_type(u16 type)
> >>>  {
> >>>  	return (type == VIRTIO_VSOCK_TYPE_STREAM) ||
> >>> -	       (type == VIRTIO_VSOCK_TYPE_SEQPACKET);
> >>> +	       (type == VIRTIO_VSOCK_TYPE_SEQPACKET) ||
> >>> +	       (type == VIRTIO_VSOCK_TYPE_DGRAM);
> >>>  }
> >>>  
> >>>  /* We are under the virtio-vsock's vsock->rx_lock or vhost-vsock's vq->mutex
> >>> @@ -1384,6 +1523,11 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
> >>>  		goto free_pkt;
> >>>  	}
> >>>  
> >>> +	if (sk->sk_type == SOCK_DGRAM) {
> >>> +		virtio_transport_recv_connected(sk, skb);
> >>> +		goto out;
> >>> +	}
> >>> +
> >>>  	space_available = virtio_transport_space_update(sk, skb);
> >>>  
> >>>  	/* Update CID in case it has changed after a transport reset event */
> >>> @@ -1415,6 +1559,7 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
> >>>  		break;
> >>>  	}
> >>>  
> >>> +out:
> >>>  	release_sock(sk);
> >>>  
> >>>  	/* Release refcnt obtained when we fetched this socket out of the
> >>> -- 
> >>> 2.35.1
> >>>
> >>
> >>
> > 
> 
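
On the message-boundary question quoted above: as far as I can tell from the
diff, each receive call takes exactly one queued buffer, copies at most `len`
bytes of it, flags truncation if the caller's buffer was too small, and then
drops the buffer. A minimal userspace model of that behaviour, assuming a
fixed-size ring in place of the skb queue (`struct dgram_queue`,
`dgram_enqueue()` and `dgram_dequeue()` are hypothetical names, not code from
the series):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define QUEUE_SLOTS 8
#define DGRAM_MAX   64

struct dgram {
	size_t len;
	unsigned char data[DGRAM_MAX];
};

struct dgram_queue {
	struct dgram slots[QUEUE_SLOTS];
	size_t head;	/* index of the oldest datagram */
	size_t count;	/* number of queued datagrams */
};

/* Queue one datagram; each call creates exactly one message. */
static bool dgram_enqueue(struct dgram_queue *q, const void *data, size_t len)
{
	struct dgram *d;

	if (len > DGRAM_MAX || q->count == QUEUE_SLOTS)
		return false;

	d = &q->slots[(q->head + q->count) % QUEUE_SLOTS];
	memcpy(d->data, data, len);
	d->len = len;
	q->count++;
	return true;
}

/* Copy at most one datagram into buf. If buf is too small, the excess
 * bytes are discarded and *truncated is set (the MSG_TRUNC case).
 * Returns the number of bytes copied, or 0 if the queue is empty.
 */
static size_t dgram_dequeue(struct dgram_queue *q, void *buf, size_t len,
			    bool *truncated)
{
	struct dgram *d;
	size_t n;

	*truncated = false;
	if (q->count == 0)
		return 0;

	d = &q->slots[q->head];
	n = d->len;
	if (len < n) {
		n = len;
		*truncated = true;
	}

	memcpy(buf, d->data, n);
	q->head = (q->head + 1) % QUEUE_SLOTS;
	q->count--;
	return n;
}
```

Because a read never continues into the next slot, two sends are never merged
into one receive, which is the boundary guarantee SOCK_DGRAM callers expect.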



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [virtio-dev] Re: [PATCH 5/6] virtio/vsock: add support for dgram
  2022-08-18  8:35           ` Arseniy Krasnov
@ 2022-08-16 20:52             ` Bobby Eshleman
  2022-08-19  4:30               ` Arseniy Krasnov
  0 siblings, 1 reply; 16+ messages in thread
From: Bobby Eshleman @ 2022-08-16 20:52 UTC (permalink / raw)
  To: Arseniy Krasnov
  Cc: kvm@vger.kernel.org, jasowang@redhat.com,
	bobby.eshleman@gmail.com, davem@davemloft.net,
	virtio-dev@lists.oasis-open.org, stefanha@redhat.com,
	bobby.eshleman@bytedance.com, linux-kernel@vger.kernel.org,
	pabeni@redhat.com, edumazet@google.com, jiang.wang@bytedance.com,
	sgarzare@redhat.com, kuba@kernel.org, cong.wang@bytedance.com,
	netdev@vger.kernel.org, virtualization@lists.linux-foundation.org,
	mst@redhat.com

On Thu, Aug 18, 2022 at 08:35:48AM +0000, Arseniy Krasnov wrote:
> On Tue, 2022-08-16 at 09:58 +0000, Bobby Eshleman wrote:
> > On Wed, Aug 17, 2022 at 05:42:08AM +0000, Arseniy Krasnov wrote:
> > > On 17.08.2022 08:01, Arseniy Krasnov wrote:
> > > > On 16.08.2022 05:32, Bobby Eshleman wrote:
> > > > > CC'ing virtio-dev@lists.oasis-open.org
> > > > > 
> > > > > On Mon, Aug 15, 2022 at 10:56:08AM -0700, Bobby Eshleman wrote:
> > > > > > This patch supports dgram in virtio and on the vhost side.
> > > > Hello,
> > > > 
> > > > > Sorry, I don't understand: how does this maintain message
> > > > > boundaries? Or is that unnecessary for SOCK_DGRAM?
> > > > 
> > > > Thanks
> > > > > > Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
> > > > > > Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
> > > > > > ---
> > > > > >  drivers/vhost/vsock.c                   |   2 +-
> > > > > >  include/net/af_vsock.h                  |   2 +
> > > > > >  include/uapi/linux/virtio_vsock.h       |   1 +
> > > > > >  net/vmw_vsock/af_vsock.c                |  26 +++-
> > > > > >  net/vmw_vsock/virtio_transport.c        |   2 +-
> > > > > >  net/vmw_vsock/virtio_transport_common.c | 173
> > > > > > ++++++++++++++++++++++--
> > > > > >  6 files changed, 186 insertions(+), 20 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > > > > > index a5d1bdb786fe..3dc72a5647ca 100644
> > > > > > --- a/drivers/vhost/vsock.c
> > > > > > +++ b/drivers/vhost/vsock.c
> > > > > > @@ -925,7 +925,7 @@ static int __init vhost_vsock_init(void)
> > > > > >  	int ret;
> > > > > >  
> > > > > >  	ret = vsock_core_register(&vhost_transport.transport,
> > > > > > -				  VSOCK_TRANSPORT_F_H2G);
> > > > > > +				  VSOCK_TRANSPORT_F_H2G |
> > > > > > VSOCK_TRANSPORT_F_DGRAM);
> > > > > >  	if (ret < 0)
> > > > > >  		return ret;
> > > > > >  
> > > > > > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> > > > > > index 1c53c4c4d88f..37e55c81e4df 100644
> > > > > > --- a/include/net/af_vsock.h
> > > > > > +++ b/include/net/af_vsock.h
> > > > > > @@ -78,6 +78,8 @@ struct vsock_sock {
> > > > > >  s64 vsock_stream_has_data(struct vsock_sock *vsk);
> > > > > >  s64 vsock_stream_has_space(struct vsock_sock *vsk);
> > > > > >  struct sock *vsock_create_connected(struct sock *parent);
> > > > > > +int vsock_bind_stream(struct vsock_sock *vsk,
> > > > > > +		      struct sockaddr_vm *addr);
> > > > > >  
> > > > > >  /**** TRANSPORT ****/
> > > > > >  
> > > > > > diff --git a/include/uapi/linux/virtio_vsock.h
> > > > > > b/include/uapi/linux/virtio_vsock.h
> > > > > > index 857df3a3a70d..0975b9c88292 100644
> > > > > > --- a/include/uapi/linux/virtio_vsock.h
> > > > > > +++ b/include/uapi/linux/virtio_vsock.h
> > > > > > @@ -70,6 +70,7 @@ struct virtio_vsock_hdr {
> > > > > >  enum virtio_vsock_type {
> > > > > >  	VIRTIO_VSOCK_TYPE_STREAM = 1,
> > > > > >  	VIRTIO_VSOCK_TYPE_SEQPACKET = 2,
> > > > > > +	VIRTIO_VSOCK_TYPE_DGRAM = 3,
> > > > > >  };
> > > > > >  
> > > > > >  enum virtio_vsock_op {
> > > > > > diff --git a/net/vmw_vsock/af_vsock.c
> > > > > > b/net/vmw_vsock/af_vsock.c
> > > > > > index 1893f8aafa48..87e4ae1866d3 100644
> > > > > > --- a/net/vmw_vsock/af_vsock.c
> > > > > > +++ b/net/vmw_vsock/af_vsock.c
> > > > > > @@ -675,6 +675,19 @@ static int
> > > > > > __vsock_bind_connectible(struct vsock_sock *vsk,
> > > > > >  	return 0;
> > > > > >  }
> > > > > >  
> > > > > > +int vsock_bind_stream(struct vsock_sock *vsk,
> > > > > > +		      struct sockaddr_vm *addr)
> > > > > > +{
> > > > > > +	int retval;
> > > > > > +
> > > > > > +	spin_lock_bh(&vsock_table_lock);
> > > > > > +	retval = __vsock_bind_connectible(vsk, addr);
> > > > > > +	spin_unlock_bh(&vsock_table_lock);
> > > > > > +
> > > > > > +	return retval;
> > > > > > +}
> > > > > > +EXPORT_SYMBOL(vsock_bind_stream);
> > > > > > +
> > > > > >  static int __vsock_bind_dgram(struct vsock_sock *vsk,
> > > > > >  			      struct sockaddr_vm *addr)
> > > > > >  {
> > > > > > @@ -2363,11 +2376,16 @@ int vsock_core_register(const struct
> > > > > > vsock_transport *t, int features)
> > > > > >  	}
> > > > > >  
> > > > > >  	if (features & VSOCK_TRANSPORT_F_DGRAM) {
> > > > > > -		if (t_dgram) {
> > > > > > -			err = -EBUSY;
> > > > > > -			goto err_busy;
> > > > > > +		/* TODO: always chose the G2H variant over
> > > > > > others, support nesting later */
> > > > > > +		if (features & VSOCK_TRANSPORT_F_G2H) {
> > > > > > +			if (t_dgram)
> > > > > > +				pr_warn("virtio_vsock: t_dgram
> > > > > > already set\n");
> > > > > > +			t_dgram = t;
> > > > > > +		}
> > > > > > +
> > > > > > +		if (!t_dgram) {
> > > > > > +			t_dgram = t;
> > > > > >  		}
> > > > > > -		t_dgram = t;
> > > > > >  	}
> > > > > >  
> > > > > >  	if (features & VSOCK_TRANSPORT_F_LOCAL) {
> > > > > > diff --git a/net/vmw_vsock/virtio_transport.c
> > > > > > b/net/vmw_vsock/virtio_transport.c
> > > > > > index 073314312683..d4526ca462d2 100644
> > > > > > --- a/net/vmw_vsock/virtio_transport.c
> > > > > > +++ b/net/vmw_vsock/virtio_transport.c
> > > > > > @@ -850,7 +850,7 @@ static int __init virtio_vsock_init(void)
> > > > > >  		return -ENOMEM;
> > > > > >  
> > > > > >  	ret = vsock_core_register(&virtio_transport.transport,
> > > > > > -				  VSOCK_TRANSPORT_F_G2H);
> > > > > > +				  VSOCK_TRANSPORT_F_G2H |
> > > > > > VSOCK_TRANSPORT_F_DGRAM);
> > > > > >  	if (ret)
> > > > > >  		goto out_wq;
> > > > > >  
> > > > > > diff --git a/net/vmw_vsock/virtio_transport_common.c
> > > > > > b/net/vmw_vsock/virtio_transport_common.c
> > > > > > index bdf16fff054f..aedb48728677 100644
> > > > > > --- a/net/vmw_vsock/virtio_transport_common.c
> > > > > > +++ b/net/vmw_vsock/virtio_transport_common.c
> > > > > > @@ -229,7 +229,9 @@
> > > > > > EXPORT_SYMBOL_GPL(virtio_transport_deliver_tap_pkt);
> > > > > >  
> > > > > >  static u16 virtio_transport_get_type(struct sock *sk)
> > > > > >  {
> > > > > > -	if (sk->sk_type == SOCK_STREAM)
> > > > > > +	if (sk->sk_type == SOCK_DGRAM)
> > > > > > +		return VIRTIO_VSOCK_TYPE_DGRAM;
> > > > > > +	else if (sk->sk_type == SOCK_STREAM)
> > > > > >  		return VIRTIO_VSOCK_TYPE_STREAM;
> > > > > >  	else
> > > > > >  		return VIRTIO_VSOCK_TYPE_SEQPACKET;
> > > > > > @@ -287,22 +289,29 @@ static int
> > > > > > virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > > > > >  	vvs = vsk->trans;
> > > > > >  
> > > > > >  	/* we can send less than pkt_len bytes */
> > > > > > -	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
> > > > > > -		pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
> > > > > > +	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) {
> > > > > > +		if (info->type != VIRTIO_VSOCK_TYPE_DGRAM)
> > > > > > +			pkt_len =
> > > > > > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
> > > > > > +		else
> > > > > > +			return 0;
> > > > > > +	}
> > > > > >  
> > > > > > -	/* virtio_transport_get_credit might return less than
> > > > > > pkt_len credit */
> > > > > > -	pkt_len = virtio_transport_get_credit(vvs, pkt_len);
> > > > > > +	if (info->type != VIRTIO_VSOCK_TYPE_DGRAM) {
> > > > > > +		/* virtio_transport_get_credit might return
> > > > > > less than pkt_len credit */
> > > > > > +		pkt_len = virtio_transport_get_credit(vvs,
> > > > > > pkt_len);
> > > > > >  
> > > > > > -	/* Do not send zero length OP_RW pkt */
> > > > > > -	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
> > > > > > -		return pkt_len;
> > > > > > +		/* Do not send zero length OP_RW pkt */
> > > > > > +		if (pkt_len == 0 && info->op ==
> > > > > > VIRTIO_VSOCK_OP_RW)
> > > > > > +			return pkt_len;
> > > > > > +	}
> > > > > >  
> > > > > >  	skb = virtio_transport_alloc_skb(info, pkt_len,
> > > > > >  					 src_cid, src_port,
> > > > > >  					 dst_cid, dst_port,
> > > > > >  					 &err);
> > > > > >  	if (!skb) {
> > > > > > -		virtio_transport_put_credit(vvs, pkt_len);
> > > > > > +		if (info->type != VIRTIO_VSOCK_TYPE_DGRAM)
> > > > > > +			virtio_transport_put_credit(vvs,
> > > > > > pkt_len);
> > > > > >  		return err;
> > > > > >  	}
> > > > > >  
> > > > > > @@ -586,6 +595,61 @@
> > > > > > virtio_transport_seqpacket_dequeue(struct vsock_sock *vsk,
> > > > > >  }
> > > > > >  EXPORT_SYMBOL_GPL(virtio_transport_seqpacket_dequeue);
> > > > > >  
> > > > > > +static ssize_t
> > > > > > +virtio_transport_dgram_do_dequeue(struct vsock_sock *vsk,
> > > > > > +				  struct msghdr *msg, size_t
> > > > > > len)
> > > > > > +{
> > > > > > +	struct virtio_vsock_sock *vvs = vsk->trans;
> > > > > > +	struct sk_buff *skb;
> > > > > > +	size_t total = 0;
> > > > > > +	u32 free_space;
> > > > > > +	int err = -EFAULT;
> > > > > > +
> > > > > > +	spin_lock_bh(&vvs->rx_lock);
> > > > > > +	if (total < len && !skb_queue_empty_lockless(&vvs-
> > > > > > >rx_queue)) {
> > > > > > +		skb = __skb_dequeue(&vvs->rx_queue);
> > > > > > +
> > > > > > +		total = len;
> > > > > > +		if (total > skb->len - vsock_metadata(skb)-
> > > > > > >off)
> > > > > > +			total = skb->len - vsock_metadata(skb)-
> > > > > > >off;
> > > > > > +		else if (total < skb->len -
> > > > > > vsock_metadata(skb)->off)
> > > > > > +			msg->msg_flags |= MSG_TRUNC;
> > > > > > +
> > > > > > +		/* sk_lock is held by caller so no one else can
> > > > > > dequeue.
> > > > > > +		 * Unlock rx_lock since memcpy_to_msg() may
> > > > > > sleep.
> > > > > > +		 */
> > > > > > +		spin_unlock_bh(&vvs->rx_lock);
> > > > > > +
> > > > > > +		err = memcpy_to_msg(msg, skb->data +
> > > > > > vsock_metadata(skb)->off, total);
> > > > > > +		if (err)
> > > > > > +			return err;
> > > > > > +
> > > > > > +		spin_lock_bh(&vvs->rx_lock);
> > > > > > +
> > > > > > +		virtio_transport_dec_rx_pkt(vvs, skb);
> > > > > > +		consume_skb(skb);
> > > > > > +	}
> > > > > > +
> > > > > > +	free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs-
> > > > > > >last_fwd_cnt);
> > > > > > +
> > > > > > +	spin_unlock_bh(&vvs->rx_lock);
> > > > > > +
> > > > > > +	if (total > 0 && msg->msg_name) {
> > > > > > +		/* Provide the address of the sender. */
> > > > > > +		DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr,
> > > > > > msg->msg_name);
> > > > > > +
> > > > > > +		vsock_addr_init(vm_addr,
> > > > > > le64_to_cpu(vsock_hdr(skb)->src_cid),
> > > > > > +				le32_to_cpu(vsock_hdr(skb)-
> > > > > > >src_port));
> > > > > > +		msg->msg_namelen = sizeof(*vm_addr);
> > > > > > +	}
> > > > > > +	return total;
> > > > > > +}
> > > > > > +
> > > > > > +static s64 virtio_transport_dgram_has_data(struct vsock_sock
> > > > > > *vsk)
> > > > > > +{
> > > > > > +	return virtio_transport_stream_has_data(vsk);
> > > > > > +}
> > > > > > +
> > > > > >  int
> > > > > >  virtio_transport_seqpacket_enqueue(struct vsock_sock *vsk,
> > > > > >  				   struct msghdr *msg,
> > > > > > @@ -611,7 +675,66 @@ virtio_transport_dgram_dequeue(struct
> > > > > > vsock_sock *vsk,
> > > > > >  			       struct msghdr *msg,
> > > > > >  			       size_t len, int flags)
> > > > > >  {
> > > > > > -	return -EOPNOTSUPP;
> > > > > > +	struct sock *sk;
> > > > > > +	size_t err = 0;
> > > > > > +	long timeout;
> > > > > > +
> > > > > > +	DEFINE_WAIT(wait);
> > > > > > +
> > > > > > +	sk = &vsk->sk;
> > > > > > +	err = 0;
> > > > > > +
> > > > > > +	if (flags & MSG_OOB || flags & MSG_ERRQUEUE || flags &
> > > > > > MSG_PEEK)
> > > > > > +		return -EOPNOTSUPP;
> > > > > > +
> > > > > > +	lock_sock(sk);
> > > > > > +
> > > > > > +	if (!len)
> > > > > > +		goto out;
> > > > > > +
> > > > > > +	timeout = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
> > > > > > +
> > > > > > +	while (1) {
> > > > > > +		s64 ready;
> > > > > > +
> > > > > > +		prepare_to_wait(sk_sleep(sk), &wait,
> > > > > > TASK_INTERRUPTIBLE);
> > > > > > +		ready = virtio_transport_dgram_has_data(vsk);
> > > > > > +
> > > > > > +		if (ready == 0) {
> > > > > > +			if (timeout == 0) {
> > > > > > +				err = -EAGAIN;
> > > > > > +				finish_wait(sk_sleep(sk),
> > > > > > &wait);
> > > > > > +				break;
> > > > > > +			}
> > > > > > +
> > > > > > +			release_sock(sk);
> > > > > > +			timeout = schedule_timeout(timeout);
> > > > > > +			lock_sock(sk);
> > > > > > +
> > > > > > +			if (signal_pending(current)) {
> > > > > > +				err = sock_intr_errno(timeout);
> > > > > > +				finish_wait(sk_sleep(sk),
> > > > > > &wait);
> > > > > > +				break;
> > > > > > +			} else if (timeout == 0) {
> > > > > > +				err = -EAGAIN;
> > > > > > +				finish_wait(sk_sleep(sk),
> > > > > > &wait);
> > > > > > +				break;
> > > > > > +			}
> > > > > > +		} else {
> > > > > > +			finish_wait(sk_sleep(sk), &wait);
> > > > > > +
> > > > > > +			if (ready < 0) {
> > > > > > +				err = -ENOMEM;
> > > > > > +				goto out;
> > > > > > +			}
> > > > > > +
> > > > > > +			err =
> > > > > > virtio_transport_dgram_do_dequeue(vsk, msg, len);
> > > > > > +			break;
> > > > > > +		}
> > > > > > +	}
> > > > > > +out:
> > > > > > +	release_sock(sk);
> > > > > > +	return err;
> > > > > >  }
> > > > > >  EXPORT_SYMBOL_GPL(virtio_transport_dgram_dequeue);
> > > ^^^
> > > > Maybe this generic data-waiting logic should live in af_vsock.c, as
> > > > it does for stream/seqpacket?
> > > > That way, another transport that supports SOCK_DGRAM could reuse it.
> > 
> > I think that is a great idea. I'll test that change for v2.
> > 
> > Thanks.
> 
> Also for v2, I tested your patchset a little (writing everything here
> rather than spreading it across several mails):
> 1) The seqpacket test in vsock_test.c fails (it looks like a MSG_EOR flag issue).

I will investigate.

> 2) I can't rmmod the modules after testing with the following config:
>    CONFIG_VSOCKETS=m
>    CONFIG_VIRTIO_VSOCKETS=m
>    CONFIG_VIRTIO_VSOCKETS_COMMON=m
>    CONFIG_VHOST=m
>    CONFIG_VHOST_VSOCK=m
>    The guest is shut down, but rmmod still fails.
> 3) virtio_transport_init and virtio_transport_exit apparently need
>    EXPORT_SYMBOL_GPL(), because both are used from another module.

Definitely, will fix.

> 4) I tried to send a 5 KB piece of data (20 KB behaves the same) and got a
>    kernel panic, first in the guest and later in the host.
> 

Thanks for catching that. I can reproduce it intermittently, but only
for seqpacket. Did you happen to see this for other socket types as
well?

Thanks
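
For anyone trying to reproduce item 4, it may help to spell out how the send
path in the quoted diff sizes a payload per socket type. This does not explain
the panic; it only shows that 5 KB and 20 KB should pass through unchanged for
every type. The sketch below is a userspace model, with `MAX_PKT_BUF_SIZE`
assumed to be 64 KiB as a stand-in for VIRTIO_VSOCK_MAX_PKT_BUF_SIZE, and
`pkt_len_for_send()` a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>

/* Values mirror enum virtio_vsock_type in the quoted diff. */
enum pkt_type {
	TYPE_STREAM = 1,
	TYPE_SEQPACKET = 2,
	TYPE_DGRAM = 3,
};

/* Assumed limit; stands in for VIRTIO_VSOCK_MAX_PKT_BUF_SIZE. */
#define MAX_PKT_BUF_SIZE ((size_t)64 * 1024)

/* Bytes that one send attempt would carry:
 * - stream/seqpacket payloads above the limit are clamped, and the
 *   caller is expected to send the rest later;
 * - a datagram above the limit is not sent at all (0), because it
 *   cannot be split without losing its message boundary.
 */
static size_t pkt_len_for_send(enum pkt_type type, size_t requested)
{
	if (requested <= MAX_PKT_BUF_SIZE)
		return requested;

	if (type == TYPE_DGRAM)
		return 0;

	return MAX_PKT_BUF_SIZE;
}
```

Credit accounting, which the diff skips for datagrams, is left out of this
model on purpose.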

> Thank you
> > 
> > > > > >  
> > > > > > @@ -819,13 +942,13 @@
> > > > > > EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
> > > > > >  int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> > > > > >  				struct sockaddr_vm *addr)
> > > > > >  {
> > > > > > -	return -EOPNOTSUPP;
> > > > > > +	return vsock_bind_stream(vsk, addr);
> > > > > >  }
> > > > > >  EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
> > > > > >  
> > > > > >  bool virtio_transport_dgram_allow(u32 cid, u32 port)
> > > > > >  {
> > > > > > -	return false;
> > > > > > +	return true;
> > > > > >  }
> > > > > >  EXPORT_SYMBOL_GPL(virtio_transport_dgram_allow);
> > > > > >  
> > > > > > @@ -861,7 +984,16 @@ virtio_transport_dgram_enqueue(struct
> > > > > > vsock_sock *vsk,
> > > > > >  			       struct msghdr *msg,
> > > > > >  			       size_t dgram_len)
> > > > > >  {
> > > > > > -	return -EOPNOTSUPP;
> > > > > > +	struct virtio_vsock_pkt_info info = {
> > > > > > +		.op = VIRTIO_VSOCK_OP_RW,
> > > > > > +		.msg = msg,
> > > > > > +		.pkt_len = dgram_len,
> > > > > > +		.vsk = vsk,
> > > > > > +		.remote_cid = remote_addr->svm_cid,
> > > > > > +		.remote_port = remote_addr->svm_port,
> > > > > > +	};
> > > > > > +
> > > > > > +	return virtio_transport_send_pkt_info(vsk, &info);
> > > > > >  }
> > > > > >  EXPORT_SYMBOL_GPL(virtio_transport_dgram_enqueue);
> > > > > >  
> > > > > > @@ -1165,6 +1297,12 @@ virtio_transport_recv_connected(struct
> > > > > > sock *sk,
> > > > > >  	struct virtio_vsock_hdr *hdr = vsock_hdr(skb);
> > > > > >  	int err = 0;
> > > > > >  
> > > > > > +	if (le16_to_cpu(vsock_hdr(skb)->type) ==
> > > > > > VIRTIO_VSOCK_TYPE_DGRAM) {
> > > > > > +		virtio_transport_recv_enqueue(vsk, skb);
> > > > > > +		sk->sk_data_ready(sk);
> > > > > > +		return err;
> > > > > > +	}
> > > > > > +
> > > > > >  	switch (le16_to_cpu(hdr->op)) {
> > > > > >  	case VIRTIO_VSOCK_OP_RW:
> > > > > >  		virtio_transport_recv_enqueue(vsk, skb);
> > > > > > @@ -1320,7 +1458,8 @@ virtio_transport_recv_listen(struct
> > > > > > sock *sk, struct sk_buff *skb,
> > > > > >  static bool virtio_transport_valid_type(u16 type)
> > > > > >  {
> > > > > >  	return (type == VIRTIO_VSOCK_TYPE_STREAM) ||
> > > > > > -	       (type == VIRTIO_VSOCK_TYPE_SEQPACKET);
> > > > > > +	       (type == VIRTIO_VSOCK_TYPE_SEQPACKET) ||
> > > > > > +	       (type == VIRTIO_VSOCK_TYPE_DGRAM);
> > > > > >  }
> > > > > >  
> > > > > >  /* We are under the virtio-vsock's vsock->rx_lock or vhost-
> > > > > > vsock's vq->mutex
> > > > > > @@ -1384,6 +1523,11 @@ void virtio_transport_recv_pkt(struct
> > > > > > virtio_transport *t,
> > > > > >  		goto free_pkt;
> > > > > >  	}
> > > > > >  
> > > > > > +	if (sk->sk_type == SOCK_DGRAM) {
> > > > > > +		virtio_transport_recv_connected(sk, skb);
> > > > > > +		goto out;
> > > > > > +	}
> > > > > > +
> > > > > >  	space_available = virtio_transport_space_update(sk,
> > > > > > skb);
> > > > > >  
> > > > > > >  	/* Update CID in case it has changed after a transport reset event */
> > > > > > @@ -1415,6 +1559,7 @@ void virtio_transport_recv_pkt(struct
> > > > > > virtio_transport *t,
> > > > > >  		break;
> > > > > >  	}
> > > > > >  
> > > > > > +out:
> > > > > >  	release_sock(sk);
> > > > > >  
> > > > > > >  	/* Release refcnt obtained when we fetched this socket out of the
> > > > > > -- 
> > > > > > 2.35.1
> > > > > > 
> > > > > 
> > > > > -------------------------------------------------------------
> > > > > --------
> > > > > To unsubscribe, e-mail: 
> > > > > virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > For additional commands, e-mail: 
> > > > > virtio-dev-help@lists.oasis-open.org
> > > > > 
> > 
> > 

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org



* Re: [virtio-dev] Re: [PATCH 5/6] virtio/vsock: add support for dgram
  2022-08-16  2:32   ` [virtio-dev] Re: [PATCH 5/6] virtio/vsock: add support for dgram Bobby Eshleman
@ 2022-08-17  5:01     ` Arseniy Krasnov
  2022-08-16  9:57       ` Bobby Eshleman
  2022-08-17  5:42       ` Arseniy Krasnov
  0 siblings, 2 replies; 16+ messages in thread
From: Arseniy Krasnov @ 2022-08-17  5:01 UTC (permalink / raw)
  To: Bobby Eshleman, Bobby Eshleman
  Cc: virtio-dev@lists.oasis-open.org, Bobby Eshleman, Cong Wang,
	Jiang Wang, Stefan Hajnoczi, Stefano Garzarella,
	Michael S. Tsirkin, Jason Wang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, kvm@vger.kernel.org,
	virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org

On 16.08.2022 05:32, Bobby Eshleman wrote:
> CC'ing virtio-dev@lists.oasis-open.org
> 
> On Mon, Aug 15, 2022 at 10:56:08AM -0700, Bobby Eshleman wrote:
>> This patch supports dgram in virtio and on the vhost side.
Hello,

Sorry, I don't understand how this maintains message boundaries. Or is
that unnecessary for SOCK_DGRAM?

Thanks
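
As background to the question above, here is a minimal userspace sketch of
what "message boundaries" means for SOCK_DGRAM. It uses an AF_UNIX
socketpair() purely as a stand-in, since AF_VSOCK datagrams over virtio
depend on this series; the function names are made up for illustration.
Each receive call returns at most one datagram, and a short buffer
truncates that datagram rather than leaving the tail for the next read.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send two datagrams, then read once with a buffer big enough for both.
 * Returns the length of the first read, or -1 on failure.
 */
ssize_t dgram_first_read_len(void)
{
	int sv[2];
	char buf[64];
	ssize_t n = -1;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;

	if (send(sv[0], "abc", 3, 0) == 3 && send(sv[0], "defgh", 5, 0) == 5)
		n = recv(sv[1], buf, sizeof(buf), 0);

	close(sv[0]);
	close(sv[1]);
	return n;
}

/* Read a 5-byte datagram into a 2-byte buffer.
 * Returns 1 if MSG_TRUNC is reported and the next read starts at the
 * following datagram, 0 if not, -1 on failure.
 */
int dgram_short_read_truncates(void)
{
	int sv[2];
	char small[2], buf[64];
	struct iovec iov = { .iov_base = small, .iov_len = sizeof(small) };
	struct msghdr msg;
	ssize_t n, next;
	int ret = -1;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;

	if (send(sv[0], "defgh", 5, 0) != 5 || send(sv[0], "xyz", 3, 0) != 3)
		goto out;

	n = recvmsg(sv[1], &msg, 0);
	if (n != 2)
		goto out;

	/* The remaining 3 bytes of "defgh" are discarded, not queued. */
	next = recv(sv[1], buf, sizeof(buf), 0);
	ret = (msg.msg_flags & MSG_TRUNC) && next == 3;
out:
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

Whether the virtio path gives the same guarantee depends on one skb
carrying exactly one datagram end to end, which is what the question is
asking about.
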
>>
>> Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
>> Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
>> ---
>>  drivers/vhost/vsock.c                   |   2 +-
>>  include/net/af_vsock.h                  |   2 +
>>  include/uapi/linux/virtio_vsock.h       |   1 +
>>  net/vmw_vsock/af_vsock.c                |  26 +++-
>>  net/vmw_vsock/virtio_transport.c        |   2 +-
>>  net/vmw_vsock/virtio_transport_common.c | 173 ++++++++++++++++++++++--
>>  6 files changed, 186 insertions(+), 20 deletions(-)
>>
>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>> index a5d1bdb786fe..3dc72a5647ca 100644
>> --- a/drivers/vhost/vsock.c
>> +++ b/drivers/vhost/vsock.c
>> @@ -925,7 +925,7 @@ static int __init vhost_vsock_init(void)
>>  	int ret;
>>  
>>  	ret = vsock_core_register(&vhost_transport.transport,
>> -				  VSOCK_TRANSPORT_F_H2G);
>> +				  VSOCK_TRANSPORT_F_H2G | VSOCK_TRANSPORT_F_DGRAM);
>>  	if (ret < 0)
>>  		return ret;
>>  
>> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>> index 1c53c4c4d88f..37e55c81e4df 100644
>> --- a/include/net/af_vsock.h
>> +++ b/include/net/af_vsock.h
>> @@ -78,6 +78,8 @@ struct vsock_sock {
>>  s64 vsock_stream_has_data(struct vsock_sock *vsk);
>>  s64 vsock_stream_has_space(struct vsock_sock *vsk);
>>  struct sock *vsock_create_connected(struct sock *parent);
>> +int vsock_bind_stream(struct vsock_sock *vsk,
>> +		      struct sockaddr_vm *addr);
>>  
>>  /**** TRANSPORT ****/
>>  
>> diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
>> index 857df3a3a70d..0975b9c88292 100644
>> --- a/include/uapi/linux/virtio_vsock.h
>> +++ b/include/uapi/linux/virtio_vsock.h
>> @@ -70,6 +70,7 @@ struct virtio_vsock_hdr {
>>  enum virtio_vsock_type {
>>  	VIRTIO_VSOCK_TYPE_STREAM = 1,
>>  	VIRTIO_VSOCK_TYPE_SEQPACKET = 2,
>> +	VIRTIO_VSOCK_TYPE_DGRAM = 3,
>>  };
>>  
>>  enum virtio_vsock_op {
>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> index 1893f8aafa48..87e4ae1866d3 100644
>> --- a/net/vmw_vsock/af_vsock.c
>> +++ b/net/vmw_vsock/af_vsock.c
>> @@ -675,6 +675,19 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
>>  	return 0;
>>  }
>>  
>> +int vsock_bind_stream(struct vsock_sock *vsk,
>> +		      struct sockaddr_vm *addr)
>> +{
>> +	int retval;
>> +
>> +	spin_lock_bh(&vsock_table_lock);
>> +	retval = __vsock_bind_connectible(vsk, addr);
>> +	spin_unlock_bh(&vsock_table_lock);
>> +
>> +	return retval;
>> +}
>> +EXPORT_SYMBOL(vsock_bind_stream);
>> +
>>  static int __vsock_bind_dgram(struct vsock_sock *vsk,
>>  			      struct sockaddr_vm *addr)
>>  {
>> @@ -2363,11 +2376,16 @@ int vsock_core_register(const struct vsock_transport *t, int features)
>>  	}
>>  
>>  	if (features & VSOCK_TRANSPORT_F_DGRAM) {
>> -		if (t_dgram) {
>> -			err = -EBUSY;
>> -			goto err_busy;
>> +		/* TODO: always chose the G2H variant over others, support nesting later */
>> +		if (features & VSOCK_TRANSPORT_F_G2H) {
>> +			if (t_dgram)
>> +				pr_warn("virtio_vsock: t_dgram already set\n");
>> +			t_dgram = t;
>> +		}
>> +
>> +		if (!t_dgram) {
>> +			t_dgram = t;
>>  		}
>> -		t_dgram = t;
>>  	}
>>  
>>  	if (features & VSOCK_TRANSPORT_F_LOCAL) {
>> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>> index 073314312683..d4526ca462d2 100644
>> --- a/net/vmw_vsock/virtio_transport.c
>> +++ b/net/vmw_vsock/virtio_transport.c
>> @@ -850,7 +850,7 @@ static int __init virtio_vsock_init(void)
>>  		return -ENOMEM;
>>  
>>  	ret = vsock_core_register(&virtio_transport.transport,
>> -				  VSOCK_TRANSPORT_F_G2H);
>> +				  VSOCK_TRANSPORT_F_G2H | VSOCK_TRANSPORT_F_DGRAM);
>>  	if (ret)
>>  		goto out_wq;
>>  
>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>> index bdf16fff054f..aedb48728677 100644
>> --- a/net/vmw_vsock/virtio_transport_common.c
>> +++ b/net/vmw_vsock/virtio_transport_common.c
>> @@ -229,7 +229,9 @@ EXPORT_SYMBOL_GPL(virtio_transport_deliver_tap_pkt);
>>  
>>  static u16 virtio_transport_get_type(struct sock *sk)
>>  {
>> -	if (sk->sk_type == SOCK_STREAM)
>> +	if (sk->sk_type == SOCK_DGRAM)
>> +		return VIRTIO_VSOCK_TYPE_DGRAM;
>> +	else if (sk->sk_type == SOCK_STREAM)
>>  		return VIRTIO_VSOCK_TYPE_STREAM;
>>  	else
>>  		return VIRTIO_VSOCK_TYPE_SEQPACKET;
>> @@ -287,22 +289,29 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>>  	vvs = vsk->trans;
>>  
>>  	/* we can send less than pkt_len bytes */
>> -	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
>> -		pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
>> +	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) {
>> +		if (info->type != VIRTIO_VSOCK_TYPE_DGRAM)
>> +			pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
>> +		else
>> +			return 0;
>> +	}
>>  
>> -	/* virtio_transport_get_credit might return less than pkt_len credit */
>> -	pkt_len = virtio_transport_get_credit(vvs, pkt_len);
>> +	if (info->type != VIRTIO_VSOCK_TYPE_DGRAM) {
>> +		/* virtio_transport_get_credit might return less than pkt_len credit */
>> +		pkt_len = virtio_transport_get_credit(vvs, pkt_len);
>>  
>> -	/* Do not send zero length OP_RW pkt */
>> -	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
>> -		return pkt_len;
>> +		/* Do not send zero length OP_RW pkt */
>> +		if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
>> +			return pkt_len;
>> +	}
>>  
>>  	skb = virtio_transport_alloc_skb(info, pkt_len,
>>  					 src_cid, src_port,
>>  					 dst_cid, dst_port,
>>  					 &err);
>>  	if (!skb) {
>> -		virtio_transport_put_credit(vvs, pkt_len);
>> +		if (info->type != VIRTIO_VSOCK_TYPE_DGRAM)
>> +			virtio_transport_put_credit(vvs, pkt_len);
>>  		return err;
>>  	}
>>  
>> @@ -586,6 +595,61 @@ virtio_transport_seqpacket_dequeue(struct vsock_sock *vsk,
>>  }
>>  EXPORT_SYMBOL_GPL(virtio_transport_seqpacket_dequeue);
>>  
>> +static ssize_t
>> +virtio_transport_dgram_do_dequeue(struct vsock_sock *vsk,
>> +				  struct msghdr *msg, size_t len)
>> +{
>> +	struct virtio_vsock_sock *vvs = vsk->trans;
>> +	struct sk_buff *skb;
>> +	size_t total = 0;
>> +	u32 free_space;
>> +	int err = -EFAULT;
>> +
>> +	spin_lock_bh(&vvs->rx_lock);
>> +	if (total < len && !skb_queue_empty_lockless(&vvs->rx_queue)) {
>> +		skb = __skb_dequeue(&vvs->rx_queue);
>> +
>> +		total = len;
>> +		if (total > skb->len - vsock_metadata(skb)->off)
>> +			total = skb->len - vsock_metadata(skb)->off;
>> +		else if (total < skb->len - vsock_metadata(skb)->off)
>> +			msg->msg_flags |= MSG_TRUNC;
>> +
>> +		/* sk_lock is held by caller so no one else can dequeue.
>> +		 * Unlock rx_lock since memcpy_to_msg() may sleep.
>> +		 */
>> +		spin_unlock_bh(&vvs->rx_lock);
>> +
>> +		err = memcpy_to_msg(msg, skb->data + vsock_metadata(skb)->off, total);
>> +		if (err)
>> +			return err;
>> +
>> +		spin_lock_bh(&vvs->rx_lock);
>> +
>> +		virtio_transport_dec_rx_pkt(vvs, skb);
>> +		consume_skb(skb);
>> +	}
>> +
>> +	free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
>> +
>> +	spin_unlock_bh(&vvs->rx_lock);
>> +
>> +	if (total > 0 && msg->msg_name) {
>> +		/* Provide the address of the sender. */
>> +		DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
>> +
>> +		vsock_addr_init(vm_addr, le64_to_cpu(vsock_hdr(skb)->src_cid),
>> +				le32_to_cpu(vsock_hdr(skb)->src_port));
>> +		msg->msg_namelen = sizeof(*vm_addr);
>> +	}
>> +	return total;
>> +}
>> +
>> +static s64 virtio_transport_dgram_has_data(struct vsock_sock *vsk)
>> +{
>> +	return virtio_transport_stream_has_data(vsk);
>> +}
>> +
>>  int
>>  virtio_transport_seqpacket_enqueue(struct vsock_sock *vsk,
>>  				   struct msghdr *msg,
>> @@ -611,7 +675,66 @@ virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
>>  			       struct msghdr *msg,
>>  			       size_t len, int flags)
>>  {
>> -	return -EOPNOTSUPP;
>> +	struct sock *sk;
>> +	size_t err = 0;
>> +	long timeout;
>> +
>> +	DEFINE_WAIT(wait);
>> +
>> +	sk = &vsk->sk;
>> +	err = 0;
>> +
>> +	if (flags & MSG_OOB || flags & MSG_ERRQUEUE || flags & MSG_PEEK)
>> +		return -EOPNOTSUPP;
>> +
>> +	lock_sock(sk);
>> +
>> +	if (!len)
>> +		goto out;
>> +
>> +	timeout = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
>> +
>> +	while (1) {
>> +		s64 ready;
>> +
>> +		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
>> +		ready = virtio_transport_dgram_has_data(vsk);
>> +
>> +		if (ready == 0) {
>> +			if (timeout == 0) {
>> +				err = -EAGAIN;
>> +				finish_wait(sk_sleep(sk), &wait);
>> +				break;
>> +			}
>> +
>> +			release_sock(sk);
>> +			timeout = schedule_timeout(timeout);
>> +			lock_sock(sk);
>> +
>> +			if (signal_pending(current)) {
>> +				err = sock_intr_errno(timeout);
>> +				finish_wait(sk_sleep(sk), &wait);
>> +				break;
>> +			} else if (timeout == 0) {
>> +				err = -EAGAIN;
>> +				finish_wait(sk_sleep(sk), &wait);
>> +				break;
>> +			}
>> +		} else {
>> +			finish_wait(sk_sleep(sk), &wait);
>> +
>> +			if (ready < 0) {
>> +				err = -ENOMEM;
>> +				goto out;
>> +			}
>> +
>> +			err = virtio_transport_dgram_do_dequeue(vsk, msg, len);
>> +			break;
>> +		}
>> +	}
>> +out:
>> +	release_sock(sk);
>> +	return err;
>>  }
>>  EXPORT_SYMBOL_GPL(virtio_transport_dgram_dequeue);
>>  
>> @@ -819,13 +942,13 @@ EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
>>  int virtio_transport_dgram_bind(struct vsock_sock *vsk,
>>  				struct sockaddr_vm *addr)
>>  {
>> -	return -EOPNOTSUPP;
>> +	return vsock_bind_stream(vsk, addr);
>>  }
>>  EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
>>  
>>  bool virtio_transport_dgram_allow(u32 cid, u32 port)
>>  {
>> -	return false;
>> +	return true;
>>  }
>>  EXPORT_SYMBOL_GPL(virtio_transport_dgram_allow);
>>  
>> @@ -861,7 +984,16 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
>>  			       struct msghdr *msg,
>>  			       size_t dgram_len)
>>  {
>> -	return -EOPNOTSUPP;
>> +	struct virtio_vsock_pkt_info info = {
>> +		.op = VIRTIO_VSOCK_OP_RW,
>> +		.msg = msg,
>> +		.pkt_len = dgram_len,
>> +		.vsk = vsk,
>> +		.remote_cid = remote_addr->svm_cid,
>> +		.remote_port = remote_addr->svm_port,
>> +	};
>> +
>> +	return virtio_transport_send_pkt_info(vsk, &info);
>>  }
>>  EXPORT_SYMBOL_GPL(virtio_transport_dgram_enqueue);
>>  
>> @@ -1165,6 +1297,12 @@ virtio_transport_recv_connected(struct sock *sk,
>>  	struct virtio_vsock_hdr *hdr = vsock_hdr(skb);
>>  	int err = 0;
>>  
>> +	if (le16_to_cpu(vsock_hdr(skb)->type) == VIRTIO_VSOCK_TYPE_DGRAM) {
>> +		virtio_transport_recv_enqueue(vsk, skb);
>> +		sk->sk_data_ready(sk);
>> +		return err;
>> +	}
>> +
>>  	switch (le16_to_cpu(hdr->op)) {
>>  	case VIRTIO_VSOCK_OP_RW:
>>  		virtio_transport_recv_enqueue(vsk, skb);
>> @@ -1320,7 +1458,8 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
>>  static bool virtio_transport_valid_type(u16 type)
>>  {
>>  	return (type == VIRTIO_VSOCK_TYPE_STREAM) ||
>> -	       (type == VIRTIO_VSOCK_TYPE_SEQPACKET);
>> +	       (type == VIRTIO_VSOCK_TYPE_SEQPACKET) ||
>> +	       (type == VIRTIO_VSOCK_TYPE_DGRAM);
>>  }
>>  
>>  /* We are under the virtio-vsock's vsock->rx_lock or vhost-vsock's vq->mutex
>> @@ -1384,6 +1523,11 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
>>  		goto free_pkt;
>>  	}
>>  
>> +	if (sk->sk_type == SOCK_DGRAM) {
>> +		virtio_transport_recv_connected(sk, skb);
>> +		goto out;
>> +	}
>> +
>>  	space_available = virtio_transport_space_update(sk, skb);
>>  
>>  	/* Update CID in case it has changed after a transport reset event */
>> @@ -1415,6 +1559,7 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
>>  		break;
>>  	}
>>  
>> +out:
>>  	release_sock(sk);
>>  
>>  	/* Release refcnt obtained when we fetched this socket out of the
>> -- 
>> 2.35.1
>>
> 
> 



* Re: [virtio-dev] Re: [PATCH 2/6] vsock: return errors other than -ENOMEM to socket
  2022-08-16  2:30   ` [virtio-dev] Re: [PATCH 2/6] vsock: return errors other than -ENOMEM to socket Bobby Eshleman
@ 2022-08-17  5:28     ` Arseniy Krasnov
  0 siblings, 0 replies; 16+ messages in thread
From: Arseniy Krasnov @ 2022-08-17  5:28 UTC (permalink / raw)
  To: Bobby Eshleman, Bobby Eshleman
  Cc: virtio-dev@lists.oasis-open.org, Bobby Eshleman, Cong Wang,
	Jiang Wang, Stefan Hajnoczi, Stefano Garzarella,
	Michael S. Tsirkin, Jason Wang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Wei Liu, Dexuan Cui, kvm@vger.kernel.org,
	virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org

On 16.08.2022 05:30, Bobby Eshleman wrote:
> CC'ing virtio-dev@lists.oasis-open.org
> 
> On Mon, Aug 15, 2022 at 10:56:05AM -0700, Bobby Eshleman wrote:
>> This commit allows vsock implementations to return errors
>> to the socket layer other than -ENOMEM. One immediate effect
>> of this is that upon the sk_sndbuf threshold being reached -EAGAIN
>> will be returned and userspace may throttle appropriately.
>>
>> Resultingly, a known issue with uperf is resolved[1].
>>
>> Additionally, to preserve legacy behavior for non-virtio
>> implementations, hyperv/vmci force errors to be -ENOMEM so that behavior
>> is unchanged.
>>
>> [1]: https://gitlab.com/vsock/vsock/-/issues/1
>>
>> Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
>> ---
>>  include/linux/virtio_vsock.h            | 3 +++
>>  net/vmw_vsock/af_vsock.c                | 3 ++-
>>  net/vmw_vsock/hyperv_transport.c        | 2 +-
>>  net/vmw_vsock/virtio_transport_common.c | 3 ---
>>  net/vmw_vsock/vmci_transport.c          | 9 ++++++++-
>>  5 files changed, 14 insertions(+), 6 deletions(-)
>>
>> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>> index 17ed01466875..9a37eddbb87a 100644
>> --- a/include/linux/virtio_vsock.h
>> +++ b/include/linux/virtio_vsock.h
>> @@ -8,6 +8,9 @@
>>  #include <net/sock.h>
>>  #include <net/af_vsock.h>
>>  
>> +/* Threshold for detecting small packets to copy */
>> +#define GOOD_COPY_LEN  128
>> +
>>  enum virtio_vsock_metadata_flags {
>>  	VIRTIO_VSOCK_METADATA_FLAGS_REPLY		= BIT(0),
>>  	VIRTIO_VSOCK_METADATA_FLAGS_TAP_DELIVERED	= BIT(1),
>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>> index e348b2d09eac..1893f8aafa48 100644
>> --- a/net/vmw_vsock/af_vsock.c
>> +++ b/net/vmw_vsock/af_vsock.c
>> @@ -1844,8 +1844,9 @@ static int vsock_connectible_sendmsg(struct socket *sock, struct msghdr *msg,
>>  			written = transport->stream_enqueue(vsk,
>>  					msg, len - total_written);
>>  		}
>> +
>>  		if (written < 0) {
>> -			err = -ENOMEM;
>> +			err = written;
>>  			goto out_err;
>>  		}
IIUC, for stream sockets this change only has an effect when the very first
transport access fails. In that case 'total_written' is 0, so 'err' == 'written'
is returned. But when 'total_written > 0', 'err' is overwritten by 'total_written'
below, which preserves the current behaviour. Is that what you intended?

Thanks
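
The case split described above can be modelled in userspace. This is a
simplified sketch, not the kernel code: fake_transport and model_sendmsg
are hypothetical names, and the loop only mirrors the part of
vsock_connectible_sendmsg() that decides the return value for a stream
socket.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for transport->stream_enqueue(): every call
 * consumes one entry from a scripted list of results.
 */
struct fake_transport {
	const ssize_t *results;	/* bytes written, or a negative errno */
	size_t nresults;
	size_t next;
};

ssize_t fake_stream_enqueue(struct fake_transport *t, size_t len)
{
	ssize_t r;

	if (t->next >= t->nresults)
		return -ENOMEM;

	r = t->results[t->next++];
	if (r > 0 && (size_t)r > len)
		r = (ssize_t)len;
	return r;
}

/* Simplified model of the send loop and its exit path. */
ssize_t model_sendmsg(struct fake_transport *t, size_t len)
{
	size_t total_written = 0;
	ssize_t err = 0;

	while (total_written < len) {
		ssize_t written = fake_stream_enqueue(t, len - total_written);

		if (written < 0) {
			err = written;	/* patched: was err = -ENOMEM */
			break;
		}
		total_written += (size_t)written;
	}

	/* Partial progress takes priority over the error code. */
	if (total_written > 0)
		err = (ssize_t)total_written;

	return err;
}
```

With a script of { -EAGAIN } the model returns -EAGAIN; with
{ 40, -EAGAIN } it returns 40 and the error is only seen on the next call.
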
>>  
>> diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
>> index fd98229e3db3..e99aea571f6f 100644
>> --- a/net/vmw_vsock/hyperv_transport.c
>> +++ b/net/vmw_vsock/hyperv_transport.c
>> @@ -687,7 +687,7 @@ static ssize_t hvs_stream_enqueue(struct vsock_sock *vsk, struct msghdr *msg,
>>  	if (bytes_written)
>>  		ret = bytes_written;
>>  	kfree(send_buf);
>> -	return ret;
>> +	return ret < 0 ? -ENOMEM : ret;
>>  }
>>  
>>  static s64 hvs_stream_has_data(struct vsock_sock *vsk)
>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>> index 920578597bb9..d5780599fe93 100644
>> --- a/net/vmw_vsock/virtio_transport_common.c
>> +++ b/net/vmw_vsock/virtio_transport_common.c
>> @@ -23,9 +23,6 @@
>>  /* How long to wait for graceful shutdown of a connection */
>>  #define VSOCK_CLOSE_TIMEOUT (8 * HZ)
>>  
>> -/* Threshold for detecting small packets to copy */
>> -#define GOOD_COPY_LEN  128
>> -
>>  static const struct virtio_transport *
>>  virtio_transport_get_ops(struct vsock_sock *vsk)
>>  {
>> diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
>> index b14f0ed7427b..c927a90dc859 100644
>> --- a/net/vmw_vsock/vmci_transport.c
>> +++ b/net/vmw_vsock/vmci_transport.c
>> @@ -1838,7 +1838,14 @@ static ssize_t vmci_transport_stream_enqueue(
>>  	struct msghdr *msg,
>>  	size_t len)
>>  {
>> -	return vmci_qpair_enquev(vmci_trans(vsk)->qpair, msg, len, 0);
>> +	int err;
>> +
>> +	err = vmci_qpair_enquev(vmci_trans(vsk)->qpair, msg, len, 0);
>> +
>> +	if (err < 0)
>> +		err = -ENOMEM;
>> +
>> +	return err;
>>  }
>>  
>>  static s64 vmci_transport_stream_has_data(struct vsock_sock *vsk)
>> -- 
>> 2.35.1
>>
> 
> 



* Re: [virtio-dev] Re: [PATCH 5/6] virtio/vsock: add support for dgram
  2022-08-17  5:01     ` Arseniy Krasnov
  2022-08-16  9:57       ` Bobby Eshleman
@ 2022-08-17  5:42       ` Arseniy Krasnov
  2022-08-16  9:58         ` Bobby Eshleman
  1 sibling, 1 reply; 16+ messages in thread
From: Arseniy Krasnov @ 2022-08-17  5:42 UTC (permalink / raw)
  To: Bobby Eshleman, Bobby Eshleman
  Cc: virtio-dev@lists.oasis-open.org, Bobby Eshleman, Cong Wang,
	Jiang Wang, Stefan Hajnoczi, Stefano Garzarella,
	Michael S. Tsirkin, Jason Wang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, kvm@vger.kernel.org,
	virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org

On 17.08.2022 08:01, Arseniy Krasnov wrote:
> On 16.08.2022 05:32, Bobby Eshleman wrote:
>> CC'ing virtio-dev@lists.oasis-open.org
>>
>> On Mon, Aug 15, 2022 at 10:56:08AM -0700, Bobby Eshleman wrote:
>>> This patch supports dgram in virtio and on the vhost side.
> Hello,
> 
> Sorry, I don't understand how this maintains message boundaries. Or is
> that unnecessary for SOCK_DGRAM?
> 
> Thanks
>>>
>>> Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
>>> Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
>>> ---
>>>  drivers/vhost/vsock.c                   |   2 +-
>>>  include/net/af_vsock.h                  |   2 +
>>>  include/uapi/linux/virtio_vsock.h       |   1 +
>>>  net/vmw_vsock/af_vsock.c                |  26 +++-
>>>  net/vmw_vsock/virtio_transport.c        |   2 +-
>>>  net/vmw_vsock/virtio_transport_common.c | 173 ++++++++++++++++++++++--
>>>  6 files changed, 186 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>>> index a5d1bdb786fe..3dc72a5647ca 100644
>>> --- a/drivers/vhost/vsock.c
>>> +++ b/drivers/vhost/vsock.c
>>> @@ -925,7 +925,7 @@ static int __init vhost_vsock_init(void)
>>>  	int ret;
>>>  
>>>  	ret = vsock_core_register(&vhost_transport.transport,
>>> -				  VSOCK_TRANSPORT_F_H2G);
>>> +				  VSOCK_TRANSPORT_F_H2G | VSOCK_TRANSPORT_F_DGRAM);
>>>  	if (ret < 0)
>>>  		return ret;
>>>  
>>> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
>>> index 1c53c4c4d88f..37e55c81e4df 100644
>>> --- a/include/net/af_vsock.h
>>> +++ b/include/net/af_vsock.h
>>> @@ -78,6 +78,8 @@ struct vsock_sock {
>>>  s64 vsock_stream_has_data(struct vsock_sock *vsk);
>>>  s64 vsock_stream_has_space(struct vsock_sock *vsk);
>>>  struct sock *vsock_create_connected(struct sock *parent);
>>> +int vsock_bind_stream(struct vsock_sock *vsk,
>>> +		      struct sockaddr_vm *addr);
>>>  
>>>  /**** TRANSPORT ****/
>>>  
>>> diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
>>> index 857df3a3a70d..0975b9c88292 100644
>>> --- a/include/uapi/linux/virtio_vsock.h
>>> +++ b/include/uapi/linux/virtio_vsock.h
>>> @@ -70,6 +70,7 @@ struct virtio_vsock_hdr {
>>>  enum virtio_vsock_type {
>>>  	VIRTIO_VSOCK_TYPE_STREAM = 1,
>>>  	VIRTIO_VSOCK_TYPE_SEQPACKET = 2,
>>> +	VIRTIO_VSOCK_TYPE_DGRAM = 3,
>>>  };
>>>  
>>>  enum virtio_vsock_op {
>>> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>>> index 1893f8aafa48..87e4ae1866d3 100644
>>> --- a/net/vmw_vsock/af_vsock.c
>>> +++ b/net/vmw_vsock/af_vsock.c
>>> @@ -675,6 +675,19 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
>>>  	return 0;
>>>  }
>>>  
>>> +int vsock_bind_stream(struct vsock_sock *vsk,
>>> +		      struct sockaddr_vm *addr)
>>> +{
>>> +	int retval;
>>> +
>>> +	spin_lock_bh(&vsock_table_lock);
>>> +	retval = __vsock_bind_connectible(vsk, addr);
>>> +	spin_unlock_bh(&vsock_table_lock);
>>> +
>>> +	return retval;
>>> +}
>>> +EXPORT_SYMBOL(vsock_bind_stream);
>>> +
>>>  static int __vsock_bind_dgram(struct vsock_sock *vsk,
>>>  			      struct sockaddr_vm *addr)
>>>  {
>>> @@ -2363,11 +2376,16 @@ int vsock_core_register(const struct vsock_transport *t, int features)
>>>  	}
>>>  
>>>  	if (features & VSOCK_TRANSPORT_F_DGRAM) {
>>> -		if (t_dgram) {
>>> -			err = -EBUSY;
>>> -			goto err_busy;
>>> +		/* TODO: always chose the G2H variant over others, support nesting later */
>>> +		if (features & VSOCK_TRANSPORT_F_G2H) {
>>> +			if (t_dgram)
>>> +				pr_warn("virtio_vsock: t_dgram already set\n");
>>> +			t_dgram = t;
>>> +		}
>>> +
>>> +		if (!t_dgram) {
>>> +			t_dgram = t;
>>>  		}
>>> -		t_dgram = t;
>>>  	}
>>>  
>>>  	if (features & VSOCK_TRANSPORT_F_LOCAL) {
>>> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>>> index 073314312683..d4526ca462d2 100644
>>> --- a/net/vmw_vsock/virtio_transport.c
>>> +++ b/net/vmw_vsock/virtio_transport.c
>>> @@ -850,7 +850,7 @@ static int __init virtio_vsock_init(void)
>>>  		return -ENOMEM;
>>>  
>>>  	ret = vsock_core_register(&virtio_transport.transport,
>>> -				  VSOCK_TRANSPORT_F_G2H);
>>> +				  VSOCK_TRANSPORT_F_G2H | VSOCK_TRANSPORT_F_DGRAM);
>>>  	if (ret)
>>>  		goto out_wq;
>>>  
>>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>>> index bdf16fff054f..aedb48728677 100644
>>> --- a/net/vmw_vsock/virtio_transport_common.c
>>> +++ b/net/vmw_vsock/virtio_transport_common.c
>>> @@ -229,7 +229,9 @@ EXPORT_SYMBOL_GPL(virtio_transport_deliver_tap_pkt);
>>>  
>>>  static u16 virtio_transport_get_type(struct sock *sk)
>>>  {
>>> -	if (sk->sk_type == SOCK_STREAM)
>>> +	if (sk->sk_type == SOCK_DGRAM)
>>> +		return VIRTIO_VSOCK_TYPE_DGRAM;
>>> +	else if (sk->sk_type == SOCK_STREAM)
>>>  		return VIRTIO_VSOCK_TYPE_STREAM;
>>>  	else
>>>  		return VIRTIO_VSOCK_TYPE_SEQPACKET;
>>> @@ -287,22 +289,29 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>>>  	vvs = vsk->trans;
>>>  
>>>  	/* we can send less than pkt_len bytes */
>>> -	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
>>> -		pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
>>> +	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) {
>>> +		if (info->type != VIRTIO_VSOCK_TYPE_DGRAM)
>>> +			pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
>>> +		else
>>> +			return 0;
>>> +	}
>>>  
>>> -	/* virtio_transport_get_credit might return less than pkt_len credit */
>>> -	pkt_len = virtio_transport_get_credit(vvs, pkt_len);
>>> +	if (info->type != VIRTIO_VSOCK_TYPE_DGRAM) {
>>> +		/* virtio_transport_get_credit might return less than pkt_len credit */
>>> +		pkt_len = virtio_transport_get_credit(vvs, pkt_len);
>>>  
>>> -	/* Do not send zero length OP_RW pkt */
>>> -	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
>>> -		return pkt_len;
>>> +		/* Do not send zero length OP_RW pkt */
>>> +		if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
>>> +			return pkt_len;
>>> +	}
>>>  
>>>  	skb = virtio_transport_alloc_skb(info, pkt_len,
>>>  					 src_cid, src_port,
>>>  					 dst_cid, dst_port,
>>>  					 &err);
>>>  	if (!skb) {
>>> -		virtio_transport_put_credit(vvs, pkt_len);
>>> +		if (info->type != VIRTIO_VSOCK_TYPE_DGRAM)
>>> +			virtio_transport_put_credit(vvs, pkt_len);
>>>  		return err;
>>>  	}
>>>  
>>> @@ -586,6 +595,61 @@ virtio_transport_seqpacket_dequeue(struct vsock_sock *vsk,
>>>  }
>>>  EXPORT_SYMBOL_GPL(virtio_transport_seqpacket_dequeue);
>>>  
>>> +static ssize_t
>>> +virtio_transport_dgram_do_dequeue(struct vsock_sock *vsk,
>>> +				  struct msghdr *msg, size_t len)
>>> +{
>>> +	struct virtio_vsock_sock *vvs = vsk->trans;
>>> +	struct sk_buff *skb;
>>> +	size_t total = 0;
>>> +	u32 free_space;
>>> +	int err = -EFAULT;
>>> +
>>> +	spin_lock_bh(&vvs->rx_lock);
>>> +	if (total < len && !skb_queue_empty_lockless(&vvs->rx_queue)) {
>>> +		skb = __skb_dequeue(&vvs->rx_queue);
>>> +
>>> +		total = len;
>>> +		if (total > skb->len - vsock_metadata(skb)->off)
>>> +			total = skb->len - vsock_metadata(skb)->off;
>>> +		else if (total < skb->len - vsock_metadata(skb)->off)
>>> +			msg->msg_flags |= MSG_TRUNC;
>>> +
>>> +		/* sk_lock is held by caller so no one else can dequeue.
>>> +		 * Unlock rx_lock since memcpy_to_msg() may sleep.
>>> +		 */
>>> +		spin_unlock_bh(&vvs->rx_lock);
>>> +
>>> +		err = memcpy_to_msg(msg, skb->data + vsock_metadata(skb)->off, total);
>>> +		if (err)
>>> +			return err;
>>> +
>>> +		spin_lock_bh(&vvs->rx_lock);
>>> +
>>> +		virtio_transport_dec_rx_pkt(vvs, skb);
>>> +		consume_skb(skb);
>>> +	}
>>> +
>>> +	free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
>>> +
>>> +	spin_unlock_bh(&vvs->rx_lock);
>>> +
>>> +	if (total > 0 && msg->msg_name) {
>>> +		/* Provide the address of the sender. */
>>> +		DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
>>> +
>>> +		vsock_addr_init(vm_addr, le64_to_cpu(vsock_hdr(skb)->src_cid),
>>> +				le32_to_cpu(vsock_hdr(skb)->src_port));
>>> +		msg->msg_namelen = sizeof(*vm_addr);
>>> +	}
>>> +	return total;
>>> +}
>>> +
>>> +static s64 virtio_transport_dgram_has_data(struct vsock_sock *vsk)
>>> +{
>>> +	return virtio_transport_stream_has_data(vsk);
>>> +}
>>> +
>>>  int
>>>  virtio_transport_seqpacket_enqueue(struct vsock_sock *vsk,
>>>  				   struct msghdr *msg,
>>> @@ -611,7 +675,66 @@ virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
>>>  			       struct msghdr *msg,
>>>  			       size_t len, int flags)
>>>  {
>>> -	return -EOPNOTSUPP;
>>> +	struct sock *sk;
>>> +	size_t err = 0;
>>> +	long timeout;
>>> +
>>> +	DEFINE_WAIT(wait);
>>> +
>>> +	sk = &vsk->sk;
>>> +	err = 0;
>>> +
>>> +	if (flags & MSG_OOB || flags & MSG_ERRQUEUE || flags & MSG_PEEK)
>>> +		return -EOPNOTSUPP;
>>> +
>>> +	lock_sock(sk);
>>> +
>>> +	if (!len)
>>> +		goto out;
>>> +
>>> +	timeout = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
>>> +
>>> +	while (1) {
>>> +		s64 ready;
>>> +
>>> +		prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
>>> +		ready = virtio_transport_dgram_has_data(vsk);
>>> +
>>> +		if (ready == 0) {
>>> +			if (timeout == 0) {
>>> +				err = -EAGAIN;
>>> +				finish_wait(sk_sleep(sk), &wait);
>>> +				break;
>>> +			}
>>> +
>>> +			release_sock(sk);
>>> +			timeout = schedule_timeout(timeout);
>>> +			lock_sock(sk);
>>> +
>>> +			if (signal_pending(current)) {
>>> +				err = sock_intr_errno(timeout);
>>> +				finish_wait(sk_sleep(sk), &wait);
>>> +				break;
>>> +			} else if (timeout == 0) {
>>> +				err = -EAGAIN;
>>> +				finish_wait(sk_sleep(sk), &wait);
>>> +				break;
>>> +			}
>>> +		} else {
>>> +			finish_wait(sk_sleep(sk), &wait);
>>> +
>>> +			if (ready < 0) {
>>> +				err = -ENOMEM;
>>> +				goto out;
>>> +			}
>>> +
>>> +			err = virtio_transport_dgram_do_dequeue(vsk, msg, len);
>>> +			break;
>>> +		}
>>> +	}
>>> +out:
>>> +	release_sock(sk);
>>> +	return err;
>>>  }
>>>  EXPORT_SYMBOL_GPL(virtio_transport_dgram_dequeue);
^^^
Maybe this generic data-waiting logic should live in af_vsock.c, as it does for
stream/seqpacket? That way, another transport that supports SOCK_DGRAM could reuse it.
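
One possible shape for that split, as a rough userspace sketch only. The
dgram_ops structure, generic_dgram_dequeue() and the toy transport are all
hypothetical names, not existing kernel interfaces, and sleeping on the
socket wait queue is replaced by a bounded number of polls so the code can
run outside the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-transport hooks. */
struct dgram_ops {
	long (*has_data)(void *priv);
	long (*do_dequeue)(void *priv, char *buf, size_t len);
};

/* Transport-agnostic receive path, modelled on the loop in
 * virtio_transport_dgram_dequeue(): nothing ready ends in -EAGAIN,
 * a negative readiness value in -ENOMEM, otherwise the transport
 * copies out one datagram.
 */
long generic_dgram_dequeue(const struct dgram_ops *ops, void *priv,
			   char *buf, size_t len, int max_polls)
{
	int i;

	if (!len)
		return 0;

	for (i = 0; i < max_polls; i++) {
		long ready = ops->has_data(priv);

		if (ready < 0)
			return -ENOMEM;
		if (ready > 0)
			return ops->do_dequeue(priv, buf, len);
	}
	return -EAGAIN;
}

/* Toy transport holding at most one pending datagram. */
struct toy_transport {
	const char *pending;
	size_t pending_len;
};

long toy_has_data(void *priv)
{
	struct toy_transport *t = priv;

	return (long)t->pending_len;
}

long toy_do_dequeue(void *priv, char *buf, size_t len)
{
	struct toy_transport *t = priv;
	size_t n = t->pending_len < len ? t->pending_len : len;

	memcpy(buf, t->pending, n);
	t->pending = NULL;
	t->pending_len = 0;	/* the whole datagram is consumed */
	return (long)n;
}

const struct dgram_ops toy_ops = {
	.has_data = toy_has_data,
	.do_dequeue = toy_do_dequeue,
};
```

A second transport would only need to supply its own has_data and
do_dequeue hooks; the timeout and error handling would stay in one place.
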
>>>  
>>> @@ -819,13 +942,13 @@ EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
>>>  int virtio_transport_dgram_bind(struct vsock_sock *vsk,
>>>  				struct sockaddr_vm *addr)
>>>  {
>>> -	return -EOPNOTSUPP;
>>> +	return vsock_bind_stream(vsk, addr);
>>>  }
>>>  EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
>>>  
>>>  bool virtio_transport_dgram_allow(u32 cid, u32 port)
>>>  {
>>> -	return false;
>>> +	return true;
>>>  }
>>>  EXPORT_SYMBOL_GPL(virtio_transport_dgram_allow);
>>>  
>>> @@ -861,7 +984,16 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
>>>  			       struct msghdr *msg,
>>>  			       size_t dgram_len)
>>>  {
>>> -	return -EOPNOTSUPP;
>>> +	struct virtio_vsock_pkt_info info = {
>>> +		.op = VIRTIO_VSOCK_OP_RW,
>>> +		.msg = msg,
>>> +		.pkt_len = dgram_len,
>>> +		.vsk = vsk,
>>> +		.remote_cid = remote_addr->svm_cid,
>>> +		.remote_port = remote_addr->svm_port,
>>> +	};
>>> +
>>> +	return virtio_transport_send_pkt_info(vsk, &info);
>>>  }
>>>  EXPORT_SYMBOL_GPL(virtio_transport_dgram_enqueue);
>>>  
>>> @@ -1165,6 +1297,12 @@ virtio_transport_recv_connected(struct sock *sk,
>>>  	struct virtio_vsock_hdr *hdr = vsock_hdr(skb);
>>>  	int err = 0;
>>>  
>>> +	if (le16_to_cpu(vsock_hdr(skb)->type) == VIRTIO_VSOCK_TYPE_DGRAM) {
>>> +		virtio_transport_recv_enqueue(vsk, skb);
>>> +		sk->sk_data_ready(sk);
>>> +		return err;
>>> +	}
>>> +
>>>  	switch (le16_to_cpu(hdr->op)) {
>>>  	case VIRTIO_VSOCK_OP_RW:
>>>  		virtio_transport_recv_enqueue(vsk, skb);
>>> @@ -1320,7 +1458,8 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
>>>  static bool virtio_transport_valid_type(u16 type)
>>>  {
>>>  	return (type == VIRTIO_VSOCK_TYPE_STREAM) ||
>>> -	       (type == VIRTIO_VSOCK_TYPE_SEQPACKET);
>>> +	       (type == VIRTIO_VSOCK_TYPE_SEQPACKET) ||
>>> +	       (type == VIRTIO_VSOCK_TYPE_DGRAM);
>>>  }
>>>  
>>>  /* We are under the virtio-vsock's vsock->rx_lock or vhost-vsock's vq->mutex
>>> @@ -1384,6 +1523,11 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
>>>  		goto free_pkt;
>>>  	}
>>>  
>>> +	if (sk->sk_type == SOCK_DGRAM) {
>>> +		virtio_transport_recv_connected(sk, skb);
>>> +		goto out;
>>> +	}
>>> +
>>>  	space_available = virtio_transport_space_update(sk, skb);
>>>  
>>>  	/* Update CID in case it has changed after a transport reset event */
>>> @@ -1415,6 +1559,7 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
>>>  		break;
>>>  	}
>>>  
>>> +out:
>>>  	release_sock(sk);
>>>  
>>>  	/* Release refcnt obtained when we fetched this socket out of the
>>> -- 
>>> 2.35.1
>>>
>>
>>
> 



* Re: [virtio-dev] Re: [PATCH 5/6] virtio/vsock: add support for dgram
  2022-08-16  9:57       ` Bobby Eshleman
@ 2022-08-18  8:24         ` Arseniy Krasnov
  0 siblings, 0 replies; 16+ messages in thread
From: Arseniy Krasnov @ 2022-08-18  8:24 UTC (permalink / raw)
  To: bobbyeshleman@gmail.com
  Cc: kvm@vger.kernel.org, jasowang@redhat.com,
	bobby.eshleman@gmail.com, davem@davemloft.net,
	virtio-dev@lists.oasis-open.org, stefanha@redhat.com,
	bobby.eshleman@bytedance.com, linux-kernel@vger.kernel.org,
	pabeni@redhat.com, edumazet@google.com, jiang.wang@bytedance.com,
	sgarzare@redhat.com, kuba@kernel.org, cong.wang@bytedance.com,
	netdev@vger.kernel.org, virtualization@lists.linux-foundation.org,
	mst@redhat.com

On Tue, 2022-08-16 at 09:57 +0000, Bobby Eshleman wrote:
> On Wed, Aug 17, 2022 at 05:01:00AM +0000, Arseniy Krasnov wrote:
> > On 16.08.2022 05:32, Bobby Eshleman wrote:
> > > CC'ing virtio-dev@lists.oasis-open.org
> > > 
> > > On Mon, Aug 15, 2022 at 10:56:08AM -0700, Bobby Eshleman wrote:
> > > > This patch supports dgram in virtio and on the vhost side.
> > Hello,
> > 
> > Sorry, I don't understand: how does this maintain message boundaries?
> > Or is that unnecessary for SOCK_DGRAM?
> > 
> > Thanks
> 
> If I understand your question, the length is included in the header, so
> receivers always know that header start + header length + payload length
> marks the message boundary.

I mean, consider the following case: the host sends a 5 KB packet to the
guest. The guest uses 4 KB virtio rx buffers, so in drivers/vhost/vsock.c
this 5 KB packet (i.e. its payload) will be placed into two virtio rx
buffers: 4 KB in the first buffer and the remaining 1 KB in the second.
Is it implemented so that the receiver gets the whole 5 KB of data in a
single read()/recv() system call?
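
To make the case concrete, here is a small userspace model of it, not the
vhost code. RX_BUF_SIZE, struct rx_buf, split_payload and reassemble are
hypothetical names, and the header is assumed to travel separately from the
payload. It shows that knowing the payload length from the header is not
enough on its own: the receiver also has to gather the pieces from both rx
buffers before a single recv() can return the whole datagram.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RX_BUF_SIZE 4096u /* assumed guest rx buffer size, as in the example */

/* Hypothetical model of one filled rx buffer. */
struct rx_buf {
	unsigned char data[RX_BUF_SIZE];
	size_t len;
};

/* Split `len` payload bytes across rx buffers; returns the buffers used. */
size_t split_payload(const unsigned char *src, size_t len,
		     struct rx_buf *bufs, size_t nbufs)
{
	size_t used = 0, off = 0;

	while (off < len && used < nbufs) {
		size_t chunk = len - off;

		if (chunk > RX_BUF_SIZE)
			chunk = RX_BUF_SIZE;
		memcpy(bufs[used].data, src + off, chunk);
		bufs[used].len = chunk;
		off += chunk;
		used++;
	}
	/* The caller must supply enough buffers for the whole payload. */
	assert(off == len);
	return used;
}

/* Gather up to `msg_len` bytes (the length taken from the header). */
size_t reassemble(const struct rx_buf *bufs, size_t nbufs,
		  size_t msg_len, unsigned char *dst, size_t dst_cap)
{
	size_t want = msg_len < dst_cap ? msg_len : dst_cap;
	size_t got = 0, i;

	for (i = 0; i < nbufs && got < want; i++) {
		size_t chunk = bufs[i].len;

		if (chunk > want - got)
			chunk = want - got;
		memcpy(dst + got, bufs[i].data, chunk);
		got += chunk;
	}
	return got;
}
```

If the receive path hands each rx buffer to the socket as its own datagram,
the reader would see 4 KB and then 1 KB instead of one 5 KB message.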

Thanks

> 
> > > > Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
> > > > Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
> > > > ---
> > > >  drivers/vhost/vsock.c                   |   2 +-
> > > >  include/net/af_vsock.h                  |   2 +
> > > >  include/uapi/linux/virtio_vsock.h       |   1 +
> > > >  net/vmw_vsock/af_vsock.c                |  26 +++-
> > > >  net/vmw_vsock/virtio_transport.c        |   2 +-
> > > >  net/vmw_vsock/virtio_transport_common.c | 173
> > > > ++++++++++++++++++++++--
> > > >  6 files changed, 186 insertions(+), 20 deletions(-)
> > > > 
> > > > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > > > index a5d1bdb786fe..3dc72a5647ca 100644
> > > > --- a/drivers/vhost/vsock.c
> > > > +++ b/drivers/vhost/vsock.c
> > > > @@ -925,7 +925,7 @@ static int __init vhost_vsock_init(void)
> > > >  	int ret;
> > > >  
> > > >  	ret = vsock_core_register(&vhost_transport.transport,
> > > > -				  VSOCK_TRANSPORT_F_H2G);
> > > > +				  VSOCK_TRANSPORT_F_H2G |
> > > > VSOCK_TRANSPORT_F_DGRAM);
> > > >  	if (ret < 0)
> > > >  		return ret;
> > > >  
> > > > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> > > > index 1c53c4c4d88f..37e55c81e4df 100644
> > > > --- a/include/net/af_vsock.h
> > > > +++ b/include/net/af_vsock.h
> > > > @@ -78,6 +78,8 @@ struct vsock_sock {
> > > >  s64 vsock_stream_has_data(struct vsock_sock *vsk);
> > > >  s64 vsock_stream_has_space(struct vsock_sock *vsk);
> > > >  struct sock *vsock_create_connected(struct sock *parent);
> > > > +int vsock_bind_stream(struct vsock_sock *vsk,
> > > > +		      struct sockaddr_vm *addr);
> > > >  
> > > >  /**** TRANSPORT ****/
> > > >  
> > > > diff --git a/include/uapi/linux/virtio_vsock.h
> > > > b/include/uapi/linux/virtio_vsock.h
> > > > index 857df3a3a70d..0975b9c88292 100644
> > > > --- a/include/uapi/linux/virtio_vsock.h
> > > > +++ b/include/uapi/linux/virtio_vsock.h
> > > > @@ -70,6 +70,7 @@ struct virtio_vsock_hdr {
> > > >  enum virtio_vsock_type {
> > > >  	VIRTIO_VSOCK_TYPE_STREAM = 1,
> > > >  	VIRTIO_VSOCK_TYPE_SEQPACKET = 2,
> > > > +	VIRTIO_VSOCK_TYPE_DGRAM = 3,
> > > >  };
> > > >  
> > > >  enum virtio_vsock_op {
> > > > diff --git a/net/vmw_vsock/af_vsock.c
> > > > b/net/vmw_vsock/af_vsock.c
> > > > index 1893f8aafa48..87e4ae1866d3 100644
> > > > --- a/net/vmw_vsock/af_vsock.c
> > > > +++ b/net/vmw_vsock/af_vsock.c
> > > > @@ -675,6 +675,19 @@ static int __vsock_bind_connectible(struct
> > > > vsock_sock *vsk,
> > > >  	return 0;
> > > >  }
> > > >  
> > > > +int vsock_bind_stream(struct vsock_sock *vsk,
> > > > +		      struct sockaddr_vm *addr)
> > > > +{
> > > > +	int retval;
> > > > +
> > > > +	spin_lock_bh(&vsock_table_lock);
> > > > +	retval = __vsock_bind_connectible(vsk, addr);
> > > > +	spin_unlock_bh(&vsock_table_lock);
> > > > +
> > > > +	return retval;
> > > > +}
> > > > +EXPORT_SYMBOL(vsock_bind_stream);
> > > > +
> > > >  static int __vsock_bind_dgram(struct vsock_sock *vsk,
> > > >  			      struct sockaddr_vm *addr)
> > > >  {
> > > > @@ -2363,11 +2376,16 @@ int vsock_core_register(const struct
> > > > vsock_transport *t, int features)
> > > >  	}
> > > >  
> > > >  	if (features & VSOCK_TRANSPORT_F_DGRAM) {
> > > > -		if (t_dgram) {
> > > > -			err = -EBUSY;
> > > > -			goto err_busy;
> > > > +		/* TODO: always chose the G2H variant over
> > > > others, support nesting later */
> > > > +		if (features & VSOCK_TRANSPORT_F_G2H) {
> > > > +			if (t_dgram)
> > > > +				pr_warn("virtio_vsock: t_dgram
> > > > already set\n");
> > > > +			t_dgram = t;
> > > > +		}
> > > > +
> > > > +		if (!t_dgram) {
> > > > +			t_dgram = t;
> > > >  		}
> > > > -		t_dgram = t;
> > > >  	}
> > > >  
> > > >  	if (features & VSOCK_TRANSPORT_F_LOCAL) {
> > > > diff --git a/net/vmw_vsock/virtio_transport.c
> > > > b/net/vmw_vsock/virtio_transport.c
> > > > index 073314312683..d4526ca462d2 100644
> > > > --- a/net/vmw_vsock/virtio_transport.c
> > > > +++ b/net/vmw_vsock/virtio_transport.c
> > > > @@ -850,7 +850,7 @@ static int __init virtio_vsock_init(void)
> > > >  		return -ENOMEM;
> > > >  
> > > >  	ret = vsock_core_register(&virtio_transport.transport,
> > > > -				  VSOCK_TRANSPORT_F_G2H);
> > > > +				  VSOCK_TRANSPORT_F_G2H |
> > > > VSOCK_TRANSPORT_F_DGRAM);
> > > >  	if (ret)
> > > >  		goto out_wq;
> > > >  
> > > > diff --git a/net/vmw_vsock/virtio_transport_common.c
> > > > b/net/vmw_vsock/virtio_transport_common.c
> > > > index bdf16fff054f..aedb48728677 100644
> > > > --- a/net/vmw_vsock/virtio_transport_common.c
> > > > +++ b/net/vmw_vsock/virtio_transport_common.c
> > > > @@ -229,7 +229,9 @@
> > > > EXPORT_SYMBOL_GPL(virtio_transport_deliver_tap_pkt);
> > > >  
> > > >  static u16 virtio_transport_get_type(struct sock *sk)
> > > >  {
> > > > -	if (sk->sk_type == SOCK_STREAM)
> > > > +	if (sk->sk_type == SOCK_DGRAM)
> > > > +		return VIRTIO_VSOCK_TYPE_DGRAM;
> > > > +	else if (sk->sk_type == SOCK_STREAM)
> > > >  		return VIRTIO_VSOCK_TYPE_STREAM;
> > > >  	else
> > > >  		return VIRTIO_VSOCK_TYPE_SEQPACKET;
> > > > @@ -287,22 +289,29 @@ static int
> > > > virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > > >  	vvs = vsk->trans;
> > > >  
> > > >  	/* we can send less than pkt_len bytes */
> > > > -	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
> > > > -		pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
> > > > +	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) {
> > > > +		if (info->type != VIRTIO_VSOCK_TYPE_DGRAM)
> > > > +			pkt_len =
> > > > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
> > > > +		else
> > > > +			return 0;
> > > > +	}
> > > >  
> > > > -	/* virtio_transport_get_credit might return less than
> > > > pkt_len credit */
> > > > -	pkt_len = virtio_transport_get_credit(vvs, pkt_len);
> > > > +	if (info->type != VIRTIO_VSOCK_TYPE_DGRAM) {
> > > > +		/* virtio_transport_get_credit might return
> > > > less than pkt_len credit */
> > > > +		pkt_len = virtio_transport_get_credit(vvs,
> > > > pkt_len);
> > > >  
> > > > -	/* Do not send zero length OP_RW pkt */
> > > > -	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
> > > > -		return pkt_len;
> > > > +		/* Do not send zero length OP_RW pkt */
> > > > +		if (pkt_len == 0 && info->op ==
> > > > VIRTIO_VSOCK_OP_RW)
> > > > +			return pkt_len;
> > > > +	}
> > > >  
> > > >  	skb = virtio_transport_alloc_skb(info, pkt_len,
> > > >  					 src_cid, src_port,
> > > >  					 dst_cid, dst_port,
> > > >  					 &err);
> > > >  	if (!skb) {
> > > > -		virtio_transport_put_credit(vvs, pkt_len);
> > > > +		if (info->type != VIRTIO_VSOCK_TYPE_DGRAM)
> > > > +			virtio_transport_put_credit(vvs,
> > > > pkt_len);
> > > >  		return err;
> > > >  	}
> > > >  
> > > > @@ -586,6 +595,61 @@ virtio_transport_seqpacket_dequeue(struct
> > > > vsock_sock *vsk,
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(virtio_transport_seqpacket_dequeue);
> > > >  
> > > > +static ssize_t
> > > > +virtio_transport_dgram_do_dequeue(struct vsock_sock *vsk,
> > > > +				  struct msghdr *msg, size_t
> > > > len)
> > > > +{
> > > > +	struct virtio_vsock_sock *vvs = vsk->trans;
> > > > +	struct sk_buff *skb;
> > > > +	size_t total = 0;
> > > > +	u32 free_space;
> > > > +	int err = -EFAULT;
> > > > +
> > > > +	spin_lock_bh(&vvs->rx_lock);
> > > > +	if (total < len && !skb_queue_empty_lockless(&vvs-
> > > > >rx_queue)) {
> > > > +		skb = __skb_dequeue(&vvs->rx_queue);
> > > > +
> > > > +		total = len;
> > > > +		if (total > skb->len - vsock_metadata(skb)-
> > > > >off)
> > > > +			total = skb->len - vsock_metadata(skb)-
> > > > >off;
> > > > +		else if (total < skb->len -
> > > > vsock_metadata(skb)->off)
> > > > +			msg->msg_flags |= MSG_TRUNC;
> > > > +
> > > > +		/* sk_lock is held by caller so no one else can
> > > > dequeue.
> > > > +		 * Unlock rx_lock since memcpy_to_msg() may
> > > > sleep.
> > > > +		 */
> > > > +		spin_unlock_bh(&vvs->rx_lock);
> > > > +
> > > > +		err = memcpy_to_msg(msg, skb->data +
> > > > vsock_metadata(skb)->off, total);
> > > > +		if (err)
> > > > +			return err;
> > > > +
> > > > +		spin_lock_bh(&vvs->rx_lock);
> > > > +
> > > > +		virtio_transport_dec_rx_pkt(vvs, skb);
> > > > +		consume_skb(skb);
> > > > +	}
> > > > +
> > > > +	free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs-
> > > > >last_fwd_cnt);
> > > > +
> > > > +	spin_unlock_bh(&vvs->rx_lock);
> > > > +
> > > > +	if (total > 0 && msg->msg_name) {
> > > > +		/* Provide the address of the sender. */
> > > > +		DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr,
> > > > msg->msg_name);
> > > > +
> > > > +		vsock_addr_init(vm_addr,
> > > > le64_to_cpu(vsock_hdr(skb)->src_cid),
> > > > +				le32_to_cpu(vsock_hdr(skb)-
> > > > >src_port));
> > > > +		msg->msg_namelen = sizeof(*vm_addr);
> > > > +	}
> > > > +	return total;
> > > > +}
> > > > +
> > > > +static s64 virtio_transport_dgram_has_data(struct vsock_sock
> > > > *vsk)
> > > > +{
> > > > +	return virtio_transport_stream_has_data(vsk);
> > > > +}
> > > > +
> > > >  int
> > > >  virtio_transport_seqpacket_enqueue(struct vsock_sock *vsk,
> > > >  				   struct msghdr *msg,
> > > > @@ -611,7 +675,66 @@ virtio_transport_dgram_dequeue(struct
> > > > vsock_sock *vsk,
> > > >  			       struct msghdr *msg,
> > > >  			       size_t len, int flags)
> > > >  {
> > > > -	return -EOPNOTSUPP;
> > > > +	struct sock *sk;
> > > > +	size_t err = 0;
> > > > +	long timeout;
> > > > +
> > > > +	DEFINE_WAIT(wait);
> > > > +
> > > > +	sk = &vsk->sk;
> > > > +	err = 0;
> > > > +
> > > > +	if (flags & MSG_OOB || flags & MSG_ERRQUEUE || flags &
> > > > MSG_PEEK)
> > > > +		return -EOPNOTSUPP;
> > > > +
> > > > +	lock_sock(sk);
> > > > +
> > > > +	if (!len)
> > > > +		goto out;
> > > > +
> > > > +	timeout = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
> > > > +
> > > > +	while (1) {
> > > > +		s64 ready;
> > > > +
> > > > +		prepare_to_wait(sk_sleep(sk), &wait,
> > > > TASK_INTERRUPTIBLE);
> > > > +		ready = virtio_transport_dgram_has_data(vsk);
> > > > +
> > > > +		if (ready == 0) {
> > > > +			if (timeout == 0) {
> > > > +				err = -EAGAIN;
> > > > +				finish_wait(sk_sleep(sk),
> > > > &wait);
> > > > +				break;
> > > > +			}
> > > > +
> > > > +			release_sock(sk);
> > > > +			timeout = schedule_timeout(timeout);
> > > > +			lock_sock(sk);
> > > > +
> > > > +			if (signal_pending(current)) {
> > > > +				err = sock_intr_errno(timeout);
> > > > +				finish_wait(sk_sleep(sk),
> > > > &wait);
> > > > +				break;
> > > > +			} else if (timeout == 0) {
> > > > +				err = -EAGAIN;
> > > > +				finish_wait(sk_sleep(sk),
> > > > &wait);
> > > > +				break;
> > > > +			}
> > > > +		} else {
> > > > +			finish_wait(sk_sleep(sk), &wait);
> > > > +
> > > > +			if (ready < 0) {
> > > > +				err = -ENOMEM;
> > > > +				goto out;
> > > > +			}
> > > > +
> > > > +			err =
> > > > virtio_transport_dgram_do_dequeue(vsk, msg, len);
> > > > +			break;
> > > > +		}
> > > > +	}
> > > > +out:
> > > > +	release_sock(sk);
> > > > +	return err;
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(virtio_transport_dgram_dequeue);
> > > >  
> > > > @@ -819,13 +942,13 @@
> > > > EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
> > > >  int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> > > >  				struct sockaddr_vm *addr)
> > > >  {
> > > > -	return -EOPNOTSUPP;
> > > > +	return vsock_bind_stream(vsk, addr);
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
> > > >  
> > > >  bool virtio_transport_dgram_allow(u32 cid, u32 port)
> > > >  {
> > > > -	return false;
> > > > +	return true;
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(virtio_transport_dgram_allow);
> > > >  
> > > > @@ -861,7 +984,16 @@ virtio_transport_dgram_enqueue(struct
> > > > vsock_sock *vsk,
> > > >  			       struct msghdr *msg,
> > > >  			       size_t dgram_len)
> > > >  {
> > > > -	return -EOPNOTSUPP;
> > > > +	struct virtio_vsock_pkt_info info = {
> > > > +		.op = VIRTIO_VSOCK_OP_RW,
> > > > +		.msg = msg,
> > > > +		.pkt_len = dgram_len,
> > > > +		.vsk = vsk,
> > > > +		.remote_cid = remote_addr->svm_cid,
> > > > +		.remote_port = remote_addr->svm_port,
> > > > +	};
> > > > +
> > > > +	return virtio_transport_send_pkt_info(vsk, &info);
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(virtio_transport_dgram_enqueue);
> > > >  
> > > > @@ -1165,6 +1297,12 @@ virtio_transport_recv_connected(struct
> > > > sock *sk,
> > > >  	struct virtio_vsock_hdr *hdr = vsock_hdr(skb);
> > > >  	int err = 0;
> > > >  
> > > > +	if (le16_to_cpu(vsock_hdr(skb)->type) ==
> > > > VIRTIO_VSOCK_TYPE_DGRAM) {
> > > > +		virtio_transport_recv_enqueue(vsk, skb);
> > > > +		sk->sk_data_ready(sk);
> > > > +		return err;
> > > > +	}
> > > > +
> > > >  	switch (le16_to_cpu(hdr->op)) {
> > > >  	case VIRTIO_VSOCK_OP_RW:
> > > >  		virtio_transport_recv_enqueue(vsk, skb);
> > > > @@ -1320,7 +1458,8 @@ virtio_transport_recv_listen(struct sock
> > > > *sk, struct sk_buff *skb,
> > > >  static bool virtio_transport_valid_type(u16 type)
> > > >  {
> > > >  	return (type == VIRTIO_VSOCK_TYPE_STREAM) ||
> > > > -	       (type == VIRTIO_VSOCK_TYPE_SEQPACKET);
> > > > +	       (type == VIRTIO_VSOCK_TYPE_SEQPACKET) ||
> > > > +	       (type == VIRTIO_VSOCK_TYPE_DGRAM);
> > > >  }
> > > >  
> > > >  /* We are under the virtio-vsock's vsock->rx_lock or vhost-
> > > > vsock's vq->mutex
> > > > @@ -1384,6 +1523,11 @@ void virtio_transport_recv_pkt(struct
> > > > virtio_transport *t,
> > > >  		goto free_pkt;
> > > >  	}
> > > >  
> > > > +	if (sk->sk_type == SOCK_DGRAM) {
> > > > +		virtio_transport_recv_connected(sk, skb);
> > > > +		goto out;
> > > > +	}
> > > > +
> > > >  	space_available = virtio_transport_space_update(sk,
> > > > skb);
> > > >  
> > > >  	/* Update CID in case it has changed after a transport
> > > > reset event */
> > > > @@ -1415,6 +1559,7 @@ void virtio_transport_recv_pkt(struct
> > > > virtio_transport *t,
> > > >  		break;
> > > >  	}
> > > >  
> > > > +out:
> > > >  	release_sock(sk);
> > > >  
> > > >  	/* Release refcnt obtained when we fetched this socket
> > > > out of the
> > > > -- 
> > > > 2.35.1
> > > > 
> > > 
> > > 


* Re: [virtio-dev] Re: [PATCH 5/6] virtio/vsock: add support for dgram
  2022-08-16  9:58         ` Bobby Eshleman
@ 2022-08-18  8:35           ` Arseniy Krasnov
  2022-08-16 20:52             ` Bobby Eshleman
  0 siblings, 1 reply; 16+ messages in thread
From: Arseniy Krasnov @ 2022-08-18  8:35 UTC (permalink / raw)
  To: bobbyeshleman@gmail.com
  Cc: kvm@vger.kernel.org, jasowang@redhat.com,
	bobby.eshleman@gmail.com, davem@davemloft.net,
	virtio-dev@lists.oasis-open.org, stefanha@redhat.com,
	bobby.eshleman@bytedance.com, linux-kernel@vger.kernel.org,
	pabeni@redhat.com, edumazet@google.com, jiang.wang@bytedance.com,
	sgarzare@redhat.com, kuba@kernel.org, cong.wang@bytedance.com,
	netdev@vger.kernel.org, virtualization@lists.linux-foundation.org,
	mst@redhat.com

On Tue, 2022-08-16 at 09:58 +0000, Bobby Eshleman wrote:
> On Wed, Aug 17, 2022 at 05:42:08AM +0000, Arseniy Krasnov wrote:
> > On 17.08.2022 08:01, Arseniy Krasnov wrote:
> > > On 16.08.2022 05:32, Bobby Eshleman wrote:
> > > > CC'ing virtio-dev@lists.oasis-open.org
> > > > 
> > > > On Mon, Aug 15, 2022 at 10:56:08AM -0700, Bobby Eshleman wrote:
> > > > > This patch supports dgram in virtio and on the vhost side.
> > > Hello,
> > > 
> > > sorry, i don't understand, how this maintains message boundaries?
> > > Or it
> > > is unnecessary for SOCK_DGRAM?
> > > 
> > > Thanks
> > > > > Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
> > > > > Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
> > > > > ---
> > > > >  drivers/vhost/vsock.c                   |   2 +-
> > > > >  include/net/af_vsock.h                  |   2 +
> > > > >  include/uapi/linux/virtio_vsock.h       |   1 +
> > > > >  net/vmw_vsock/af_vsock.c                |  26 +++-
> > > > >  net/vmw_vsock/virtio_transport.c        |   2 +-
> > > > >  net/vmw_vsock/virtio_transport_common.c | 173
> > > > > ++++++++++++++++++++++--
> > > > >  6 files changed, 186 insertions(+), 20 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > > > > index a5d1bdb786fe..3dc72a5647ca 100644
> > > > > --- a/drivers/vhost/vsock.c
> > > > > +++ b/drivers/vhost/vsock.c
> > > > > @@ -925,7 +925,7 @@ static int __init vhost_vsock_init(void)
> > > > >  	int ret;
> > > > >  
> > > > >  	ret = vsock_core_register(&vhost_transport.transport,
> > > > > -				  VSOCK_TRANSPORT_F_H2G);
> > > > > +				  VSOCK_TRANSPORT_F_H2G |
> > > > > VSOCK_TRANSPORT_F_DGRAM);
> > > > >  	if (ret < 0)
> > > > >  		return ret;
> > > > >  
> > > > > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> > > > > index 1c53c4c4d88f..37e55c81e4df 100644
> > > > > --- a/include/net/af_vsock.h
> > > > > +++ b/include/net/af_vsock.h
> > > > > @@ -78,6 +78,8 @@ struct vsock_sock {
> > > > >  s64 vsock_stream_has_data(struct vsock_sock *vsk);
> > > > >  s64 vsock_stream_has_space(struct vsock_sock *vsk);
> > > > >  struct sock *vsock_create_connected(struct sock *parent);
> > > > > +int vsock_bind_stream(struct vsock_sock *vsk,
> > > > > +		      struct sockaddr_vm *addr);
> > > > >  
> > > > >  /**** TRANSPORT ****/
> > > > >  
> > > > > diff --git a/include/uapi/linux/virtio_vsock.h
> > > > > b/include/uapi/linux/virtio_vsock.h
> > > > > index 857df3a3a70d..0975b9c88292 100644
> > > > > --- a/include/uapi/linux/virtio_vsock.h
> > > > > +++ b/include/uapi/linux/virtio_vsock.h
> > > > > @@ -70,6 +70,7 @@ struct virtio_vsock_hdr {
> > > > >  enum virtio_vsock_type {
> > > > >  	VIRTIO_VSOCK_TYPE_STREAM = 1,
> > > > >  	VIRTIO_VSOCK_TYPE_SEQPACKET = 2,
> > > > > +	VIRTIO_VSOCK_TYPE_DGRAM = 3,
> > > > >  };
> > > > >  
> > > > >  enum virtio_vsock_op {
> > > > > diff --git a/net/vmw_vsock/af_vsock.c
> > > > > b/net/vmw_vsock/af_vsock.c
> > > > > index 1893f8aafa48..87e4ae1866d3 100644
> > > > > --- a/net/vmw_vsock/af_vsock.c
> > > > > +++ b/net/vmw_vsock/af_vsock.c
> > > > > @@ -675,6 +675,19 @@ static int
> > > > > __vsock_bind_connectible(struct vsock_sock *vsk,
> > > > >  	return 0;
> > > > >  }
> > > > >  
> > > > > +int vsock_bind_stream(struct vsock_sock *vsk,
> > > > > +		      struct sockaddr_vm *addr)
> > > > > +{
> > > > > +	int retval;
> > > > > +
> > > > > +	spin_lock_bh(&vsock_table_lock);
> > > > > +	retval = __vsock_bind_connectible(vsk, addr);
> > > > > +	spin_unlock_bh(&vsock_table_lock);
> > > > > +
> > > > > +	return retval;
> > > > > +}
> > > > > +EXPORT_SYMBOL(vsock_bind_stream);
> > > > > +
> > > > >  static int __vsock_bind_dgram(struct vsock_sock *vsk,
> > > > >  			      struct sockaddr_vm *addr)
> > > > >  {
> > > > > @@ -2363,11 +2376,16 @@ int vsock_core_register(const struct
> > > > > vsock_transport *t, int features)
> > > > >  	}
> > > > >  
> > > > >  	if (features & VSOCK_TRANSPORT_F_DGRAM) {
> > > > > -		if (t_dgram) {
> > > > > -			err = -EBUSY;
> > > > > -			goto err_busy;
> > > > > +		/* TODO: always chose the G2H variant over
> > > > > others, support nesting later */
> > > > > +		if (features & VSOCK_TRANSPORT_F_G2H) {
> > > > > +			if (t_dgram)
> > > > > +				pr_warn("virtio_vsock: t_dgram
> > > > > already set\n");
> > > > > +			t_dgram = t;
> > > > > +		}
> > > > > +
> > > > > +		if (!t_dgram) {
> > > > > +			t_dgram = t;
> > > > >  		}
> > > > > -		t_dgram = t;
> > > > >  	}
> > > > >  
> > > > >  	if (features & VSOCK_TRANSPORT_F_LOCAL) {
> > > > > diff --git a/net/vmw_vsock/virtio_transport.c
> > > > > b/net/vmw_vsock/virtio_transport.c
> > > > > index 073314312683..d4526ca462d2 100644
> > > > > --- a/net/vmw_vsock/virtio_transport.c
> > > > > +++ b/net/vmw_vsock/virtio_transport.c
> > > > > @@ -850,7 +850,7 @@ static int __init virtio_vsock_init(void)
> > > > >  		return -ENOMEM;
> > > > >  
> > > > >  	ret = vsock_core_register(&virtio_transport.transport,
> > > > > -				  VSOCK_TRANSPORT_F_G2H);
> > > > > +				  VSOCK_TRANSPORT_F_G2H |
> > > > > VSOCK_TRANSPORT_F_DGRAM);
> > > > >  	if (ret)
> > > > >  		goto out_wq;
> > > > >  
> > > > > diff --git a/net/vmw_vsock/virtio_transport_common.c
> > > > > b/net/vmw_vsock/virtio_transport_common.c
> > > > > index bdf16fff054f..aedb48728677 100644
> > > > > --- a/net/vmw_vsock/virtio_transport_common.c
> > > > > +++ b/net/vmw_vsock/virtio_transport_common.c
> > > > > @@ -229,7 +229,9 @@
> > > > > EXPORT_SYMBOL_GPL(virtio_transport_deliver_tap_pkt);
> > > > >  
> > > > >  static u16 virtio_transport_get_type(struct sock *sk)
> > > > >  {
> > > > > -	if (sk->sk_type == SOCK_STREAM)
> > > > > +	if (sk->sk_type == SOCK_DGRAM)
> > > > > +		return VIRTIO_VSOCK_TYPE_DGRAM;
> > > > > +	else if (sk->sk_type == SOCK_STREAM)
> > > > >  		return VIRTIO_VSOCK_TYPE_STREAM;
> > > > >  	else
> > > > >  		return VIRTIO_VSOCK_TYPE_SEQPACKET;
> > > > > @@ -287,22 +289,29 @@ static int
> > > > > virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > > > >  	vvs = vsk->trans;
> > > > >  
> > > > >  	/* we can send less than pkt_len bytes */
> > > > > -	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
> > > > > -		pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
> > > > > +	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) {
> > > > > +		if (info->type != VIRTIO_VSOCK_TYPE_DGRAM)
> > > > > +			pkt_len =
> > > > > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
> > > > > +		else
> > > > > +			return 0;
> > > > > +	}
> > > > >  
> > > > > -	/* virtio_transport_get_credit might return less than
> > > > > pkt_len credit */
> > > > > -	pkt_len = virtio_transport_get_credit(vvs, pkt_len);
> > > > > +	if (info->type != VIRTIO_VSOCK_TYPE_DGRAM) {
> > > > > +		/* virtio_transport_get_credit might return
> > > > > less than pkt_len credit */
> > > > > +		pkt_len = virtio_transport_get_credit(vvs,
> > > > > pkt_len);
> > > > >  
> > > > > -	/* Do not send zero length OP_RW pkt */
> > > > > -	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
> > > > > -		return pkt_len;
> > > > > +		/* Do not send zero length OP_RW pkt */
> > > > > +		if (pkt_len == 0 && info->op ==
> > > > > VIRTIO_VSOCK_OP_RW)
> > > > > +			return pkt_len;
> > > > > +	}
> > > > >  
> > > > >  	skb = virtio_transport_alloc_skb(info, pkt_len,
> > > > >  					 src_cid, src_port,
> > > > >  					 dst_cid, dst_port,
> > > > >  					 &err);
> > > > >  	if (!skb) {
> > > > > -		virtio_transport_put_credit(vvs, pkt_len);
> > > > > +		if (info->type != VIRTIO_VSOCK_TYPE_DGRAM)
> > > > > +			virtio_transport_put_credit(vvs,
> > > > > pkt_len);
> > > > >  		return err;
> > > > >  	}
> > > > >  
> > > > > @@ -586,6 +595,61 @@
> > > > > virtio_transport_seqpacket_dequeue(struct vsock_sock *vsk,
> > > > >  }
> > > > >  EXPORT_SYMBOL_GPL(virtio_transport_seqpacket_dequeue);
> > > > >  
> > > > > +static ssize_t
> > > > > +virtio_transport_dgram_do_dequeue(struct vsock_sock *vsk,
> > > > > +				  struct msghdr *msg, size_t
> > > > > len)
> > > > > +{
> > > > > +	struct virtio_vsock_sock *vvs = vsk->trans;
> > > > > +	struct sk_buff *skb;
> > > > > +	size_t total = 0;
> > > > > +	u32 free_space;
> > > > > +	int err = -EFAULT;
> > > > > +
> > > > > +	spin_lock_bh(&vvs->rx_lock);
> > > > > +	if (total < len && !skb_queue_empty_lockless(&vvs-
> > > > > >rx_queue)) {
> > > > > +		skb = __skb_dequeue(&vvs->rx_queue);
> > > > > +
> > > > > +		total = len;
> > > > > +		if (total > skb->len - vsock_metadata(skb)-
> > > > > >off)
> > > > > +			total = skb->len - vsock_metadata(skb)-
> > > > > >off;
> > > > > +		else if (total < skb->len -
> > > > > vsock_metadata(skb)->off)
> > > > > +			msg->msg_flags |= MSG_TRUNC;
> > > > > +
> > > > > +		/* sk_lock is held by caller so no one else can
> > > > > dequeue.
> > > > > +		 * Unlock rx_lock since memcpy_to_msg() may
> > > > > sleep.
> > > > > +		 */
> > > > > +		spin_unlock_bh(&vvs->rx_lock);
> > > > > +
> > > > > +		err = memcpy_to_msg(msg, skb->data +
> > > > > vsock_metadata(skb)->off, total);
> > > > > +		if (err)
> > > > > +			return err;
> > > > > +
> > > > > +		spin_lock_bh(&vvs->rx_lock);
> > > > > +
> > > > > +		virtio_transport_dec_rx_pkt(vvs, skb);
> > > > > +		consume_skb(skb);
> > > > > +	}
> > > > > +
> > > > > +	free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs-
> > > > > >last_fwd_cnt);
> > > > > +
> > > > > +	spin_unlock_bh(&vvs->rx_lock);
> > > > > +
> > > > > +	if (total > 0 && msg->msg_name) {
> > > > > +		/* Provide the address of the sender. */
> > > > > +		DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr,
> > > > > msg->msg_name);
> > > > > +
> > > > > +		vsock_addr_init(vm_addr,
> > > > > le64_to_cpu(vsock_hdr(skb)->src_cid),
> > > > > +				le32_to_cpu(vsock_hdr(skb)-
> > > > > >src_port));
> > > > > +		msg->msg_namelen = sizeof(*vm_addr);
> > > > > +	}
> > > > > +	return total;
> > > > > +}
> > > > > +
> > > > > +static s64 virtio_transport_dgram_has_data(struct vsock_sock
> > > > > *vsk)
> > > > > +{
> > > > > +	return virtio_transport_stream_has_data(vsk);
> > > > > +}
> > > > > +
> > > > >  int
> > > > >  virtio_transport_seqpacket_enqueue(struct vsock_sock *vsk,
> > > > >  				   struct msghdr *msg,
> > > > > @@ -611,7 +675,66 @@ virtio_transport_dgram_dequeue(struct
> > > > > vsock_sock *vsk,
> > > > >  			       struct msghdr *msg,
> > > > >  			       size_t len, int flags)
> > > > >  {
> > > > > -	return -EOPNOTSUPP;
> > > > > +	struct sock *sk;
> > > > > +	size_t err = 0;
> > > > > +	long timeout;
> > > > > +
> > > > > +	DEFINE_WAIT(wait);
> > > > > +
> > > > > +	sk = &vsk->sk;
> > > > > +	err = 0;
> > > > > +
> > > > > +	if (flags & MSG_OOB || flags & MSG_ERRQUEUE || flags &
> > > > > MSG_PEEK)
> > > > > +		return -EOPNOTSUPP;
> > > > > +
> > > > > +	lock_sock(sk);
> > > > > +
> > > > > +	if (!len)
> > > > > +		goto out;
> > > > > +
> > > > > +	timeout = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
> > > > > +
> > > > > +	while (1) {
> > > > > +		s64 ready;
> > > > > +
> > > > > +		prepare_to_wait(sk_sleep(sk), &wait,
> > > > > TASK_INTERRUPTIBLE);
> > > > > +		ready = virtio_transport_dgram_has_data(vsk);
> > > > > +
> > > > > +		if (ready == 0) {
> > > > > +			if (timeout == 0) {
> > > > > +				err = -EAGAIN;
> > > > > +				finish_wait(sk_sleep(sk),
> > > > > &wait);
> > > > > +				break;
> > > > > +			}
> > > > > +
> > > > > +			release_sock(sk);
> > > > > +			timeout = schedule_timeout(timeout);
> > > > > +			lock_sock(sk);
> > > > > +
> > > > > +			if (signal_pending(current)) {
> > > > > +				err = sock_intr_errno(timeout);
> > > > > +				finish_wait(sk_sleep(sk),
> > > > > &wait);
> > > > > +				break;
> > > > > +			} else if (timeout == 0) {
> > > > > +				err = -EAGAIN;
> > > > > +				finish_wait(sk_sleep(sk),
> > > > > &wait);
> > > > > +				break;
> > > > > +			}
> > > > > +		} else {
> > > > > +			finish_wait(sk_sleep(sk), &wait);
> > > > > +
> > > > > +			if (ready < 0) {
> > > > > +				err = -ENOMEM;
> > > > > +				goto out;
> > > > > +			}
> > > > > +
> > > > > +			err =
> > > > > virtio_transport_dgram_do_dequeue(vsk, msg, len);
> > > > > +			break;
> > > > > +		}
> > > > > +	}
> > > > > +out:
> > > > > +	release_sock(sk);
> > > > > +	return err;
> > > > >  }
> > > > >  EXPORT_SYMBOL_GPL(virtio_transport_dgram_dequeue);
> > ^^^
> > Maybe this generic data-waiting logic should live in af_vsock.c, as it
> > does for stream/seqpacket? That way, another transport that supports
> > SOCK_DGRAM could reuse it.
> 
> I think that is a great idea. I'll test that change for v2.
> 
> Thanks.

Also for v2: I tested your patchset a little (writing it all here so it is
not spread over several mails):
1) The seqpacket test in vsock_test.c fails (it looks like a MSG_EOR flag
   issue).
2) I can't rmmod the modules after testing with the following config:
   CONFIG_VSOCKETS=m
   CONFIG_VIRTIO_VSOCKETS=m
   CONFIG_VIRTIO_VSOCKETS_COMMON=m
   CONFIG_VHOST=m
   CONFIG_VHOST_VSOCK=m
   The guest is shut down, but rmmod still fails.
3) virtio_transport_init and virtio_transport_exit seem to need
   EXPORT_SYMBOL_GPL(), because both are used in another module.
4) I tried to send a 5 KB piece of data (20 KB makes no difference) and got
   a kernel panic in the guest and, later, in the host.

Thank you
> 
> > > > >  
> > > > > @@ -819,13 +942,13 @@
> > > > > EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
> > > > >  int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> > > > >  				struct sockaddr_vm *addr)
> > > > >  {
> > > > > -	return -EOPNOTSUPP;
> > > > > +	return vsock_bind_stream(vsk, addr);
> > > > >  }
> > > > >  EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
> > > > >  
> > > > >  bool virtio_transport_dgram_allow(u32 cid, u32 port)
> > > > >  {
> > > > > -	return false;
> > > > > +	return true;
> > > > >  }
> > > > >  EXPORT_SYMBOL_GPL(virtio_transport_dgram_allow);
> > > > >  
> > > > > @@ -861,7 +984,16 @@ virtio_transport_dgram_enqueue(struct
> > > > > vsock_sock *vsk,
> > > > >  			       struct msghdr *msg,
> > > > >  			       size_t dgram_len)
> > > > >  {
> > > > > -	return -EOPNOTSUPP;
> > > > > +	struct virtio_vsock_pkt_info info = {
> > > > > +		.op = VIRTIO_VSOCK_OP_RW,
> > > > > +		.msg = msg,
> > > > > +		.pkt_len = dgram_len,
> > > > > +		.vsk = vsk,
> > > > > +		.remote_cid = remote_addr->svm_cid,
> > > > > +		.remote_port = remote_addr->svm_port,
> > > > > +	};
> > > > > +
> > > > > +	return virtio_transport_send_pkt_info(vsk, &info);
> > > > >  }
> > > > >  EXPORT_SYMBOL_GPL(virtio_transport_dgram_enqueue);
> > > > >  
> > > > > @@ -1165,6 +1297,12 @@ virtio_transport_recv_connected(struct
> > > > > sock *sk,
> > > > >  	struct virtio_vsock_hdr *hdr = vsock_hdr(skb);
> > > > >  	int err = 0;
> > > > >  
> > > > > +	if (le16_to_cpu(vsock_hdr(skb)->type) ==
> > > > > VIRTIO_VSOCK_TYPE_DGRAM) {
> > > > > +		virtio_transport_recv_enqueue(vsk, skb);
> > > > > +		sk->sk_data_ready(sk);
> > > > > +		return err;
> > > > > +	}
> > > > > +
> > > > >  	switch (le16_to_cpu(hdr->op)) {
> > > > >  	case VIRTIO_VSOCK_OP_RW:
> > > > >  		virtio_transport_recv_enqueue(vsk, skb);
> > > > > @@ -1320,7 +1458,8 @@ virtio_transport_recv_listen(struct
> > > > > sock *sk, struct sk_buff *skb,
> > > > >  static bool virtio_transport_valid_type(u16 type)
> > > > >  {
> > > > >  	return (type == VIRTIO_VSOCK_TYPE_STREAM) ||
> > > > > -	       (type == VIRTIO_VSOCK_TYPE_SEQPACKET);
> > > > > +	       (type == VIRTIO_VSOCK_TYPE_SEQPACKET) ||
> > > > > +	       (type == VIRTIO_VSOCK_TYPE_DGRAM);
> > > > >  }
> > > > >  
> > > > > >  /* We are under the virtio-vsock's vsock->rx_lock or vhost-vsock's vq->mutex
> > > > > @@ -1384,6 +1523,11 @@ void virtio_transport_recv_pkt(struct
> > > > > virtio_transport *t,
> > > > >  		goto free_pkt;
> > > > >  	}
> > > > >  
> > > > > +	if (sk->sk_type == SOCK_DGRAM) {
> > > > > +		virtio_transport_recv_connected(sk, skb);
> > > > > +		goto out;
> > > > > +	}
> > > > > +
> > > > >  	space_available = virtio_transport_space_update(sk,
> > > > > skb);
> > > > >  
> > > > >  	/* Update CID in case it has changed after a transport
> > > > > reset event */
> > > > > @@ -1415,6 +1559,7 @@ void virtio_transport_recv_pkt(struct
> > > > > virtio_transport *t,
> > > > >  		break;
> > > > >  	}
> > > > >  
> > > > > +out:
> > > > >  	release_sock(sk);
> > > > >  
> > > > >  	/* Release refcnt obtained when we fetched this socket
> > > > > out of the
> > > > > -- 
> > > > > 2.35.1
> > > > > 
> > > > 
> > > > 
> 
> 

* Re: [virtio-dev] Re: [PATCH 5/6] virtio/vsock: add support for dgram
  2022-08-16 20:52             ` Bobby Eshleman
@ 2022-08-19  4:30               ` Arseniy Krasnov
  0 siblings, 0 replies; 16+ messages in thread
From: Arseniy Krasnov @ 2022-08-19  4:30 UTC (permalink / raw)
  To: bobbyeshleman@gmail.com
  Cc: kvm@vger.kernel.org, jasowang@redhat.com,
	bobby.eshleman@gmail.com, davem@davemloft.net,
	virtio-dev@lists.oasis-open.org, stefanha@redhat.com,
	bobby.eshleman@bytedance.com, linux-kernel@vger.kernel.org,
	pabeni@redhat.com, edumazet@google.com, jiang.wang@bytedance.com,
	sgarzare@redhat.com, kuba@kernel.org, cong.wang@bytedance.com,
	netdev@vger.kernel.org, virtualization@lists.linux-foundation.org,
	mst@redhat.com

On Tue, 2022-08-16 at 20:52 +0000, Bobby Eshleman wrote:
> On Thu, Aug 18, 2022 at 08:35:48AM +0000, Arseniy Krasnov wrote:
> > On Tue, 2022-08-16 at 09:58 +0000, Bobby Eshleman wrote:
> > > On Wed, Aug 17, 2022 at 05:42:08AM +0000, Arseniy Krasnov wrote:
> > > > On 17.08.2022 08:01, Arseniy Krasnov wrote:
> > > > > On 16.08.2022 05:32, Bobby Eshleman wrote:
> > > > > > CC'ing virtio-dev@lists.oasis-open.org
> > > > > > 
> > > > > > On Mon, Aug 15, 2022 at 10:56:08AM -0700, Bobby Eshleman
> > > > > > wrote:
> > > > > > > This patch supports dgram in virtio and on the vhost
> > > > > > > side.
> > > > > Hello,
> > > > > 
> > > > > > Sorry, I don't understand how this maintains message
> > > > > > boundaries. Or is that unnecessary for SOCK_DGRAM?
> > > > > 
> > > > > Thanks
> > > > > > > Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
> > > > > > > Signed-off-by: Bobby Eshleman <
> > > > > > > bobby.eshleman@bytedance.com>
> > > > > > > ---
> > > > > > >  drivers/vhost/vsock.c                   |   2 +-
> > > > > > >  include/net/af_vsock.h                  |   2 +
> > > > > > >  include/uapi/linux/virtio_vsock.h       |   1 +
> > > > > > >  net/vmw_vsock/af_vsock.c                |  26 +++-
> > > > > > >  net/vmw_vsock/virtio_transport.c        |   2 +-
> > > > > > >  net/vmw_vsock/virtio_transport_common.c | 173
> > > > > > > ++++++++++++++++++++++--
> > > > > > >  6 files changed, 186 insertions(+), 20 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/drivers/vhost/vsock.c
> > > > > > > b/drivers/vhost/vsock.c
> > > > > > > index a5d1bdb786fe..3dc72a5647ca 100644
> > > > > > > --- a/drivers/vhost/vsock.c
> > > > > > > +++ b/drivers/vhost/vsock.c
> > > > > > > @@ -925,7 +925,7 @@ static int __init
> > > > > > > vhost_vsock_init(void)
> > > > > > >  	int ret;
> > > > > > >  
> > > > > > >  	ret = vsock_core_register(&vhost_transport.transport,
> > > > > > > -				  VSOCK_TRANSPORT_F_H2G);
> > > > > > > +				  VSOCK_TRANSPORT_F_H2G |
> > > > > > > VSOCK_TRANSPORT_F_DGRAM);
> > > > > > >  	if (ret < 0)
> > > > > > >  		return ret;
> > > > > > >  
> > > > > > > diff --git a/include/net/af_vsock.h
> > > > > > > b/include/net/af_vsock.h
> > > > > > > index 1c53c4c4d88f..37e55c81e4df 100644
> > > > > > > --- a/include/net/af_vsock.h
> > > > > > > +++ b/include/net/af_vsock.h
> > > > > > > @@ -78,6 +78,8 @@ struct vsock_sock {
> > > > > > >  s64 vsock_stream_has_data(struct vsock_sock *vsk);
> > > > > > >  s64 vsock_stream_has_space(struct vsock_sock *vsk);
> > > > > > >  struct sock *vsock_create_connected(struct sock
> > > > > > > *parent);
> > > > > > > +int vsock_bind_stream(struct vsock_sock *vsk,
> > > > > > > +		      struct sockaddr_vm *addr);
> > > > > > >  
> > > > > > >  /**** TRANSPORT ****/
> > > > > > >  
> > > > > > > diff --git a/include/uapi/linux/virtio_vsock.h
> > > > > > > b/include/uapi/linux/virtio_vsock.h
> > > > > > > index 857df3a3a70d..0975b9c88292 100644
> > > > > > > --- a/include/uapi/linux/virtio_vsock.h
> > > > > > > +++ b/include/uapi/linux/virtio_vsock.h
> > > > > > > @@ -70,6 +70,7 @@ struct virtio_vsock_hdr {
> > > > > > >  enum virtio_vsock_type {
> > > > > > >  	VIRTIO_VSOCK_TYPE_STREAM = 1,
> > > > > > >  	VIRTIO_VSOCK_TYPE_SEQPACKET = 2,
> > > > > > > +	VIRTIO_VSOCK_TYPE_DGRAM = 3,
> > > > > > >  };
> > > > > > >  
> > > > > > >  enum virtio_vsock_op {
> > > > > > > diff --git a/net/vmw_vsock/af_vsock.c
> > > > > > > b/net/vmw_vsock/af_vsock.c
> > > > > > > index 1893f8aafa48..87e4ae1866d3 100644
> > > > > > > --- a/net/vmw_vsock/af_vsock.c
> > > > > > > +++ b/net/vmw_vsock/af_vsock.c
> > > > > > > @@ -675,6 +675,19 @@ static int
> > > > > > > __vsock_bind_connectible(struct vsock_sock *vsk,
> > > > > > >  	return 0;
> > > > > > >  }
> > > > > > >  
> > > > > > > +int vsock_bind_stream(struct vsock_sock *vsk,
> > > > > > > +		      struct sockaddr_vm *addr)
> > > > > > > +{
> > > > > > > +	int retval;
> > > > > > > +
> > > > > > > +	spin_lock_bh(&vsock_table_lock);
> > > > > > > +	retval = __vsock_bind_connectible(vsk, addr);
> > > > > > > +	spin_unlock_bh(&vsock_table_lock);
> > > > > > > +
> > > > > > > +	return retval;
> > > > > > > +}
> > > > > > > +EXPORT_SYMBOL(vsock_bind_stream);
> > > > > > > +
> > > > > > >  static int __vsock_bind_dgram(struct vsock_sock *vsk,
> > > > > > >  			      struct sockaddr_vm *addr)
> > > > > > >  {
> > > > > > > @@ -2363,11 +2376,16 @@ int vsock_core_register(const
> > > > > > > struct
> > > > > > > vsock_transport *t, int features)
> > > > > > >  	}
> > > > > > >  
> > > > > > >  	if (features & VSOCK_TRANSPORT_F_DGRAM) {
> > > > > > > -		if (t_dgram) {
> > > > > > > -			err = -EBUSY;
> > > > > > > -			goto err_busy;
> > > > > > > +		/* TODO: always chose the G2H variant over
> > > > > > > others, support nesting later */
> > > > > > > +		if (features & VSOCK_TRANSPORT_F_G2H) {
> > > > > > > +			if (t_dgram)
> > > > > > > +				pr_warn("virtio_vsock: t_dgram
> > > > > > > already set\n");
> > > > > > > +			t_dgram = t;
> > > > > > > +		}
> > > > > > > +
> > > > > > > +		if (!t_dgram) {
> > > > > > > +			t_dgram = t;
> > > > > > >  		}
> > > > > > > -		t_dgram = t;
> > > > > > >  	}
> > > > > > >  
> > > > > > >  	if (features & VSOCK_TRANSPORT_F_LOCAL) {
> > > > > > > diff --git a/net/vmw_vsock/virtio_transport.c
> > > > > > > b/net/vmw_vsock/virtio_transport.c
> > > > > > > index 073314312683..d4526ca462d2 100644
> > > > > > > --- a/net/vmw_vsock/virtio_transport.c
> > > > > > > +++ b/net/vmw_vsock/virtio_transport.c
> > > > > > > @@ -850,7 +850,7 @@ static int __init
> > > > > > > virtio_vsock_init(void)
> > > > > > >  		return -ENOMEM;
> > > > > > >  
> > > > > > >  	ret = vsock_core_register(&virtio_transport.transport,
> > > > > > > -				  VSOCK_TRANSPORT_F_G2H);
> > > > > > > +				  VSOCK_TRANSPORT_F_G2H |
> > > > > > > VSOCK_TRANSPORT_F_DGRAM);
> > > > > > >  	if (ret)
> > > > > > >  		goto out_wq;
> > > > > > >  
> > > > > > > diff --git a/net/vmw_vsock/virtio_transport_common.c
> > > > > > > b/net/vmw_vsock/virtio_transport_common.c
> > > > > > > index bdf16fff054f..aedb48728677 100644
> > > > > > > --- a/net/vmw_vsock/virtio_transport_common.c
> > > > > > > +++ b/net/vmw_vsock/virtio_transport_common.c
> > > > > > > @@ -229,7 +229,9 @@
> > > > > > > EXPORT_SYMBOL_GPL(virtio_transport_deliver_tap_pkt);
> > > > > > >  
> > > > > > >  static u16 virtio_transport_get_type(struct sock *sk)
> > > > > > >  {
> > > > > > > -	if (sk->sk_type == SOCK_STREAM)
> > > > > > > +	if (sk->sk_type == SOCK_DGRAM)
> > > > > > > +		return VIRTIO_VSOCK_TYPE_DGRAM;
> > > > > > > +	else if (sk->sk_type == SOCK_STREAM)
> > > > > > >  		return VIRTIO_VSOCK_TYPE_STREAM;
> > > > > > >  	else
> > > > > > >  		return VIRTIO_VSOCK_TYPE_SEQPACKET;
> > > > > > > @@ -287,22 +289,29 @@ static int
> > > > > > > virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > > > > > >  	vvs = vsk->trans;
> > > > > > >  
> > > > > > >  	/* we can send less than pkt_len bytes */
> > > > > > > -	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
> > > > > > > -		pkt_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
> > > > > > > +	if (pkt_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) {
> > > > > > > +		if (info->type != VIRTIO_VSOCK_TYPE_DGRAM)
> > > > > > > +			pkt_len =
> > > > > > > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
> > > > > > > +		else
> > > > > > > +			return 0;
> > > > > > > +	}
> > > > > > >  
> > > > > > > -	/* virtio_transport_get_credit might return less than
> > > > > > > pkt_len credit */
> > > > > > > -	pkt_len = virtio_transport_get_credit(vvs, pkt_len);
> > > > > > > +	if (info->type != VIRTIO_VSOCK_TYPE_DGRAM) {
> > > > > > > +		/* virtio_transport_get_credit might return
> > > > > > > less than pkt_len credit */
> > > > > > > +		pkt_len = virtio_transport_get_credit(vvs,
> > > > > > > pkt_len);
> > > > > > >  
> > > > > > > -	/* Do not send zero length OP_RW pkt */
> > > > > > > -	if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
> > > > > > > -		return pkt_len;
> > > > > > > +		/* Do not send zero length OP_RW pkt */
> > > > > > > +		if (pkt_len == 0 && info->op ==
> > > > > > > VIRTIO_VSOCK_OP_RW)
> > > > > > > +			return pkt_len;
> > > > > > > +	}
> > > > > > >  
> > > > > > >  	skb = virtio_transport_alloc_skb(info, pkt_len,
> > > > > > >  					 src_cid, src_port,
> > > > > > >  					 dst_cid, dst_port,
> > > > > > >  					 &err);
> > > > > > >  	if (!skb) {
> > > > > > > -		virtio_transport_put_credit(vvs, pkt_len);
> > > > > > > +		if (info->type != VIRTIO_VSOCK_TYPE_DGRAM)
> > > > > > > +			virtio_transport_put_credit(vvs,
> > > > > > > pkt_len);
> > > > > > >  		return err;
> > > > > > >  	}
> > > > > > >  
> > > > > > > @@ -586,6 +595,61 @@
> > > > > > > virtio_transport_seqpacket_dequeue(struct vsock_sock
> > > > > > > *vsk,
> > > > > > >  }
> > > > > > >  EXPORT_SYMBOL_GPL(virtio_transport_seqpacket_dequeue);
> > > > > > >  
> > > > > > > +static ssize_t
> > > > > > > +virtio_transport_dgram_do_dequeue(struct vsock_sock
> > > > > > > *vsk,
> > > > > > > +				  struct msghdr *msg, size_t
> > > > > > > len)
> > > > > > > +{
> > > > > > > +	struct virtio_vsock_sock *vvs = vsk->trans;
> > > > > > > +	struct sk_buff *skb;
> > > > > > > +	size_t total = 0;
> > > > > > > +	u32 free_space;
> > > > > > > +	int err = -EFAULT;
> > > > > > > +
> > > > > > > +	spin_lock_bh(&vvs->rx_lock);
> > > > > > > > +	if (total < len && !skb_queue_empty_lockless(&vvs->rx_queue)) {
> > > > > > > +		skb = __skb_dequeue(&vvs->rx_queue);
> > > > > > > +
> > > > > > > +		total = len;
> > > > > > > > +		if (total > skb->len - vsock_metadata(skb)->off)
> > > > > > > > +			total = skb->len - vsock_metadata(skb)->off;
> > > > > > > > +		else if (total < skb->len - vsock_metadata(skb)->off)
> > > > > > > > +			msg->msg_flags |= MSG_TRUNC;
> > > > > > > +
> > > > > > > +		/* sk_lock is held by caller so no one else can
> > > > > > > dequeue.
> > > > > > > +		 * Unlock rx_lock since memcpy_to_msg() may
> > > > > > > sleep.
> > > > > > > +		 */
> > > > > > > +		spin_unlock_bh(&vvs->rx_lock);
> > > > > > > +
> > > > > > > +		err = memcpy_to_msg(msg, skb->data +
> > > > > > > vsock_metadata(skb)->off, total);
> > > > > > > +		if (err)
> > > > > > > +			return err;
> > > > > > > +
> > > > > > > +		spin_lock_bh(&vvs->rx_lock);
> > > > > > > +
> > > > > > > +		virtio_transport_dec_rx_pkt(vvs, skb);
> > > > > > > +		consume_skb(skb);
> > > > > > > +	}
> > > > > > > +
> > > > > > > > +	free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
> > > > > > > +
> > > > > > > +	spin_unlock_bh(&vvs->rx_lock);
> > > > > > > +
> > > > > > > +	if (total > 0 && msg->msg_name) {
> > > > > > > +		/* Provide the address of the sender. */
> > > > > > > +		DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr,
> > > > > > > msg->msg_name);
> > > > > > > +
> > > > > > > > +		vsock_addr_init(vm_addr, le64_to_cpu(vsock_hdr(skb)->src_cid),
> > > > > > > > +				le32_to_cpu(vsock_hdr(skb)->src_port));
> > > > > > > +		msg->msg_namelen = sizeof(*vm_addr);
> > > > > > > +	}
> > > > > > > +	return total;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static s64 virtio_transport_dgram_has_data(struct
> > > > > > > vsock_sock
> > > > > > > *vsk)
> > > > > > > +{
> > > > > > > +	return virtio_transport_stream_has_data(vsk);
> > > > > > > +}
> > > > > > > +
> > > > > > >  int
> > > > > > >  virtio_transport_seqpacket_enqueue(struct vsock_sock
> > > > > > > *vsk,
> > > > > > >  				   struct msghdr *msg,
> > > > > > > @@ -611,7 +675,66 @@
> > > > > > > virtio_transport_dgram_dequeue(struct
> > > > > > > vsock_sock *vsk,
> > > > > > >  			       struct msghdr *msg,
> > > > > > >  			       size_t len, int flags)
> > > > > > >  {
> > > > > > > -	return -EOPNOTSUPP;
> > > > > > > +	struct sock *sk;
> > > > > > > +	size_t err = 0;
> > > > > > > +	long timeout;
> > > > > > > +
> > > > > > > +	DEFINE_WAIT(wait);
> > > > > > > +
> > > > > > > +	sk = &vsk->sk;
> > > > > > > +	err = 0;
> > > > > > > +
> > > > > > > +	if (flags & MSG_OOB || flags & MSG_ERRQUEUE || flags &
> > > > > > > MSG_PEEK)
> > > > > > > +		return -EOPNOTSUPP;
> > > > > > > +
> > > > > > > +	lock_sock(sk);
> > > > > > > +
> > > > > > > +	if (!len)
> > > > > > > +		goto out;
> > > > > > > +
> > > > > > > +	timeout = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
> > > > > > > +
> > > > > > > +	while (1) {
> > > > > > > +		s64 ready;
> > > > > > > +
> > > > > > > +		prepare_to_wait(sk_sleep(sk), &wait,
> > > > > > > TASK_INTERRUPTIBLE);
> > > > > > > +		ready = virtio_transport_dgram_has_data(vsk);
> > > > > > > +
> > > > > > > +		if (ready == 0) {
> > > > > > > +			if (timeout == 0) {
> > > > > > > +				err = -EAGAIN;
> > > > > > > +				finish_wait(sk_sleep(sk),
> > > > > > > &wait);
> > > > > > > +				break;
> > > > > > > +			}
> > > > > > > +
> > > > > > > +			release_sock(sk);
> > > > > > > +			timeout = schedule_timeout(timeout);
> > > > > > > +			lock_sock(sk);
> > > > > > > +
> > > > > > > +			if (signal_pending(current)) {
> > > > > > > +				err = sock_intr_errno(timeout);
> > > > > > > +				finish_wait(sk_sleep(sk),
> > > > > > > &wait);
> > > > > > > +				break;
> > > > > > > +			} else if (timeout == 0) {
> > > > > > > +				err = -EAGAIN;
> > > > > > > +				finish_wait(sk_sleep(sk),
> > > > > > > &wait);
> > > > > > > +				break;
> > > > > > > +			}
> > > > > > > +		} else {
> > > > > > > +			finish_wait(sk_sleep(sk), &wait);
> > > > > > > +
> > > > > > > +			if (ready < 0) {
> > > > > > > +				err = -ENOMEM;
> > > > > > > +				goto out;
> > > > > > > +			}
> > > > > > > +
> > > > > > > +			err =
> > > > > > > virtio_transport_dgram_do_dequeue(vsk, msg, len);
> > > > > > > +			break;
> > > > > > > +		}
> > > > > > > +	}
> > > > > > > +out:
> > > > > > > +	release_sock(sk);
> > > > > > > +	return err;
> > > > > > >  }
> > > > > > >  EXPORT_SYMBOL_GPL(virtio_transport_dgram_dequeue);
> > > > ^^^
> > > > Maybe this generic data-waiting logic should live in
> > > > af_vsock.c, as it does for stream/seqpacket?
> > > > That way, another transport that supports SOCK_DGRAM could
> > > > reuse it.
> > > 
> > > I think that is a great idea. I'll test that change for v2.
> > > 
> > > Thanks.
> > 
> > Also for v2: I tested your patchset a little (writing here so the
> > notes are not spread over several mails):
> > 1) The seqpacket test in vsock_test.c fails (seems to be a MSG_EOR
> >    flag issue).
> 
> I will investigate.
> 
> > 2) I can't rmmod the modules after testing with the following config:
> >    CONFIG_VSOCKETS=m
> >    CONFIG_VIRTIO_VSOCKETS=m
> >    CONFIG_VIRTIO_VSOCKETS_COMMON=m
> >    CONFIG_VHOST=m
> >    CONFIG_VHOST_VSOCK=m
> >    The guest is shut down, but rmmod still fails.
> > 3) virtio_transport_init() and virtio_transport_exit() apparently
> >    need EXPORT_SYMBOL_GPL(), because both are used from another
> >    module.
> 
> Definitely, will fix.
> 
> > 4) I tried to send a 5 KB piece of data (20 KB behaves the same),
> >    but got a kernel panic in the guest and later in the host.
> > 
> 
> Thanks for catching that. I can reproduce it intermittently, but only
> for seqpacket. Did you happen to see this for other socket types as
> well?
> 
> Thanks

I got this for SOCK_DGRAM; I didn't test seqpacket or stream.

Thanks, Arseniy
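
For anyone trying to reproduce item 4, a minimal user-space sketch of
such a sender might look like the following. It is an illustration only,
not the test that was actually run: the helper names make_vsock_addr()
and send_test_dgram() are made up for this example, the destination CID
and port are placeholders, and SOCK_DGRAM over virtio vsock is only
expected to work with this series applied.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Hypothetical helper: fill a sockaddr_vm for the given CID and port. */
static void make_vsock_addr(struct sockaddr_vm *addr,
			    unsigned int cid, unsigned int port)
{
	memset(addr, 0, sizeof(*addr));
	addr->svm_family = AF_VSOCK;
	addr->svm_cid = cid;
	addr->svm_port = port;
}

/*
 * Hypothetical helper: send one datagram of len bytes to cid:port.
 * Returns the number of bytes sent, or a negative errno value.
 */
static ssize_t send_test_dgram(unsigned int cid, unsigned int port,
			       size_t len)
{
	struct sockaddr_vm addr;
	ssize_t ret;
	char *buf;
	int fd;

	fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
	if (fd < 0)
		return -errno;

	buf = malloc(len);
	if (!buf) {
		close(fd);
		return -ENOMEM;
	}
	memset(buf, 'A', len);	/* payload content is arbitrary */

	make_vsock_addr(&addr, cid, port);
	ret = sendto(fd, buf, len, 0,
		     (struct sockaddr *)&addr, sizeof(addr));
	if (ret < 0)
		ret = -errno;

	free(buf);
	close(fd);
	return ret;
}
```

Calling send_test_dgram(VMADDR_CID_HOST, 1234, 5 * 1024) from the guest
would attempt to send one 5 KB datagram to the host; port 1234 is an
arbitrary choice, and a receiver bound to that port is assumed.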

> 
> > Thank You
> > > > > > >  
> > > > > > > @@ -819,13 +942,13 @@
> > > > > > > EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
> > > > > > >  int virtio_transport_dgram_bind(struct vsock_sock *vsk,
> > > > > > >  				struct sockaddr_vm *addr)
> > > > > > >  {
> > > > > > > -	return -EOPNOTSUPP;
> > > > > > > +	return vsock_bind_stream(vsk, addr);
> > > > > > >  }
> > > > > > >  EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
> > > > > > >  
> > > > > > >  bool virtio_transport_dgram_allow(u32 cid, u32 port)
> > > > > > >  {
> > > > > > > -	return false;
> > > > > > > +	return true;
> > > > > > >  }
> > > > > > >  EXPORT_SYMBOL_GPL(virtio_transport_dgram_allow);
> > > > > > >  
> > > > > > > @@ -861,7 +984,16 @@
> > > > > > > virtio_transport_dgram_enqueue(struct
> > > > > > > vsock_sock *vsk,
> > > > > > >  			       struct msghdr *msg,
> > > > > > >  			       size_t dgram_len)
> > > > > > >  {
> > > > > > > -	return -EOPNOTSUPP;
> > > > > > > +	struct virtio_vsock_pkt_info info = {
> > > > > > > +		.op = VIRTIO_VSOCK_OP_RW,
> > > > > > > +		.msg = msg,
> > > > > > > +		.pkt_len = dgram_len,
> > > > > > > +		.vsk = vsk,
> > > > > > > +		.remote_cid = remote_addr->svm_cid,
> > > > > > > +		.remote_port = remote_addr->svm_port,
> > > > > > > +	};
> > > > > > > +
> > > > > > > +	return virtio_transport_send_pkt_info(vsk, &info);
> > > > > > >  }
> > > > > > >  EXPORT_SYMBOL_GPL(virtio_transport_dgram_enqueue);
> > > > > > >  
> > > > > > > @@ -1165,6 +1297,12 @@
> > > > > > > virtio_transport_recv_connected(struct
> > > > > > > sock *sk,
> > > > > > >  	struct virtio_vsock_hdr *hdr = vsock_hdr(skb);
> > > > > > >  	int err = 0;
> > > > > > >  
> > > > > > > +	if (le16_to_cpu(vsock_hdr(skb)->type) ==
> > > > > > > VIRTIO_VSOCK_TYPE_DGRAM) {
> > > > > > > +		virtio_transport_recv_enqueue(vsk, skb);
> > > > > > > +		sk->sk_data_ready(sk);
> > > > > > > +		return err;
> > > > > > > +	}
> > > > > > > +
> > > > > > >  	switch (le16_to_cpu(hdr->op)) {
> > > > > > >  	case VIRTIO_VSOCK_OP_RW:
> > > > > > >  		virtio_transport_recv_enqueue(vsk, skb);
> > > > > > > @@ -1320,7 +1458,8 @@ virtio_transport_recv_listen(struct
> > > > > > > sock *sk, struct sk_buff *skb,
> > > > > > >  static bool virtio_transport_valid_type(u16 type)
> > > > > > >  {
> > > > > > >  	return (type == VIRTIO_VSOCK_TYPE_STREAM) ||
> > > > > > > -	       (type == VIRTIO_VSOCK_TYPE_SEQPACKET);
> > > > > > > +	       (type == VIRTIO_VSOCK_TYPE_SEQPACKET) ||
> > > > > > > +	       (type == VIRTIO_VSOCK_TYPE_DGRAM);
> > > > > > >  }
> > > > > > >  
> > > > > > > >  /* We are under the virtio-vsock's vsock->rx_lock or vhost-vsock's vq->mutex
> > > > > > > @@ -1384,6 +1523,11 @@ void
> > > > > > > virtio_transport_recv_pkt(struct
> > > > > > > virtio_transport *t,
> > > > > > >  		goto free_pkt;
> > > > > > >  	}
> > > > > > >  
> > > > > > > +	if (sk->sk_type == SOCK_DGRAM) {
> > > > > > > +		virtio_transport_recv_connected(sk, skb);
> > > > > > > +		goto out;
> > > > > > > +	}
> > > > > > > +
> > > > > > >  	space_available = virtio_transport_space_update(sk,
> > > > > > > skb);
> > > > > > >  
> > > > > > >  	/* Update CID in case it has changed after a transport
> > > > > > > reset event */
> > > > > > > @@ -1415,6 +1559,7 @@ void
> > > > > > > virtio_transport_recv_pkt(struct
> > > > > > > virtio_transport *t,
> > > > > > >  		break;
> > > > > > >  	}
> > > > > > >  
> > > > > > > +out:
> > > > > > >  	release_sock(sk);
> > > > > > >  
> > > > > > >  	/* Release refcnt obtained when we fetched this socket
> > > > > > > out of the
> > > > > > > -- 
> > > > > > > 2.35.1
> > > > > > > 
> > > > > > 
> > > > > > 
> > > 
> > > 

end of thread, other threads:[~2022-08-19  4:31 UTC | newest]

Thread overview: 16+ messages
     [not found] <cover.1660362668.git.bobby.eshleman@bytedance.com>
2022-08-16  2:29 ` [virtio-dev] Re: [PATCH 0/6] virtio/vsock: introduce dgrams, sk_buff, and qdisc Bobby Eshleman
     [not found] ` <65d117ddc530d12a6d47fcc45b38891465a90d9f.1660362668.git.bobby.eshleman@bytedance.com>
2022-08-16  2:30   ` [virtio-dev] Re: [PATCH 1/6] vsock: replace virtio_vsock_pkt with sk_buff Bobby Eshleman
     [not found] ` <d81818b868216c774613dd03641fcfe63cc55a45.1660362668.git.bobby.eshleman@bytedance.com>
2022-08-16  2:30   ` [virtio-dev] Re: [PATCH 2/6] vsock: return errors other than -ENOMEM to socket Bobby Eshleman
2022-08-17  5:28     ` Arseniy Krasnov
     [not found] ` <5a93c5aad99d79f028d349cb7e3c128c65d5d7e2.1660362668.git.bobby.eshleman@bytedance.com>
2022-08-16  2:31   ` [virtio-dev] Re: [PATCH 3/6] vsock: add netdev to vhost/virtio vsock Bobby Eshleman
     [not found] ` <3d1f32c4da81f8a0870e126369ba12bc8c4ad048.1660362668.git.bobby.eshleman@bytedance.com>
2022-08-16  2:31   ` [virtio-dev] Re: [PATCH 4/6] virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit Bobby Eshleman
     [not found] ` <3cb082f1c88f3f2ef1fc250dbc0745fb79c745c7.1660362668.git.bobby.eshleman@bytedance.com>
2022-08-16  2:32   ` [virtio-dev] Re: [PATCH 5/6] virtio/vsock: add support for dgram Bobby Eshleman
2022-08-17  5:01     ` Arseniy Krasnov
2022-08-16  9:57       ` Bobby Eshleman
2022-08-18  8:24         ` Arseniy Krasnov
2022-08-17  5:42       ` Arseniy Krasnov
2022-08-16  9:58         ` Bobby Eshleman
2022-08-18  8:35           ` Arseniy Krasnov
2022-08-16 20:52             ` Bobby Eshleman
2022-08-19  4:30               ` Arseniy Krasnov
     [not found] ` <db2e6c0ffa559ae6b8572b1981a6ad566aa73178.1660362669.git.bobby.eshleman@bytedance.com>
2022-08-16  2:32   ` [virtio-dev] Re: [PATCH 6/6] vsock_test: add tests for vsock dgram Bobby Eshleman
