linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC 0/2] Packed ring for vhost
@ 2018-02-14  2:37 Jason Wang
  2018-02-14  2:37 ` [PATCH RFC 1/2] virtio: introduce packed ring defines Jason Wang
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Jason Wang @ 2018-02-14  2:37 UTC (permalink / raw)
  To: mst, virtualization, linux-kernel, netdev
  Cc: wexu, jfreimann, tiwei.bie, Jason Wang

Hi all:

This RFC implement a subset of packed ring which was described at
https://github.com/oasis-tcs/virtio-docs/blob/master/virtio-v1.1-packed-wd07.pdf
. The code were tested with pmd implement by Jens at
http://dpdk.org/ml/archives/dev/2018-January/089417.html. Minor
change was needed for pmd codes to kick virtqueue since it assumes a
busy polling backend.

Test were done between localhost and guest. Testpmd (rxonly) in guest
reports 2.4Mpps. Testpmd (txonly) repots about 2.1Mpps.

It's not a complete implemention, here's what were missed:

- Device Area
- Driver Area
- Descriptor indirection
- Zerocopy may not be functional
- Migration path is not tested
- Vhost devices except for net
- vIOMMU can not work (mainly because the metadata prefetch is not
  implemented).
- See FIXME/TODO in the codes for more details
- No batching or other optimizations were implemented

For a quick prototype, this series open code the tracking of warp
counter and descriptor index at net device. This will be addressed in
the future by:

- Move get_rx_bufs() from net.c to vhost.c
- Let vhost_get_vq_desc() returns vring_used_elem instad of head id

With the above, we can hide the internal (at least part of) vring
layout from specific device.

Please review.

Thanks

Jason Wang (2):
  virtio: introduce packed ring defines
  vhost: packed ring support

 drivers/vhost/net.c                |  14 +-
 drivers/vhost/vhost.c              | 351 ++++++++++++++++++++++++++++++++++---
 drivers/vhost/vhost.h              |   6 +-
 include/uapi/linux/virtio_config.h |   9 +
 include/uapi/linux/virtio_ring.h   |  17 ++
 5 files changed, 369 insertions(+), 28 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH RFC 1/2] virtio: introduce packed ring defines
  2018-02-14  2:37 [PATCH RFC 0/2] Packed ring for vhost Jason Wang
@ 2018-02-14  2:37 ` Jason Wang
  2018-02-14  2:37 ` [PATCH RFC 2/2] vhost: packed ring support Jason Wang
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Jason Wang @ 2018-02-14  2:37 UTC (permalink / raw)
  To: mst, virtualization, linux-kernel, netdev
  Cc: wexu, jfreimann, tiwei.bie, Jason Wang

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 include/uapi/linux/virtio_config.h |  9 +++++++++
 include/uapi/linux/virtio_ring.h   | 17 +++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
index 308e209..5903d51 100644
--- a/include/uapi/linux/virtio_config.h
+++ b/include/uapi/linux/virtio_config.h
@@ -71,4 +71,13 @@
  * this is for compatibility with legacy systems.
  */
 #define VIRTIO_F_IOMMU_PLATFORM		33
+
+#define VIRTIO_F_RING_PACKED		34
+
+/*
+ * This feature indicates that all buffers are used by the device in
+ * the same order in which they have been made available.
+ */
+#define VIRTIO_F_IN_ORDER		35
+
 #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
index 6d5d5fa..a169b53 100644
--- a/include/uapi/linux/virtio_ring.h
+++ b/include/uapi/linux/virtio_ring.h
@@ -43,6 +43,8 @@
 #define VRING_DESC_F_WRITE	2
 /* This means the buffer contains a list of buffer descriptors. */
 #define VRING_DESC_F_INDIRECT	4
+#define VRING_DESC_F_AVAIL      7
+#define VRING_DESC_F_USED	15
 
 /* The Host uses this in used->flags to advise the Guest: don't kick me when
  * you add a buffer.  It's unreliable, so it's simply an optimization.  Guest
@@ -62,6 +64,17 @@
  * at the end of the used ring. Guest should ignore the used->flags field. */
 #define VIRTIO_RING_F_EVENT_IDX		29
 
+struct vring_desc_packed {
+	/* Buffer Address. */
+	__virtio64 addr;
+	/* Buffer Length. */
+	__virtio32 len;
+	/* Buffer ID. */
+	__virtio16 id;
+	/* The flags depending on descriptor type. */
+	__virtio16 flags;
+};
+
 /* Virtio ring descriptors: 16 bytes.  These can chain together via "next". */
 struct vring_desc {
 	/* Address (guest-physical). */
@@ -86,6 +99,10 @@ struct vring_used_elem {
 	__virtio32 id;
 	/* Total length of the descriptor chain which was used (written to) */
 	__virtio32 len;
+	/* Index of the descriptor that needs to write, used by packed ring. */
+	u16 idx;
+	/* Wrap counter for this used desc, used by packed ring. */
+	bool wrap_counter;
 };
 
 struct vring_used {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RFC 2/2] vhost: packed ring support
  2018-02-14  2:37 [PATCH RFC 0/2] Packed ring for vhost Jason Wang
  2018-02-14  2:37 ` [PATCH RFC 1/2] virtio: introduce packed ring defines Jason Wang
@ 2018-02-14  2:37 ` Jason Wang
  2018-02-27  9:03   ` Tiwei Bie
  2018-02-14  2:43 ` [PATCH RFC 0/2] Packed ring for vhost Jason Wang
  2018-02-14  2:47 ` Michael S. Tsirkin
  3 siblings, 1 reply; 15+ messages in thread
From: Jason Wang @ 2018-02-14  2:37 UTC (permalink / raw)
  To: mst, virtualization, linux-kernel, netdev
  Cc: wexu, jfreimann, tiwei.bie, Jason Wang

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c   |  14 +-
 drivers/vhost/vhost.c | 351 ++++++++++++++++++++++++++++++++++++++++++++++----
 drivers/vhost/vhost.h |   6 +-
 3 files changed, 343 insertions(+), 28 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index c613d2e..65b27c9 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -67,7 +67,8 @@ enum {
 	VHOST_NET_FEATURES = VHOST_FEATURES |
 			 (1ULL << VHOST_NET_F_VIRTIO_NET_HDR) |
 			 (1ULL << VIRTIO_NET_F_MRG_RXBUF) |
-			 (1ULL << VIRTIO_F_IOMMU_PLATFORM)
+			 (1ULL << VIRTIO_F_IOMMU_PLATFORM) |
+			 (1ULL << VIRTIO_F_RING_PACKED)
 };
 
 enum {
@@ -473,6 +474,7 @@ static void handle_tx(struct vhost_net *net)
 	struct socket *sock;
 	struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
 	bool zcopy, zcopy_used;
+	struct vring_used_elem used;
 
 	mutex_lock(&vq->mutex);
 	sock = vq->private_data;
@@ -494,6 +496,8 @@ static void handle_tx(struct vhost_net *net)
 			vhost_zerocopy_signal_used(net, vq);
 
 
+		used.idx = vq->last_avail_idx & (vq->num - 1);
+		used.wrap_counter = vq->used_wrap_counter;
 		head = vhost_net_tx_get_vq_desc(net, vq, vq->iov,
 						ARRAY_SIZE(vq->iov),
 						&out, &in);
@@ -515,6 +519,8 @@ static void handle_tx(struct vhost_net *net)
 		}
 		/* Skip header. TODO: support TSO. */
 		len = iov_length(vq->iov, out);
+		used.id = head;
+		used.len = 0;
 		iov_iter_init(&msg.msg_iter, WRITE, vq->iov, out, len);
 		iov_iter_advance(&msg.msg_iter, hdr_size);
 		/* Sanity check */
@@ -576,7 +582,7 @@ static void handle_tx(struct vhost_net *net)
 			pr_debug("Truncated TX packet: "
 				 " len %d != %zd\n", err, len);
 		if (!zcopy_used)
-			vhost_add_used_and_signal(&net->dev, vq, head, 0);
+			vhost_add_used_and_signal_n(&net->dev, vq, &used, 1);
 		else
 			vhost_zerocopy_signal_used(net, vq);
 		vhost_net_tx_packet(net);
@@ -691,6 +697,8 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
 			r = -ENOBUFS;
 			goto err;
 		}
+		heads[headcount].idx = vq->last_avail_idx & (vq->num - 1);
+		heads[headcount].wrap_counter = vq->used_wrap_counter;
 		r = vhost_get_vq_desc(vq, vq->iov + seg,
 				      ARRAY_SIZE(vq->iov) - seg, &out,
 				      &in, log, log_num);
@@ -780,6 +788,8 @@ static void handle_rx(struct vhost_net *net)
 	vq_log = unlikely(vhost_has_feature(vq, VHOST_F_LOG_ALL)) ?
 		vq->log : NULL;
 	mergeable = vhost_has_feature(vq, VIRTIO_NET_F_MRG_RXBUF);
+	/* FIXME: workaround for current dpdk prototype */
+	mergeable = false;
 
 	while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk))) {
 		sock_len += sock_hlen;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 2db5af8..5667d03 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -324,6 +324,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 	vhost_reset_is_le(vq);
 	vhost_disable_cross_endian(vq);
 	vq->busyloop_timeout = 0;
+	vq->used_wrap_counter = true;
 	vq->umem = NULL;
 	vq->iotlb = NULL;
 	__vhost_vq_meta_reset(vq);
@@ -1136,10 +1137,22 @@ static int vhost_iotlb_miss(struct vhost_virtqueue *vq, u64 iova, int access)
 	return 0;
 }
 
-static int vq_access_ok(struct vhost_virtqueue *vq, unsigned int num,
-			struct vring_desc __user *desc,
-			struct vring_avail __user *avail,
-			struct vring_used __user *used)
+static int vq_access_ok_packed(struct vhost_virtqueue *vq, unsigned int num,
+			       struct vring_desc __user *desc,
+			       struct vring_avail __user *avail,
+			       struct vring_used __user *used)
+{
+	struct vring_desc_packed *packed = (struct vring_desc_packed *)desc;
+
+	/* FIXME: check device area and driver area */
+	return access_ok(VERIFY_READ, packed, num * sizeof(*packed)) &&
+	       access_ok(VERIFY_WRITE, packed, num * sizeof(*packed));
+}
+
+static int vq_access_ok_split(struct vhost_virtqueue *vq, unsigned int num,
+			      struct vring_desc __user *desc,
+			      struct vring_avail __user *avail,
+			      struct vring_used __user *used)
 
 {
 	size_t s = vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
@@ -1151,6 +1164,17 @@ static int vq_access_ok(struct vhost_virtqueue *vq, unsigned int num,
 			sizeof *used + num * sizeof *used->ring + s);
 }
 
+static int vq_access_ok(struct vhost_virtqueue *vq, unsigned int num,
+			struct vring_desc __user *desc,
+			struct vring_avail __user *avail,
+			struct vring_used __user *used)
+{
+	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+		return vq_access_ok_packed(vq, num, desc, avail, used);
+	else
+		return vq_access_ok_split(vq, num, desc, avail, used);
+}
+
 static void vhost_vq_meta_update(struct vhost_virtqueue *vq,
 				 const struct vhost_umem_node *node,
 				 int type)
@@ -1763,6 +1787,9 @@ int vhost_vq_init_access(struct vhost_virtqueue *vq)
 
 	vhost_init_is_le(vq);
 
+	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+		return 0;
+
 	r = vhost_update_used_flags(vq);
 	if (r)
 		goto err;
@@ -1947,18 +1974,136 @@ static int get_indirect(struct vhost_virtqueue *vq,
 	return 0;
 }
 
-/* This looks in the virtqueue and for the first available buffer, and converts
- * it to an iovec for convenient access.  Since descriptors consist of some
- * number of output then some number of input descriptors, it's actually two
- * iovecs, but we pack them into one and note how many of each there were.
- *
- * This function returns the descriptor number found, or vq->num (which is
- * never a valid descriptor number) if none was found.  A negative code is
- * returned on error. */
-int vhost_get_vq_desc(struct vhost_virtqueue *vq,
-		      struct iovec iov[], unsigned int iov_size,
-		      unsigned int *out_num, unsigned int *in_num,
-		      struct vhost_log *log, unsigned int *log_num)
+#define DESC_AVAIL (1 << VRING_DESC_F_AVAIL)
+#define DESC_USED  (1 << VRING_DESC_F_USED)
+static bool desc_is_avail(struct vhost_virtqueue *vq,
+			  struct vring_desc_packed *desc)
+{
+	if (vq->used_wrap_counter)
+		if ((desc->flags & DESC_AVAIL) && !(desc->flags & DESC_USED))
+			return true;
+	if (vq->used_wrap_counter == false)
+		if (!(desc->flags & DESC_AVAIL) && (desc->flags & DESC_USED))
+			return true;
+
+	return false;
+}
+
+static void set_desc_used(struct vhost_virtqueue *vq,
+			  struct vring_desc_packed *desc, bool wrap_counter)
+{
+	__virtio16 flags = desc->flags;
+
+	if (wrap_counter) {
+		desc->flags |= cpu_to_vhost16(vq, DESC_AVAIL);
+		desc->flags |= cpu_to_vhost16(vq, DESC_USED);
+	} else {
+		desc->flags &= ~cpu_to_vhost16(vq, DESC_AVAIL);
+		desc->flags &= ~cpu_to_vhost16(vq, DESC_USED);
+	}
+
+	desc->flags = flags;
+}
+
+static int vhost_get_vq_desc_packed(struct vhost_virtqueue *vq,
+				    struct iovec iov[], unsigned int iov_size,
+				    unsigned int *out_num, unsigned int *in_num,
+				    struct vhost_log *log,
+				    unsigned int *log_num)
+{
+	struct vring_desc_packed desc;
+	int ret, access, i;
+	u16 avail_idx = vq->last_avail_idx;
+
+	/* When we start there are none of either input nor output. */
+	*out_num = *in_num = 0;
+	if (unlikely(log))
+		*log_num = 0;
+
+	do {
+		unsigned int iov_count = *in_num + *out_num;
+
+		i = vq->last_avail_idx & (vq->num - 1);
+		ret = vhost_copy_from_user(vq, &desc, vq->desc_packed + i,
+					   sizeof(desc));
+		if (unlikely(ret)) {
+			vq_err(vq, "Failed to get descriptor: idx %d addr %p\n",
+				i, vq->desc_packed + i);
+			return -EFAULT;
+		}
+
+		if (!desc_is_avail(vq, &desc)) {
+			/* If there's nothing new since last we looked, return
+			 * invalid.
+			 */
+			if (likely(avail_idx == vq->last_avail_idx))
+				return vq->num;
+		}
+
+		/* Only start to read descriptor after we're sure it was
+		 * available.
+		 */
+		smp_rmb();
+
+		/* FIXME: support indirect */
+		if (desc.flags & cpu_to_vhost16(vq, VRING_DESC_F_INDIRECT)) {
+			vq_err(vq, "Indirect descriptor is not supported\n");
+			return -EFAULT;
+		}
+
+		if (desc.flags & cpu_to_vhost16(vq, VRING_DESC_F_WRITE))
+			access = VHOST_ACCESS_WO;
+		else
+			access = VHOST_ACCESS_RO;
+		ret = translate_desc(vq, vhost64_to_cpu(vq, desc.addr),
+				     vhost32_to_cpu(vq, desc.len),
+				     iov + iov_count, iov_size - iov_count,
+				     access);
+		if (unlikely(ret < 0)) {
+			if (ret != -EAGAIN)
+				vq_err(vq, "Translation failure %d idx %d\n",
+					ret, i);
+			return ret;
+		}
+
+		if (access == VHOST_ACCESS_WO) {
+			/* If this is an input descriptor,
+			 * increment that count.
+			 */
+			*in_num += ret;
+			if (unlikely(log)) {
+				log[*log_num].addr =
+					vhost64_to_cpu(vq, desc.addr);
+				log[*log_num].len =
+					vhost32_to_cpu(vq, desc.len);
+				++*log_num;
+			}
+		} else {
+			/* If it's an output descriptor, they're all supposed
+			 * to come before any input descriptors.
+			 */
+			if (unlikely(*in_num)) {
+				vq_err(vq, "Desc out after in: idx %d\n",
+				       i);
+				return -EINVAL;
+			}
+			*out_num += ret;
+		}
+
+		/* On success, increment avail index. */
+		if ((++vq->last_avail_idx & (vq->num - 1)) == 0)
+			vq->used_wrap_counter ^= 1;
+
+	/* If this descriptor says it doesn't chain, we're done. */
+	} while (desc.flags & cpu_to_vhost16(vq, VRING_DESC_F_NEXT));
+
+	return desc.id;
+}
+
+static int vhost_get_vq_desc_split(struct vhost_virtqueue *vq,
+				   struct iovec iov[], unsigned int iov_size,
+				   unsigned int *out_num, unsigned int *in_num,
+				   struct vhost_log *log, unsigned int *log_num)
 {
 	struct vring_desc desc;
 	unsigned int i, head, found = 0;
@@ -2096,6 +2241,30 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 	BUG_ON(!(vq->used_flags & VRING_USED_F_NO_NOTIFY));
 	return head;
 }
+
+/* This looks in the virtqueue and for the first available buffer, and converts
+ * it to an iovec for convenient access.  Since descriptors consist of some
+ * number of output then some number of input descriptors, it's actually two
+ * iovecs, but we pack them into one and note how many of each there were.
+ *
+ * This function returns the descriptor number found, or vq->num (which is
+ * never a valid descriptor number) if none was found.  A negative code is
+ * returned on error.
+ */
+int vhost_get_vq_desc(struct vhost_virtqueue *vq,
+		      struct iovec iov[], unsigned int iov_size,
+		      unsigned int *out_num, unsigned int *in_num,
+		      struct vhost_log *log, unsigned int *log_num)
+{
+	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+		return vhost_get_vq_desc_packed(vq, iov, iov_size,
+						out_num, in_num,
+						log, log_num);
+	else
+		return vhost_get_vq_desc_split(vq, iov, iov_size,
+					       out_num, in_num,
+					       log, log_num);
+}
 EXPORT_SYMBOL_GPL(vhost_get_vq_desc);
 
 /* Reverse the effect of vhost_get_vq_desc. Useful for error handling. */
@@ -2161,10 +2330,48 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
 	return 0;
 }
 
-/* After we've used one of their buffers, we tell them about it.  We'll then
- * want to notify the guest, using eventfd. */
-int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
-		     unsigned count)
+static int vhost_add_used_n_packed(struct vhost_virtqueue *vq,
+				   struct vring_used_elem *heads,
+				   unsigned int count)
+{
+	struct vring_desc_packed desc = {
+		.addr = 0,
+		.flags = 0,
+	};
+	int i, ret;
+
+	for (i = 0; i < count; i++) {
+		desc.id = heads[i].id;
+		desc.len = heads[i].len;
+		set_desc_used(vq, &desc, heads[i].wrap_counter);
+
+		/* Update flags etc before desc is written */
+		smp_mb();
+
+		ret = vhost_copy_to_user(vq, vq->desc_packed + heads[i].idx,
+					 &desc, sizeof(desc));
+		if (unlikely(ret)) {
+			vq_err(vq, "Failed to set descriptor: idx %d addr %p\n",
+			       heads[i].idx, vq->desc_packed + heads[i].idx);
+			return -EFAULT;
+		}
+		if (unlikely(vq->log_used)) {
+			/* Make sure desc is written before update log. */
+			smp_wmb();
+			log_write(vq->log_base,
+				  vq->log_addr + heads[i].idx * sizeof(desc),
+				  sizeof(desc));
+			if (vq->log_ctx)
+				eventfd_signal(vq->log_ctx, 1);
+		}
+	}
+
+	return 0;
+}
+
+static int vhost_add_used_n_split(struct vhost_virtqueue *vq,
+				  struct vring_used_elem *heads,
+				  unsigned int count)
 {
 	int start, n, r;
 
@@ -2196,6 +2403,19 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
 	}
 	return r;
 }
+
+/* After we've used one of their buffers, we tell them about it.  We'll then
+ * want to notify the guest, using eventfd.
+ */
+int vhost_add_used_n(struct vhost_virtqueue *vq,
+		     struct vring_used_elem *heads,
+		     unsigned int count)
+{
+	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+		return vhost_add_used_n_packed(vq, heads, count);
+	else
+		return vhost_add_used_n_split(vq, heads, count);
+}
 EXPORT_SYMBOL_GPL(vhost_add_used_n);
 
 static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
@@ -2203,6 +2423,11 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 	__u16 old, new;
 	__virtio16 event;
 	bool v;
+
+	/* FIXME: check driver area */
+	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+		return false;
+
 	/* Flush out used index updates. This is paired
 	 * with the barrier that the Guest executes when enabling
 	 * interrupts. */
@@ -2265,7 +2490,8 @@ void vhost_add_used_and_signal_n(struct vhost_dev *dev,
 EXPORT_SYMBOL_GPL(vhost_add_used_and_signal_n);
 
 /* return true if we're sure that avaiable ring is empty */
-bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
+static bool vhost_vq_avail_empty_split(struct vhost_dev *dev,
+				       struct vhost_virtqueue *vq)
 {
 	__virtio16 avail_idx;
 	int r;
@@ -2280,10 +2506,61 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 
 	return vq->avail_idx == vq->last_avail_idx;
 }
+
+/* FIXME: unify codes with vhost_enable_notify_packed() */
+static bool vhost_vq_avail_empty_packed(struct vhost_dev *dev,
+					struct vhost_virtqueue *vq)
+{
+	struct vring_desc_packed desc;
+	int ret, i = vq->last_avail_idx & (vq->num - 1);
+
+	ret = vhost_copy_from_user(vq, &desc, vq->desc_packed + i,
+				   sizeof(desc));
+	if (unlikely(ret)) {
+		vq_err(vq, "Failed to get descriptor: idx %d addr %p\n",
+			i, vq->desc_packed + i);
+		return -EFAULT;
+	}
+
+	/* Read flag after desc is read */
+	smp_mb();
+
+	return desc_is_avail(vq, &desc);
+}
+
+bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
+{
+	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+		return vhost_vq_avail_empty_packed(dev, vq);
+	else
+		return vhost_vq_avail_empty_split(dev, vq);
+}
 EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);
 
-/* OK, now we need to know about added descriptors. */
-bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
+static bool vhost_enable_notify_packed(struct vhost_dev *dev,
+				       struct vhost_virtqueue *vq)
+{
+	struct vring_desc_packed desc;
+	int ret, i = vq->last_avail_idx & (vq->num - 1);
+
+	/* FIXME: disable notification through device area */
+
+	ret = vhost_copy_from_user(vq, &desc, vq->desc_packed + i,
+				   sizeof(desc));
+	if (unlikely(ret)) {
+		vq_err(vq, "Failed to get descriptor: idx %d addr %p\n",
+			i, vq->desc_packed + i);
+		return -EFAULT;
+	}
+
+	/* Read flag after desc is read */
+	smp_mb();
+
+	return desc_is_avail(vq, &desc);
+}
+
+static bool vhost_enable_notify_split(struct vhost_dev *dev,
+				      struct vhost_virtqueue *vq)
 {
 	__virtio16 avail_idx;
 	int r;
@@ -2318,10 +2595,25 @@ bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 
 	return vhost16_to_cpu(vq, avail_idx) != vq->avail_idx;
 }
+
+/* OK, now we need to know about added descriptors. */
+bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
+{
+	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+		return vhost_enable_notify_packed(dev, vq);
+	else
+		return vhost_enable_notify_split(dev, vq);
+}
 EXPORT_SYMBOL_GPL(vhost_enable_notify);
 
-/* We don't need to be notified again. */
-void vhost_disable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
+static void vhost_disable_notify_packed(struct vhost_dev *dev,
+					struct vhost_virtqueue *vq)
+{
+	/* FIXME: disable notification through device area */
+}
+
+static void vhost_disable_notify_split(struct vhost_dev *dev,
+				       struct vhost_virtqueue *vq)
 {
 	int r;
 
@@ -2335,6 +2627,15 @@ void vhost_disable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 			       &vq->used->flags, r);
 	}
 }
+
+/* We don't need to be notified again. */
+void vhost_disable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
+{
+	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+		return vhost_disable_notify_packed(dev, vq);
+	else
+		return vhost_disable_notify_split(dev, vq);
+}
 EXPORT_SYMBOL_GPL(vhost_disable_notify);
 
 /* Create a new message. */
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index ac4b605..cf6533a 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -87,7 +87,10 @@ struct vhost_virtqueue {
 	/* The actual ring of buffers. */
 	struct mutex mutex;
 	unsigned int num;
-	struct vring_desc __user *desc;
+	union {
+		struct vring_desc __user *desc;
+		struct vring_desc_packed __user *desc_packed;
+	};
 	struct vring_avail __user *avail;
 	struct vring_used __user *used;
 	const struct vhost_umem_node *meta_iotlb[VHOST_NUM_ADDRS];
@@ -144,6 +147,7 @@ struct vhost_virtqueue {
 	bool user_be;
 #endif
 	u32 busyloop_timeout;
+	bool used_wrap_counter;
 };
 
 struct vhost_msg_node {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 0/2] Packed ring for vhost
  2018-02-14  2:37 [PATCH RFC 0/2] Packed ring for vhost Jason Wang
  2018-02-14  2:37 ` [PATCH RFC 1/2] virtio: introduce packed ring defines Jason Wang
  2018-02-14  2:37 ` [PATCH RFC 2/2] vhost: packed ring support Jason Wang
@ 2018-02-14  2:43 ` Jason Wang
  2018-02-14  2:47 ` Michael S. Tsirkin
  3 siblings, 0 replies; 15+ messages in thread
From: Jason Wang @ 2018-02-14  2:43 UTC (permalink / raw)
  To: mst, virtualization, linux-kernel, netdev; +Cc: wexu, jfreimann, tiwei.bie



On 2018年02月14日 10:37, Jason Wang wrote:
> Hi all:
>
> This RFC implement a subset of packed ring which was described at
> https://github.com/oasis-tcs/virtio-docs/blob/master/virtio-v1.1-packed-wd07.pdf
> . The code were tested with pmd implement by Jens at
> http://dpdk.org/ml/archives/dev/2018-January/089417.html. Minor
> change was needed for pmd codes to kick virtqueue since it assumes a
> busy polling backend.
>
> Test were done between localhost and guest. Testpmd (rxonly) in guest
> reports 2.4Mpps. Testpmd (txonly) repots about 2.1Mpps.
>
> It's not a complete implemention, here's what were missed:
>
> - Device Area
> - Driver Area
> - Descriptor indirection
> - Zerocopy may not be functional
> - Migration path is not tested
> - Vhost devices except for net
> - vIOMMU can not work (mainly because the metadata prefetch is not
>    implemented).
> - See FIXME/TODO in the codes for more details
> - No batching or other optimizations were implemented
>
> For a quick prototype, this series open code the tracking of warp
> counter and descriptor index at net device. This will be addressed in
> the future by:
>
> - Move get_rx_bufs() from net.c to vhost.c
> - Let vhost_get_vq_desc() returns vring_used_elem instad of head id
>
> With the above, we can hide the internal (at least part of) vring
> layout from specific device.
>
> Please review.
>
> Thanks

It's near spring festival in China, will probably reply after the holiday.

Thanks

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 0/2] Packed ring for vhost
  2018-02-14  2:37 [PATCH RFC 0/2] Packed ring for vhost Jason Wang
                   ` (2 preceding siblings ...)
  2018-02-14  2:43 ` [PATCH RFC 0/2] Packed ring for vhost Jason Wang
@ 2018-02-14  2:47 ` Michael S. Tsirkin
  2018-02-14  3:48   ` Jason Wang
  3 siblings, 1 reply; 15+ messages in thread
From: Michael S. Tsirkin @ 2018-02-14  2:47 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, linux-kernel, netdev, wexu, jfreimann, tiwei.bie

On Wed, Feb 14, 2018 at 10:37:07AM +0800, Jason Wang wrote:
> Hi all:
> 
> This RFC implement a subset of packed ring which was described at
> https://github.com/oasis-tcs/virtio-docs/blob/master/virtio-v1.1-packed-wd07.pdf
> . The code were tested with pmd implement by Jens at
> http://dpdk.org/ml/archives/dev/2018-January/089417.html. Minor
> change was needed for pmd codes to kick virtqueue since it assumes a
> busy polling backend.
> 
> Test were done between localhost and guest. Testpmd (rxonly) in guest
> reports 2.4Mpps. Testpmd (txonly) repots about 2.1Mpps.

How does this compare with the split ring design?

> It's not a complete implemention, here's what were missed:
> 
> - Device Area
> - Driver Area
> - Descriptor indirection
> - Zerocopy may not be functional
> - Migration path is not tested
> - Vhost devices except for net
> - vIOMMU can not work (mainly because the metadata prefetch is not
>   implemented).
> - See FIXME/TODO in the codes for more details
> - No batching or other optimizations were implemented

ioeventfd for PIO/mmio/s390.

> For a quick prototype, this series open code the tracking of warp
> counter and descriptor index at net device. This will be addressed in
> the future by:
> 
> - Move get_rx_bufs() from net.c to vhost.c
> - Let vhost_get_vq_desc() returns vring_used_elem instad of head id
> 
> With the above, we can hide the internal (at least part of) vring
> layout from specific device.
> 
> Please review.
> 
> Thanks
> 
> Jason Wang (2):
>   virtio: introduce packed ring defines
>   vhost: packed ring support
> 
>  drivers/vhost/net.c                |  14 +-
>  drivers/vhost/vhost.c              | 351 ++++++++++++++++++++++++++++++++++---
>  drivers/vhost/vhost.h              |   6 +-
>  include/uapi/linux/virtio_config.h |   9 +
>  include/uapi/linux/virtio_ring.h   |  17 ++
>  5 files changed, 369 insertions(+), 28 deletions(-)
> 
> -- 
> 2.7.4

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 0/2] Packed ring for vhost
  2018-02-14  2:47 ` Michael S. Tsirkin
@ 2018-02-14  3:48   ` Jason Wang
  0 siblings, 0 replies; 15+ messages in thread
From: Jason Wang @ 2018-02-14  3:48 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, linux-kernel, netdev, wexu, jfreimann, tiwei.bie



On 2018年02月14日 10:47, Michael S. Tsirkin wrote:
> On Wed, Feb 14, 2018 at 10:37:07AM +0800, Jason Wang wrote:
>> Hi all:
>>
>> This RFC implement a subset of packed ring which was described at
>> https://github.com/oasis-tcs/virtio-docs/blob/master/virtio-v1.1-packed-wd07.pdf
>> . The code were tested with pmd implement by Jens at
>> http://dpdk.org/ml/archives/dev/2018-January/089417.html. Minor
>> change was needed for pmd codes to kick virtqueue since it assumes a
>> busy polling backend.
>>
>> Test were done between localhost and guest. Testpmd (rxonly) in guest
>> reports 2.4Mpps. Testpmd (txonly) repots about 2.1Mpps.
> How does this compare with the split ring design?

No obvious difference (+-5%). I believe we reach the bottleneck of vhost.

>
>> It's not a complete implemention, here's what were missed:
>>
>> - Device Area
>> - Driver Area
>> - Descriptor indirection
>> - Zerocopy may not be functional
>> - Migration path is not tested
>> - Vhost devices except for net
>> - vIOMMU can not work (mainly because the metadata prefetch is not
>>    implemented).
>> - See FIXME/TODO in the codes for more details
>> - No batching or other optimizations were implemented
> ioeventfd for PIO/mmio/s390.
>

Probably, but this is not the stuffs of packed ring I think.

Thanks

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH RFC 1/2] virtio: introduce packed ring defines
  2018-02-23 11:17 [PATCH RFC 0/2] Packed ring for virtio Tiwei Bie
@ 2018-02-23 11:18 ` Tiwei Bie
  2018-02-27  8:54   ` Jens Freimann
  2018-02-27  9:26   ` David Laight
  0 siblings, 2 replies; 15+ messages in thread
From: Tiwei Bie @ 2018-02-23 11:18 UTC (permalink / raw)
  To: mst, virtualization, linux-kernel, netdev
  Cc: jasowang, wexu, jfreimann, tiwei.bie

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 include/uapi/linux/virtio_config.h | 18 +++++++++-
 include/uapi/linux/virtio_ring.h   | 68 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 85 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
index 308e2096291f..e3d077ef5207 100644
--- a/include/uapi/linux/virtio_config.h
+++ b/include/uapi/linux/virtio_config.h
@@ -49,7 +49,7 @@
  * transport being used (eg. virtio_ring), the rest are per-device feature
  * bits. */
 #define VIRTIO_TRANSPORT_F_START	28
-#define VIRTIO_TRANSPORT_F_END		34
+#define VIRTIO_TRANSPORT_F_END		37
 
 #ifndef VIRTIO_CONFIG_NO_LEGACY
 /* Do we get callbacks when the ring is completely used, even if we've
@@ -71,4 +71,20 @@
  * this is for compatibility with legacy systems.
  */
 #define VIRTIO_F_IOMMU_PLATFORM		33
+
+/* This feature indicates support for the packed virtqueue layout. */
+#define VIRTIO_F_RING_PACKED		34
+
+/*
+ * This feature indicates that all buffers are used by the device
+ * in the same order in which they have been made available.
+ */
+#define VIRTIO_F_IN_ORDER		35
+
+/*
+ * This feature indicates that drivers pass extra data (besides
+ * identifying the Virtqueue) in their device notifications.
+ */
+#define VIRTIO_F_NOTIFICATION_DATA	36
+
 #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
index 6d5d5faa989b..77b1d4aeef72 100644
--- a/include/uapi/linux/virtio_ring.h
+++ b/include/uapi/linux/virtio_ring.h
@@ -44,6 +44,9 @@
 /* This means the buffer contains a list of buffer descriptors. */
 #define VRING_DESC_F_INDIRECT	4
 
+#define VRING_DESC_F_AVAIL(b)	((b) << 7)
+#define VRING_DESC_F_USED(b)	((b) << 15)
+
 /* The Host uses this in used->flags to advise the Guest: don't kick me when
  * you add a buffer.  It's unreliable, so it's simply an optimization.  Guest
  * will still kick if it's out of buffers. */
@@ -104,6 +107,36 @@ struct vring {
 	struct vring_used *used;
 };
 
+struct vring_packed_desc_event {
+	/* Descriptor Event Offset */
+	__virtio16 desc_event_off   : 15,
+	/* Descriptor Event Wrap Counter */
+		   desc_event_wrap  : 1;
+	/* Descriptor Event Flags */
+	__virtio16 desc_event_flags : 2;
+};
+
+struct vring_packed_desc {
+	/* Buffer Address. */
+	__virtio64 addr;
+	/* Buffer Length. */
+	__virtio32 len;
+	/* Buffer ID. */
+	__virtio16 id;
+	/* The flags depending on descriptor type. */
+	__virtio16 flags;
+};
+
+struct vring_packed {
+	unsigned int num;
+
+	struct vring_packed_desc *desc;
+
+	struct vring_packed_desc_event *driver;
+
+	struct vring_packed_desc_event *device;
+};
+
 /* Alignment requirements for vring elements.
  * When using pre-virtio 1.0 layout, these fall out naturally.
  */
@@ -171,4 +204,39 @@ static inline int vring_need_event(__u16 event_idx, __u16 new_idx, __u16 old)
 	return (__u16)(new_idx - event_idx - 1) < (__u16)(new_idx - old);
 }
 
+/* The standard layout for the packed ring is a continuous chunk of memory
+ * which looks like this.
+ *
+ * struct vring_packed
+ * {
+ *	// The actual descriptors (16 bytes each)
+ *	struct vring_packed_desc desc[num];
+ *
+ *	// Padding to the next align boundary.
+ *	char pad[];
+ *
+ *	// Driver Event Suppression
+ *	struct vring_packed_desc_event driver;
+ *
+ *	// Device Event Suppression
+ *	struct vring_packed_desc_event device;
+ * };
+ */
+
+static inline void vring_packed_init(struct vring_packed *vr, unsigned int num,
+				     void *p, unsigned long align)
+{
+	vr->num = num;
+	vr->desc = p;
+	vr->driver = (void *)(((uintptr_t)p + sizeof(struct vring_packed_desc)
+		* num + align - 1) & ~(align - 1));
+	vr->device = vr->driver + 1;
+}
+
+static inline unsigned vring_packed_size(unsigned int num, unsigned long align)
+{
+	return ((sizeof(struct vring_packed_desc) * num + align - 1)
+		& ~(align - 1)) + sizeof(struct vring_packed_desc_event) * 2;
+}
+
 #endif /* _UAPI_LINUX_VIRTIO_RING_H */
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 1/2] virtio: introduce packed ring defines
  2018-02-23 11:18 ` [PATCH RFC 1/2] virtio: introduce packed ring defines Tiwei Bie
@ 2018-02-27  8:54   ` Jens Freimann
  2018-02-27  9:18     ` Jens Freimann
                       ` (2 more replies)
  2018-02-27  9:26   ` David Laight
  1 sibling, 3 replies; 15+ messages in thread
From: Jens Freimann @ 2018-02-27  8:54 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: mst, virtualization, linux-kernel, netdev, jasowang, wexu

On Fri, Feb 23, 2018 at 07:18:00PM +0800, Tiwei Bie wrote:
>Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
>---
> include/uapi/linux/virtio_config.h | 18 +++++++++-
> include/uapi/linux/virtio_ring.h   | 68 ++++++++++++++++++++++++++++++++++++++
> 2 files changed, 85 insertions(+), 1 deletion(-)
>
>diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
>index 308e2096291f..e3d077ef5207 100644
>--- a/include/uapi/linux/virtio_config.h
>+++ b/include/uapi/linux/virtio_config.h
>@@ -49,7 +49,7 @@
>  * transport being used (eg. virtio_ring), the rest are per-device feature
>  * bits. */
> #define VIRTIO_TRANSPORT_F_START	28
>-#define VIRTIO_TRANSPORT_F_END		34
>+#define VIRTIO_TRANSPORT_F_END		37
>
> #ifndef VIRTIO_CONFIG_NO_LEGACY
> /* Do we get callbacks when the ring is completely used, even if we've
>@@ -71,4 +71,20 @@
>  * this is for compatibility with legacy systems.
>  */
> #define VIRTIO_F_IOMMU_PLATFORM		33
>+
>+/* This feature indicates support for the packed virtqueue layout. */
>+#define VIRTIO_F_RING_PACKED		34

Spec says VIRTIO_F_PACKED_RING not RING_PACKED
>+
>+/*
>+ * This feature indicates that all buffers are used by the device
>+ * in the same order in which they have been made available.
>+ */
>+#define VIRTIO_F_IN_ORDER		35
>+
>+/*
>+ * This feature indicates that drivers pass extra data (besides
>+ * identifying the Virtqueue) in their device notifications.
>+ */
>+#define VIRTIO_F_NOTIFICATION_DATA	36
>+
> #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
>diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
>index 6d5d5faa989b..77b1d4aeef72 100644
>--- a/include/uapi/linux/virtio_ring.h
>+++ b/include/uapi/linux/virtio_ring.h
>@@ -44,6 +44,9 @@
> /* This means the buffer contains a list of buffer descriptors. */
> #define VRING_DESC_F_INDIRECT	4
>
>+#define VRING_DESC_F_AVAIL(b)	((b) << 7)
>+#define VRING_DESC_F_USED(b)	((b) << 15)
>+
> /* The Host uses this in used->flags to advise the Guest: don't kick me when
>  * you add a buffer.  It's unreliable, so it's simply an optimization.  Guest
>  * will still kick if it's out of buffers. */
>@@ -104,6 +107,36 @@ struct vring {
> 	struct vring_used *used;
> };
>
>+struct vring_packed_desc_event {
>+	/* Descriptor Event Offset */
>+	__virtio16 desc_event_off   : 15,
>+	/* Descriptor Event Wrap Counter */
>+		   desc_event_wrap  : 1;
>+	/* Descriptor Event Flags */
>+	__virtio16 desc_event_flags : 2;
>+};

Where would the virtqueue number go in driver notifications?

regards,
Jens 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 2/2] vhost: packed ring support
  2018-02-14  2:37 ` [PATCH RFC 2/2] vhost: packed ring support Jason Wang
@ 2018-02-27  9:03   ` Tiwei Bie
  2018-02-28  2:38     ` Jason Wang
  0 siblings, 1 reply; 15+ messages in thread
From: Tiwei Bie @ 2018-02-27  9:03 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, virtualization, linux-kernel, netdev, wexu, jfreimann

On Wed, Feb 14, 2018 at 10:37:09AM +0800, Jason Wang wrote:
[...]
> +static void set_desc_used(struct vhost_virtqueue *vq,
> +			  struct vring_desc_packed *desc, bool wrap_counter)
> +{
> +	__virtio16 flags = desc->flags;
> +
> +	if (wrap_counter) {
> +		desc->flags |= cpu_to_vhost16(vq, DESC_AVAIL);
> +		desc->flags |= cpu_to_vhost16(vq, DESC_USED);
> +	} else {
> +		desc->flags &= ~cpu_to_vhost16(vq, DESC_AVAIL);
> +		desc->flags &= ~cpu_to_vhost16(vq, DESC_USED);
> +	}
> +
> +	desc->flags = flags;

The `desc->flags` is restored after the change.

> +}
> +
> +static int vhost_get_vq_desc_packed(struct vhost_virtqueue *vq,
> +				    struct iovec iov[], unsigned int iov_size,
> +				    unsigned int *out_num, unsigned int *in_num,
> +				    struct vhost_log *log,
> +				    unsigned int *log_num)
> +{
> +	struct vring_desc_packed desc;
> +	int ret, access, i;
> +	u16 avail_idx = vq->last_avail_idx;
> +
> +	/* When we start there are none of either input nor output. */
> +	*out_num = *in_num = 0;
> +	if (unlikely(log))
> +		*log_num = 0;
> +
> +	do {
> +		unsigned int iov_count = *in_num + *out_num;
> +
> +		i = vq->last_avail_idx & (vq->num - 1);

The queue size may not be a power of 2 in packed ring.

Best regards,
Tiwei Bie

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 1/2] virtio: introduce packed ring defines
  2018-02-27  8:54   ` Jens Freimann
@ 2018-02-27  9:18     ` Jens Freimann
  2018-02-27 12:01     ` Tiwei Bie
  2018-02-27 20:28     ` Michael S. Tsirkin
  2 siblings, 0 replies; 15+ messages in thread
From: Jens Freimann @ 2018-02-27  9:18 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: mst, virtualization, linux-kernel, netdev, jasowang, wexu

On Tue, Feb 27, 2018 at 09:54:58AM +0100, Jens Freimann wrote:
>On Fri, Feb 23, 2018 at 07:18:00PM +0800, Tiwei Bie wrote:
>>Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
>>---
>>include/uapi/linux/virtio_config.h | 18 +++++++++-
>>include/uapi/linux/virtio_ring.h   | 68 ++++++++++++++++++++++++++++++++++++++
>>2 files changed, 85 insertions(+), 1 deletion(-)
>>
>>diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
>>index 308e2096291f..e3d077ef5207 100644
>>--- a/include/uapi/linux/virtio_config.h
>>+++ b/include/uapi/linux/virtio_config.h
>>@@ -49,7 +49,7 @@
>> * transport being used (eg. virtio_ring), the rest are per-device feature
>> * bits. */
>>#define VIRTIO_TRANSPORT_F_START	28
>>-#define VIRTIO_TRANSPORT_F_END		34
>>+#define VIRTIO_TRANSPORT_F_END		37
>>
>>#ifndef VIRTIO_CONFIG_NO_LEGACY
>>/* Do we get callbacks when the ring is completely used, even if we've
>>@@ -71,4 +71,20 @@
>> * this is for compatibility with legacy systems.
>> */
>>#define VIRTIO_F_IOMMU_PLATFORM		33
>>+
>>+/* This feature indicates support for the packed virtqueue layout. */
>>+#define VIRTIO_F_RING_PACKED		34
>
>Spec says VIRTIO_F_PACKED_RING not RING_PACKED

Ignore this. Seems to have changed.

regards,
Jens 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH RFC 1/2] virtio: introduce packed ring defines
  2018-02-23 11:18 ` [PATCH RFC 1/2] virtio: introduce packed ring defines Tiwei Bie
  2018-02-27  8:54   ` Jens Freimann
@ 2018-02-27  9:26   ` David Laight
  2018-02-27 11:31     ` Tiwei Bie
  1 sibling, 1 reply; 15+ messages in thread
From: David Laight @ 2018-02-27  9:26 UTC (permalink / raw)
  To: 'Tiwei Bie', mst@redhat.com,
	virtualization@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
  Cc: jasowang@redhat.com, wexu@redhat.com, jfreimann@redhat.com

From: Tiwei Bie
> Sent: 23 February 2018 11:18
...
> +struct vring_packed_desc_event {
> +	/* Descriptor Event Offset */
> +	__virtio16 desc_event_off   : 15,
> +	/* Descriptor Event Wrap Counter */
> +		   desc_event_wrap  : 1;
> +	/* Descriptor Event Flags */
> +	__virtio16 desc_event_flags : 2;
> +};

This looks like you are assuming that a bit-field has a defined
layout and can be used to map a 'hardware' structure.
The don't, don't use them like that.

	David

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 1/2] virtio: introduce packed ring defines
  2018-02-27  9:26   ` David Laight
@ 2018-02-27 11:31     ` Tiwei Bie
  0 siblings, 0 replies; 15+ messages in thread
From: Tiwei Bie @ 2018-02-27 11:31 UTC (permalink / raw)
  To: David Laight
  Cc: mst@redhat.com, virtualization@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	jasowang@redhat.com, wexu@redhat.com, jfreimann@redhat.com

On Tue, Feb 27, 2018 at 09:26:27AM +0000, David Laight wrote:
> From: Tiwei Bie
> > Sent: 23 February 2018 11:18
> ...
> > +struct vring_packed_desc_event {
> > +	/* Descriptor Event Offset */
> > +	__virtio16 desc_event_off   : 15,
> > +	/* Descriptor Event Wrap Counter */
> > +		   desc_event_wrap  : 1;
> > +	/* Descriptor Event Flags */
> > +	__virtio16 desc_event_flags : 2;
> > +};
> 
> This looks like you are assuming that a bit-field has a defined
> layout and can be used to map a 'hardware' structure.
> The don't, don't use them like that.
> 
> 	David
> 

Thanks for the comments! Above definition isn't used in
this RFC, and the corresponding parts (event suppression)
haven't been implemented yet. It's more like some pseudo
code (I should add some comments about this in the code).

I planned to change it to something like this in the next
version:

struct vring_packed_desc_event {
	__virtio16 off_wrap;
	__virtio16 flags;  // XXX maybe not a good name for future
};                         // extension. Only 2bits are used now.

But it seems that I had a misunderstanding about the spec
on this previously:

https://lists.oasis-open.org/archives/virtio-dev/201802/msg00173.html

Anyway, it will be addressed. Thank you very much! ;-)

Best regards,
Tiwei Bie

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 1/2] virtio: introduce packed ring defines
  2018-02-27  8:54   ` Jens Freimann
  2018-02-27  9:18     ` Jens Freimann
@ 2018-02-27 12:01     ` Tiwei Bie
  2018-02-27 20:28     ` Michael S. Tsirkin
  2 siblings, 0 replies; 15+ messages in thread
From: Tiwei Bie @ 2018-02-27 12:01 UTC (permalink / raw)
  To: Jens Freimann; +Cc: mst, virtualization, linux-kernel, netdev, jasowang, wexu

On Tue, Feb 27, 2018 at 09:54:58AM +0100, Jens Freimann wrote:
> On Fri, Feb 23, 2018 at 07:18:00PM +0800, Tiwei Bie wrote:
[...]
> > 
> > +struct vring_packed_desc_event {
> > +	/* Descriptor Event Offset */
> > +	__virtio16 desc_event_off   : 15,
> > +	/* Descriptor Event Wrap Counter */
> > +		   desc_event_wrap  : 1;
> > +	/* Descriptor Event Flags */
> > +	__virtio16 desc_event_flags : 2;
> > +};
> 
> Where would the virtqueue number go in driver notifications?

This structure is for event suppression instead of notification.

You could refer to the "Event Suppression Structure Format"
section of the spec for more details:

https://github.com/oasis-tcs/virtio-docs/blob/master/virtio-v1.1-packed-wd08.pdf

Best regards,
Tiwei Bie

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 1/2] virtio: introduce packed ring defines
  2018-02-27  8:54   ` Jens Freimann
  2018-02-27  9:18     ` Jens Freimann
  2018-02-27 12:01     ` Tiwei Bie
@ 2018-02-27 20:28     ` Michael S. Tsirkin
  2 siblings, 0 replies; 15+ messages in thread
From: Michael S. Tsirkin @ 2018-02-27 20:28 UTC (permalink / raw)
  To: Jens Freimann
  Cc: Tiwei Bie, virtualization, linux-kernel, netdev, jasowang, wexu

On Tue, Feb 27, 2018 at 09:54:58AM +0100, Jens Freimann wrote:
> On Fri, Feb 23, 2018 at 07:18:00PM +0800, Tiwei Bie wrote:
> > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > ---
> > include/uapi/linux/virtio_config.h | 18 +++++++++-
> > include/uapi/linux/virtio_ring.h   | 68 ++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 85 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
> > index 308e2096291f..e3d077ef5207 100644
> > --- a/include/uapi/linux/virtio_config.h
> > +++ b/include/uapi/linux/virtio_config.h
> > @@ -49,7 +49,7 @@
> >  * transport being used (eg. virtio_ring), the rest are per-device feature
> >  * bits. */
> > #define VIRTIO_TRANSPORT_F_START	28
> > -#define VIRTIO_TRANSPORT_F_END		34
> > +#define VIRTIO_TRANSPORT_F_END		37
> > 
> > #ifndef VIRTIO_CONFIG_NO_LEGACY
> > /* Do we get callbacks when the ring is completely used, even if we've
> > @@ -71,4 +71,20 @@
> >  * this is for compatibility with legacy systems.
> >  */
> > #define VIRTIO_F_IOMMU_PLATFORM		33
> > +
> > +/* This feature indicates support for the packed virtqueue layout. */
> > +#define VIRTIO_F_RING_PACKED		34
> 
> Spec says VIRTIO_F_PACKED_RING not RING_PACKED

I changed that now :)

> > +
> > +/*
> > + * This feature indicates that all buffers are used by the device
> > + * in the same order in which they have been made available.
> > + */
> > +#define VIRTIO_F_IN_ORDER		35
> > +
> > +/*
> > + * This feature indicates that drivers pass extra data (besides
> > + * identifying the Virtqueue) in their device notifications.
> > + */
> > +#define VIRTIO_F_NOTIFICATION_DATA	36
> > +
> > #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
> > diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
> > index 6d5d5faa989b..77b1d4aeef72 100644
> > --- a/include/uapi/linux/virtio_ring.h
> > +++ b/include/uapi/linux/virtio_ring.h
> > @@ -44,6 +44,9 @@
> > /* This means the buffer contains a list of buffer descriptors. */
> > #define VRING_DESC_F_INDIRECT	4
> > 
> > +#define VRING_DESC_F_AVAIL(b)	((b) << 7)
> > +#define VRING_DESC_F_USED(b)	((b) << 15)
> > +
> > /* The Host uses this in used->flags to advise the Guest: don't kick me when
> >  * you add a buffer.  It's unreliable, so it's simply an optimization.  Guest
> >  * will still kick if it's out of buffers. */
> > @@ -104,6 +107,36 @@ struct vring {
> > 	struct vring_used *used;
> > };
> > 
> > +struct vring_packed_desc_event {
> > +	/* Descriptor Event Offset */
> > +	__virtio16 desc_event_off   : 15,
> > +	/* Descriptor Event Wrap Counter */
> > +		   desc_event_wrap  : 1;
> > +	/* Descriptor Event Flags */
> > +	__virtio16 desc_event_flags : 2;
> > +};
> 
> Where would the virtqueue number go in driver notifications?
> 
> regards,
> Jens

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 2/2] vhost: packed ring support
  2018-02-27  9:03   ` Tiwei Bie
@ 2018-02-28  2:38     ` Jason Wang
  0 siblings, 0 replies; 15+ messages in thread
From: Jason Wang @ 2018-02-28  2:38 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: mst, virtualization, linux-kernel, netdev, wexu, jfreimann



On 2018年02月27日 17:03, Tiwei Bie wrote:
> On Wed, Feb 14, 2018 at 10:37:09AM +0800, Jason Wang wrote:
> [...]
>> +static void set_desc_used(struct vhost_virtqueue *vq,
>> +			  struct vring_desc_packed *desc, bool wrap_counter)
>> +{
>> +	__virtio16 flags = desc->flags;
>> +
>> +	if (wrap_counter) {
>> +		desc->flags |= cpu_to_vhost16(vq, DESC_AVAIL);
>> +		desc->flags |= cpu_to_vhost16(vq, DESC_USED);
>> +	} else {
>> +		desc->flags &= ~cpu_to_vhost16(vq, DESC_AVAIL);
>> +		desc->flags &= ~cpu_to_vhost16(vq, DESC_USED);
>> +	}
>> +
>> +	desc->flags = flags;
> The `desc->flags` is restored after the change.

Right, will fix.

>> +}
>> +
>> +static int vhost_get_vq_desc_packed(struct vhost_virtqueue *vq,
>> +				    struct iovec iov[], unsigned int iov_size,
>> +				    unsigned int *out_num, unsigned int *in_num,
>> +				    struct vhost_log *log,
>> +				    unsigned int *log_num)
>> +{
>> +	struct vring_desc_packed desc;
>> +	int ret, access, i;
>> +	u16 avail_idx = vq->last_avail_idx;
>> +
>> +	/* When we start there are none of either input nor output. */
>> +	*out_num = *in_num = 0;
>> +	if (unlikely(log))
>> +		*log_num = 0;
>> +
>> +	do {
>> +		unsigned int iov_count = *in_num + *out_num;
>> +
>> +		i = vq->last_avail_idx & (vq->num - 1);
> The queue size may not be a power of 2 in packed ring.

Will fix this too.

Thanks

> Best regards,
> Tiwei Bie

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2018-02-28  2:38 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-02-14  2:37 [PATCH RFC 0/2] Packed ring for vhost Jason Wang
2018-02-14  2:37 ` [PATCH RFC 1/2] virtio: introduce packed ring defines Jason Wang
2018-02-14  2:37 ` [PATCH RFC 2/2] vhost: packed ring support Jason Wang
2018-02-27  9:03   ` Tiwei Bie
2018-02-28  2:38     ` Jason Wang
2018-02-14  2:43 ` [PATCH RFC 0/2] Packed ring for vhost Jason Wang
2018-02-14  2:47 ` Michael S. Tsirkin
2018-02-14  3:48   ` Jason Wang
  -- strict thread matches above, loose matches on Subject: below --
2018-02-23 11:17 [PATCH RFC 0/2] Packed ring for virtio Tiwei Bie
2018-02-23 11:18 ` [PATCH RFC 1/2] virtio: introduce packed ring defines Tiwei Bie
2018-02-27  8:54   ` Jens Freimann
2018-02-27  9:18     ` Jens Freimann
2018-02-27 12:01     ` Tiwei Bie
2018-02-27 20:28     ` Michael S. Tsirkin
2018-02-27  9:26   ` David Laight
2018-02-27 11:31     ` Tiwei Bie

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).