* [PATCH V3 00/19] virtio_ring in order support
@ 2025-06-16 8:24 Jason Wang
2025-06-16 8:24 ` [PATCH V3 01/19] virtio_ring: rename virtqueue_reinit_xxx to virtqueue_reset_xxx() Jason Wang
` (20 more replies)
0 siblings, 21 replies; 35+ messages in thread
From: Jason Wang @ 2025-06-16 8:24 UTC (permalink / raw)
To: mst, jasowang; +Cc: xuanzhuo, eperezma, virtualization, linux-kernel
Hello all:
This series implements VIRTIO_F_IN_ORDER support in
virtio_ring. This is done by introducing virtqueue ops so we can
implement separate helpers for different virtqueue layouts/features,
with the in-order support then implemented on top.
Tests show a 3%-5% improvement in packed virtqueue PPS with a KVM
guest and testpmd on the host.
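For readers who want the shape of the change up front, the core of the
series (condensed and simplified from patch 13, not the complete
definition) is a per-layout ops table:

enum vq_layout { SPLIT = 0, PACKED, VQ_TYPE_MAX };

struct virtqueue_ops {
	int (*add)(struct vring_virtqueue *vq, struct scatterlist *sgs[],
		   unsigned int total_sg, unsigned int out_sgs,
		   unsigned int in_sgs, void *data, void *ctx,
		   bool premapped, gfp_t gfp);
	void *(*get)(struct vring_virtqueue *vq, unsigned int *len,
		     void **ctx);
	/* kick_prepare, disable_cb, poll, detach_unused_buf, resize,
	 * reset and the other callbacks are elided here. */
};

static const struct virtqueue_ops *const all_ops[VQ_TYPE_MAX] = {
	[SPLIT]  = &split_ops,
	[PACKED] = &packed_ops,
};

The split and packed code each fill in one entry, and the in-order
support at the end of the series provides its own helpers behind the
same interface.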
Changes since V2:
- Fix build warning when DEBUG is enabled
Changes since V1:
- use a const global array of function pointers to avoid indirect
branches and thus eliminate retpolines when mitigations are enabled
- fix used length calculation when processing used ids in a batch
- fix sparse warnings
Please review.
Thanks
Jason Wang (19):
virtio_ring: rename virtqueue_reinit_xxx to virtqueue_reset_xxx()
virtio_ring: switch to use vring_virtqueue in virtqueue_poll variants
virtio_ring: unify logic of virtqueue_poll() and more_used()
virtio_ring: switch to use vring_virtqueue for virtqueue resize
variants
virtio_ring: switch to use vring_virtqueue for virtqueue_kick_prepare
variants
virtio_ring: switch to use vring_virtqueue for virtqueue_add variants
virtio: switch to use vring_virtqueue for virtqueue_add variants
virtio_ring: switch to use vring_virtqueue for enable_cb_prepare
variants
virtio_ring: use vring_virtqueue for enable_cb_delayed variants
virtio_ring: switch to use vring_virtqueue for disable_cb variants
virtio_ring: switch to use vring_virtqueue for detach_unused_buf
variants
virtio_ring: use u16 for last_used_idx in virtqueue_poll_split()
virtio_ring: introduce virtqueue ops
virtio_ring: determine descriptor flags at one time
virtio_ring: factor out core logic of buffer detaching
virtio_ring: factor out core logic for updating last_used_idx
virtio_ring: factor out split indirect detaching logic
virtio_ring: factor out split detaching logic
virtio_ring: add in order support
drivers/virtio/virtio_ring.c | 896 ++++++++++++++++++++++++++---------
1 file changed, 684 insertions(+), 212 deletions(-)
--
2.34.1
* [PATCH V3 01/19] virtio_ring: rename virtqueue_reinit_xxx to virtqueue_reset_xxx()
2025-06-16 8:24 [PATCH V3 00/19] virtio_ring in order support Jason Wang
@ 2025-06-16 8:24 ` Jason Wang
2025-06-24 10:35 ` Lei Yang
2025-06-16 8:25 ` [PATCH V3 02/19] virtio_ring: switch to use vring_virtqueue in virtqueue_poll variants Jason Wang
` (19 subsequent siblings)
20 siblings, 1 reply; 35+ messages in thread
From: Jason Wang @ 2025-06-16 8:24 UTC (permalink / raw)
To: mst, jasowang; +Cc: xuanzhuo, eperezma, virtualization, linux-kernel
To be consistent with virtqueue_reset().
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/virtio/virtio_ring.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index b784aab66867..afdd51fc3c9c 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1005,7 +1005,7 @@ static void virtqueue_vring_init_split(struct vring_virtqueue_split *vring_split
}
}
-static void virtqueue_reinit_split(struct vring_virtqueue *vq)
+static void virtqueue_reset_split(struct vring_virtqueue *vq)
{
int num;
@@ -1248,7 +1248,7 @@ static int virtqueue_resize_split(struct virtqueue *_vq, u32 num)
err_state_extra:
vring_free_split(&vring_split, vdev, vring_dma_dev(vq));
err:
- virtqueue_reinit_split(vq);
+ virtqueue_reset_split(vq);
return -ENOMEM;
}
@@ -2092,7 +2092,7 @@ static void virtqueue_vring_attach_packed(struct vring_virtqueue *vq,
vq->free_head = 0;
}
-static void virtqueue_reinit_packed(struct vring_virtqueue *vq)
+static void virtqueue_reset_packed(struct vring_virtqueue *vq)
{
memset(vq->packed.vring.device, 0, vq->packed.event_size_in_bytes);
memset(vq->packed.vring.driver, 0, vq->packed.event_size_in_bytes);
@@ -2219,7 +2219,7 @@ static int virtqueue_resize_packed(struct virtqueue *_vq, u32 num)
err_state_extra:
vring_free_packed(&vring_packed, vdev, vring_dma_dev(vq));
err_ring:
- virtqueue_reinit_packed(vq);
+ virtqueue_reset_packed(vq);
return -ENOMEM;
}
@@ -2852,9 +2852,9 @@ int virtqueue_reset(struct virtqueue *_vq,
recycle_done(_vq);
if (vq->packed_ring)
- virtqueue_reinit_packed(vq);
+ virtqueue_reset_packed(vq);
else
- virtqueue_reinit_split(vq);
+ virtqueue_reset_split(vq);
return virtqueue_enable_after_reset(_vq);
}
--
2.34.1
* [PATCH V3 02/19] virtio_ring: switch to use vring_virtqueue in virtqueue_poll variants
2025-06-16 8:24 [PATCH V3 00/19] virtio_ring in order support Jason Wang
2025-06-16 8:24 ` [PATCH V3 01/19] virtio_ring: rename virtqueue_reinit_xxx to virtqueue_reset_xxx() Jason Wang
@ 2025-06-16 8:25 ` Jason Wang
2025-06-16 8:25 ` [PATCH V3 03/19] virtio_ring: unify logic of virtqueue_poll() and more_used() Jason Wang
` (18 subsequent siblings)
20 siblings, 0 replies; 35+ messages in thread
From: Jason Wang @ 2025-06-16 8:25 UTC (permalink / raw)
To: mst, jasowang; +Cc: xuanzhuo, eperezma, virtualization, linux-kernel
These variants are only used internally, so switch to using
vring_virtqueue as the parameter to be consistent with other internal
virtqueue helpers.
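The calling convention that this and the following conversion patches
converge on looks roughly like this (illustrative sketch only, checks
and barriers elided):

/* Exported API: keeps taking the driver-visible struct virtqueue and
 * converts exactly once via container_of(). */
bool virtqueue_poll(struct virtqueue *_vq, unsigned int last_used_idx)
{
	struct vring_virtqueue *vq = to_vvq(_vq);

	/* broken check and memory barrier elided */
	return vq->packed_ring ? virtqueue_poll_packed(vq, last_used_idx) :
				 virtqueue_poll_split(vq, last_used_idx);
}

/* Internal helper: now takes struct vring_virtqueue directly. */
static bool virtqueue_poll_split(struct vring_virtqueue *vq,
				 unsigned int last_used_idx)
{
	return (u16)last_used_idx != virtio16_to_cpu(vq->vq.vdev,
					vq->split.vring.used->idx);
}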
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/virtio/virtio_ring.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index afdd51fc3c9c..9bc6c30458b5 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -915,11 +915,10 @@ static unsigned int virtqueue_enable_cb_prepare_split(struct virtqueue *_vq)
return last_used_idx;
}
-static bool virtqueue_poll_split(struct virtqueue *_vq, unsigned int last_used_idx)
+static bool virtqueue_poll_split(struct vring_virtqueue *vq,
+ unsigned int last_used_idx)
{
- struct vring_virtqueue *vq = to_vvq(_vq);
-
- return (u16)last_used_idx != virtio16_to_cpu(_vq->vdev,
+ return (u16)last_used_idx != virtio16_to_cpu(vq->vq.vdev,
vq->split.vring.used->idx);
}
@@ -1845,9 +1844,8 @@ static unsigned int virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
return vq->last_used_idx;
}
-static bool virtqueue_poll_packed(struct virtqueue *_vq, u16 off_wrap)
+static bool virtqueue_poll_packed(struct vring_virtqueue *vq, u16 off_wrap)
{
- struct vring_virtqueue *vq = to_vvq(_vq);
bool wrap_counter;
u16 used_idx;
@@ -2608,8 +2606,8 @@ bool virtqueue_poll(struct virtqueue *_vq, unsigned int last_used_idx)
return false;
virtio_mb(vq->weak_barriers);
- return vq->packed_ring ? virtqueue_poll_packed(_vq, last_used_idx) :
- virtqueue_poll_split(_vq, last_used_idx);
+ return vq->packed_ring ? virtqueue_poll_packed(vq, last_used_idx) :
+ virtqueue_poll_split(vq, last_used_idx);
}
EXPORT_SYMBOL_GPL(virtqueue_poll);
--
2.34.1
* [PATCH V3 03/19] virtio_ring: unify logic of virtqueue_poll() and more_used()
2025-06-16 8:24 [PATCH V3 00/19] virtio_ring in order support Jason Wang
2025-06-16 8:24 ` [PATCH V3 01/19] virtio_ring: rename virtqueue_reinit_xxx to virtqueue_reset_xxx() Jason Wang
2025-06-16 8:25 ` [PATCH V3 02/19] virtio_ring: switch to use vring_virtqueue in virtqueue_poll variants Jason Wang
@ 2025-06-16 8:25 ` Jason Wang
2025-06-16 8:25 ` [PATCH V3 04/19] virtio_ring: switch to use vring_virtqueue for virtqueue resize variants Jason Wang
` (17 subsequent siblings)
20 siblings, 0 replies; 35+ messages in thread
From: Jason Wang @ 2025-06-16 8:25 UTC (permalink / raw)
To: mst, jasowang; +Cc: xuanzhuo, eperezma, virtualization, linux-kernel
This patch unifies the logic of virtqueue_poll() and more_used() for
better code reuse and to ease the future in-order implementation:
more_used() now simply polls at the current last_used_idx.
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/virtio/virtio_ring.c | 48 +++++++++++++++---------------------
1 file changed, 20 insertions(+), 28 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 9bc6c30458b5..633379b2dc42 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -802,12 +802,18 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
}
}
-static bool more_used_split(const struct vring_virtqueue *vq)
+static bool virtqueue_poll_split(const struct vring_virtqueue *vq,
+ unsigned int last_used_idx)
{
- return vq->last_used_idx != virtio16_to_cpu(vq->vq.vdev,
+ return (u16)last_used_idx != virtio16_to_cpu(vq->vq.vdev,
vq->split.vring.used->idx);
}
+static bool more_used_split(const struct vring_virtqueue *vq)
+{
+ return virtqueue_poll_split(vq, vq->last_used_idx);
+}
+
static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
unsigned int *len,
void **ctx)
@@ -915,13 +921,6 @@ static unsigned int virtqueue_enable_cb_prepare_split(struct virtqueue *_vq)
return last_used_idx;
}
-static bool virtqueue_poll_split(struct vring_virtqueue *vq,
- unsigned int last_used_idx)
-{
- return (u16)last_used_idx != virtio16_to_cpu(vq->vq.vdev,
- vq->split.vring.used->idx);
-}
-
static bool virtqueue_enable_cb_delayed_split(struct virtqueue *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);
@@ -1711,16 +1710,20 @@ static inline bool is_used_desc_packed(const struct vring_virtqueue *vq,
return avail == used && used == used_wrap_counter;
}
-static bool more_used_packed(const struct vring_virtqueue *vq)
+static bool virtqueue_poll_packed(const struct vring_virtqueue *vq, u16 off_wrap)
{
- u16 last_used;
- u16 last_used_idx;
- bool used_wrap_counter;
+ bool wrap_counter;
+ u16 used_idx;
- last_used_idx = READ_ONCE(vq->last_used_idx);
- last_used = packed_last_used(last_used_idx);
- used_wrap_counter = packed_used_wrap_counter(last_used_idx);
- return is_used_desc_packed(vq, last_used, used_wrap_counter);
+ wrap_counter = off_wrap >> VRING_PACKED_EVENT_F_WRAP_CTR;
+ used_idx = off_wrap & ~(1 << VRING_PACKED_EVENT_F_WRAP_CTR);
+
+ return is_used_desc_packed(vq, used_idx, wrap_counter);
+}
+
+static bool more_used_packed(const struct vring_virtqueue *vq)
+{
+ return virtqueue_poll_packed(vq, READ_ONCE(vq->last_used_idx));
}
static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
@@ -1844,17 +1847,6 @@ static unsigned int virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
return vq->last_used_idx;
}
-static bool virtqueue_poll_packed(struct vring_virtqueue *vq, u16 off_wrap)
-{
- bool wrap_counter;
- u16 used_idx;
-
- wrap_counter = off_wrap >> VRING_PACKED_EVENT_F_WRAP_CTR;
- used_idx = off_wrap & ~(1 << VRING_PACKED_EVENT_F_WRAP_CTR);
-
- return is_used_desc_packed(vq, used_idx, wrap_counter);
-}
-
static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);
--
2.34.1
* [PATCH V3 04/19] virtio_ring: switch to use vring_virtqueue for virtqueue resize variants
2025-06-16 8:24 [PATCH V3 00/19] virtio_ring in order support Jason Wang
` (2 preceding siblings ...)
2025-06-16 8:25 ` [PATCH V3 03/19] virtio_ring: unify logic of virtqueue_poll() and more_used() Jason Wang
@ 2025-06-16 8:25 ` Jason Wang
2025-06-16 8:25 ` [PATCH V3 05/19] virtio_ring: switch to use vring_virtqueue for virtqueue_kick_prepare variants Jason Wang
` (16 subsequent siblings)
20 siblings, 0 replies; 35+ messages in thread
From: Jason Wang @ 2025-06-16 8:25 UTC (permalink / raw)
To: mst, jasowang; +Cc: xuanzhuo, eperezma, virtualization, linux-kernel
These variants are only used internally, so switch to using
vring_virtqueue as the parameter to be consistent with other internal
virtqueue helpers.
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/virtio/virtio_ring.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 633379b2dc42..e7be480bee67 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1216,11 +1216,10 @@ static struct virtqueue *vring_create_virtqueue_split(
return vq;
}
-static int virtqueue_resize_split(struct virtqueue *_vq, u32 num)
+static int virtqueue_resize_split(struct vring_virtqueue *vq, u32 num)
{
struct vring_virtqueue_split vring_split = {};
- struct vring_virtqueue *vq = to_vvq(_vq);
- struct virtio_device *vdev = _vq->vdev;
+ struct virtio_device *vdev = vq->vq.vdev;
int err;
err = vring_alloc_queue_split(&vring_split, vdev, num,
@@ -2183,11 +2182,10 @@ static struct virtqueue *vring_create_virtqueue_packed(
return vq;
}
-static int virtqueue_resize_packed(struct virtqueue *_vq, u32 num)
+static int virtqueue_resize_packed(struct vring_virtqueue *vq, u32 num)
{
struct vring_virtqueue_packed vring_packed = {};
- struct vring_virtqueue *vq = to_vvq(_vq);
- struct virtio_device *vdev = _vq->vdev;
+ struct virtio_device *vdev = vq->vq.vdev;
int err;
if (vring_alloc_queue_packed(&vring_packed, vdev, num, vring_dma_dev(vq)))
@@ -2805,9 +2803,9 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
recycle_done(_vq);
if (vq->packed_ring)
- err = virtqueue_resize_packed(_vq, num);
+ err = virtqueue_resize_packed(vq, num);
else
- err = virtqueue_resize_split(_vq, num);
+ err = virtqueue_resize_split(vq, num);
return virtqueue_enable_after_reset(_vq);
}
--
2.34.1
* [PATCH V3 05/19] virtio_ring: switch to use vring_virtqueue for virtqueue_kick_prepare variants
2025-06-16 8:24 [PATCH V3 00/19] virtio_ring in order support Jason Wang
` (3 preceding siblings ...)
2025-06-16 8:25 ` [PATCH V3 04/19] virtio_ring: switch to use vring_virtqueue for virtqueue resize variants Jason Wang
@ 2025-06-16 8:25 ` Jason Wang
2025-06-16 8:25 ` [PATCH V3 06/19] virtio_ring: switch to use vring_virtqueue for virtqueue_add variants Jason Wang
` (15 subsequent siblings)
20 siblings, 0 replies; 35+ messages in thread
From: Jason Wang @ 2025-06-16 8:25 UTC (permalink / raw)
To: mst, jasowang; +Cc: xuanzhuo, eperezma, virtualization, linux-kernel
These variants are only used internally, so switch to using
vring_virtqueue as the parameter to be consistent with other internal
virtqueue helpers.
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/virtio/virtio_ring.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index e7be480bee67..8ee5393803f0 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -713,9 +713,8 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
return -ENOMEM;
}
-static bool virtqueue_kick_prepare_split(struct virtqueue *_vq)
+static bool virtqueue_kick_prepare_split(struct vring_virtqueue *vq)
{
- struct vring_virtqueue *vq = to_vvq(_vq);
u16 new, old;
bool needs_kick;
@@ -732,12 +731,12 @@ static bool virtqueue_kick_prepare_split(struct virtqueue *_vq)
LAST_ADD_TIME_INVALID(vq);
if (vq->event) {
- needs_kick = vring_need_event(virtio16_to_cpu(_vq->vdev,
+ needs_kick = vring_need_event(virtio16_to_cpu(vq->vq.vdev,
vring_avail_event(&vq->split.vring)),
new, old);
} else {
needs_kick = !(vq->split.vring.used->flags &
- cpu_to_virtio16(_vq->vdev,
+ cpu_to_virtio16(vq->vq.vdev,
VRING_USED_F_NO_NOTIFY));
}
END_USE(vq);
@@ -1597,9 +1596,8 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
return -EIO;
}
-static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
+static bool virtqueue_kick_prepare_packed(struct vring_virtqueue *vq)
{
- struct vring_virtqueue *vq = to_vvq(_vq);
u16 new, old, off_wrap, flags, wrap_counter, event_idx;
bool needs_kick;
union {
@@ -2454,8 +2452,8 @@ bool virtqueue_kick_prepare(struct virtqueue *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);
- return vq->packed_ring ? virtqueue_kick_prepare_packed(_vq) :
- virtqueue_kick_prepare_split(_vq);
+ return vq->packed_ring ? virtqueue_kick_prepare_packed(vq) :
+ virtqueue_kick_prepare_split(vq);
}
EXPORT_SYMBOL_GPL(virtqueue_kick_prepare);
--
2.34.1
* [PATCH V3 06/19] virtio_ring: switch to use vring_virtqueue for virtqueue_add variants
2025-06-16 8:24 [PATCH V3 00/19] virtio_ring in order support Jason Wang
` (4 preceding siblings ...)
2025-06-16 8:25 ` [PATCH V3 05/19] virtio_ring: switch to use vring_virtqueue for virtqueue_kick_prepare variants Jason Wang
@ 2025-06-16 8:25 ` Jason Wang
2025-06-16 8:25 ` [PATCH V3 07/19] virtio: " Jason Wang
` (14 subsequent siblings)
20 siblings, 0 replies; 35+ messages in thread
From: Jason Wang @ 2025-06-16 8:25 UTC (permalink / raw)
To: mst, jasowang; +Cc: xuanzhuo, eperezma, virtualization, linux-kernel
These variants are only used internally, so switch to using
vring_virtqueue as the parameter to be consistent with other internal
virtqueue helpers.
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/virtio/virtio_ring.c | 40 +++++++++++++++++-------------------
1 file changed, 19 insertions(+), 21 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 8ee5393803f0..a808792ca86b 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -472,7 +472,7 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
return extra->next;
}
-static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
+static struct vring_desc *alloc_indirect_split(struct vring_virtqueue *vq,
unsigned int total_sg,
gfp_t gfp)
{
@@ -501,7 +501,7 @@ static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
return desc;
}
-static inline unsigned int virtqueue_add_desc_split(struct virtqueue *vq,
+static inline unsigned int virtqueue_add_desc_split(struct vring_virtqueue *vq,
struct vring_desc *desc,
struct vring_desc_extra *extra,
unsigned int i,
@@ -509,11 +509,12 @@ static inline unsigned int virtqueue_add_desc_split(struct virtqueue *vq,
unsigned int len,
u16 flags, bool premapped)
{
+ struct virtio_device *vdev = vq->vq.vdev;
u16 next;
- desc[i].flags = cpu_to_virtio16(vq->vdev, flags);
- desc[i].addr = cpu_to_virtio64(vq->vdev, addr);
- desc[i].len = cpu_to_virtio32(vq->vdev, len);
+ desc[i].flags = cpu_to_virtio16(vdev, flags);
+ desc[i].addr = cpu_to_virtio64(vdev, addr);
+ desc[i].len = cpu_to_virtio32(vdev, len);
extra[i].addr = premapped ? DMA_MAPPING_ERROR : addr;
extra[i].len = len;
@@ -521,12 +522,12 @@ static inline unsigned int virtqueue_add_desc_split(struct virtqueue *vq,
next = extra[i].next;
- desc[i].next = cpu_to_virtio16(vq->vdev, next);
+ desc[i].next = cpu_to_virtio16(vdev, next);
return next;
}
-static inline int virtqueue_add_split(struct virtqueue *_vq,
+static inline int virtqueue_add_split(struct vring_virtqueue *vq,
struct scatterlist *sgs[],
unsigned int total_sg,
unsigned int out_sgs,
@@ -536,7 +537,6 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
bool premapped,
gfp_t gfp)
{
- struct vring_virtqueue *vq = to_vvq(_vq);
struct vring_desc_extra *extra;
struct scatterlist *sg;
struct vring_desc *desc;
@@ -561,7 +561,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
head = vq->free_head;
if (virtqueue_use_indirect(vq, total_sg))
- desc = alloc_indirect_split(_vq, total_sg, gfp);
+ desc = alloc_indirect_split(vq, total_sg, gfp);
else {
desc = NULL;
WARN_ON_ONCE(total_sg > vq->split.vring.num && !vq->indirect);
@@ -608,7 +608,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
/* Note that we trust indirect descriptor
* table since it use stream DMA mapping.
*/
- i = virtqueue_add_desc_split(_vq, desc, extra, i, addr, len,
+ i = virtqueue_add_desc_split(vq, desc, extra, i, addr, len,
VRING_DESC_F_NEXT,
premapped);
}
@@ -625,14 +625,14 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
/* Note that we trust indirect descriptor
* table since it use stream DMA mapping.
*/
- i = virtqueue_add_desc_split(_vq, desc, extra, i, addr, len,
+ i = virtqueue_add_desc_split(vq, desc, extra, i, addr, len,
VRING_DESC_F_NEXT |
VRING_DESC_F_WRITE,
premapped);
}
}
/* Last one doesn't continue. */
- desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
+ desc[prev].flags &= cpu_to_virtio16(vq->vq.vdev, ~VRING_DESC_F_NEXT);
if (!indirect && vring_need_unmap_buffer(vq, &extra[prev]))
vq->split.desc_extra[prev & (vq->split.vring.num - 1)].flags &=
~VRING_DESC_F_NEXT;
@@ -645,7 +645,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
if (vring_mapping_error(vq, addr))
goto unmap_release;
- virtqueue_add_desc_split(_vq, vq->split.vring.desc,
+ virtqueue_add_desc_split(vq, vq->split.vring.desc,
vq->split.desc_extra,
head, addr,
total_sg * sizeof(struct vring_desc),
@@ -671,13 +671,13 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
/* Put entry in available array (but don't update avail->idx until they
* do sync). */
avail = vq->split.avail_idx_shadow & (vq->split.vring.num - 1);
- vq->split.vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, head);
+ vq->split.vring.avail->ring[avail] = cpu_to_virtio16(vq->vq.vdev, head);
/* Descriptors and available array need to be set before we expose the
* new available array entries. */
virtio_wmb(vq->weak_barriers);
vq->split.avail_idx_shadow++;
- vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
+ vq->split.vring.avail->idx = cpu_to_virtio16(vq->vq.vdev,
vq->split.avail_idx_shadow);
vq->num_added++;
@@ -687,7 +687,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
/* This is very unlikely, but theoretically possible. Kick
* just in case. */
if (unlikely(vq->num_added == (1 << 16) - 1))
- virtqueue_kick(_vq);
+ virtqueue_kick(&vq->vq);
return 0;
@@ -702,7 +702,6 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
for (n = 0; n < total_sg; n++) {
if (i == err_idx)
break;
-
i = vring_unmap_one_split(vq, &extra[i]);
}
@@ -1441,7 +1440,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
return -ENOMEM;
}
-static inline int virtqueue_add_packed(struct virtqueue *_vq,
+static inline int virtqueue_add_packed(struct vring_virtqueue *vq,
struct scatterlist *sgs[],
unsigned int total_sg,
unsigned int out_sgs,
@@ -1451,7 +1450,6 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
bool premapped,
gfp_t gfp)
{
- struct vring_virtqueue *vq = to_vvq(_vq);
struct vring_packed_desc *desc;
struct scatterlist *sg;
unsigned int i, n, c, descs_used, err_idx, len;
@@ -2263,9 +2261,9 @@ static inline int virtqueue_add(struct virtqueue *_vq,
{
struct vring_virtqueue *vq = to_vvq(_vq);
- return vq->packed_ring ? virtqueue_add_packed(_vq, sgs, total_sg,
+ return vq->packed_ring ? virtqueue_add_packed(vq, sgs, total_sg,
out_sgs, in_sgs, data, ctx, premapped, gfp) :
- virtqueue_add_split(_vq, sgs, total_sg,
+ virtqueue_add_split(vq, sgs, total_sg,
out_sgs, in_sgs, data, ctx, premapped, gfp);
}
--
2.34.1
* [PATCH V3 07/19] virtio: switch to use vring_virtqueue for virtqueue_add variants
2025-06-16 8:24 [PATCH V3 00/19] virtio_ring in order support Jason Wang
` (5 preceding siblings ...)
2025-06-16 8:25 ` [PATCH V3 06/19] virtio_ring: switch to use vring_virtqueue for virtqueue_add variants Jason Wang
@ 2025-06-16 8:25 ` Jason Wang
2025-06-16 8:25 ` [PATCH V3 08/19] virtio_ring: switch to use vring_virtqueue for enable_cb_prepare variants Jason Wang
` (13 subsequent siblings)
20 siblings, 0 replies; 35+ messages in thread
From: Jason Wang @ 2025-06-16 8:25 UTC (permalink / raw)
To: mst, jasowang; +Cc: xuanzhuo, eperezma, virtualization, linux-kernel
These variants are only used internally, so switch to using
vring_virtqueue as the parameter to be consistent with other internal
virtqueue helpers.
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/virtio/virtio_ring.c | 16 +++++++---------
1 file changed, 7 insertions(+), 9 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index a808792ca86b..4e70b7eb127f 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -812,11 +812,10 @@ static bool more_used_split(const struct vring_virtqueue *vq)
return virtqueue_poll_split(vq, vq->last_used_idx);
}
-static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
+static void *virtqueue_get_buf_ctx_split(struct vring_virtqueue *vq,
unsigned int *len,
void **ctx)
{
- struct vring_virtqueue *vq = to_vvq(_vq);
void *ret;
unsigned int i;
u16 last_used;
@@ -838,9 +837,9 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
virtio_rmb(vq->weak_barriers);
last_used = (vq->last_used_idx & (vq->split.vring.num - 1));
- i = virtio32_to_cpu(_vq->vdev,
+ i = virtio32_to_cpu(vq->vq.vdev,
vq->split.vring.used->ring[last_used].id);
- *len = virtio32_to_cpu(_vq->vdev,
+ *len = virtio32_to_cpu(vq->vq.vdev,
vq->split.vring.used->ring[last_used].len);
if (unlikely(i >= vq->split.vring.num)) {
@@ -862,7 +861,7 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT))
virtio_store_mb(vq->weak_barriers,
&vring_used_event(&vq->split.vring),
- cpu_to_virtio16(_vq->vdev, vq->last_used_idx));
+ cpu_to_virtio16(vq->vq.vdev, vq->last_used_idx));
LAST_ADD_TIME_INVALID(vq);
@@ -1721,11 +1720,10 @@ static bool more_used_packed(const struct vring_virtqueue *vq)
return virtqueue_poll_packed(vq, READ_ONCE(vq->last_used_idx));
}
-static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
+static void *virtqueue_get_buf_ctx_packed(struct vring_virtqueue *vq,
unsigned int *len,
void **ctx)
{
- struct vring_virtqueue *vq = to_vvq(_vq);
u16 last_used, id, last_used_idx;
bool used_wrap_counter;
void *ret;
@@ -2521,8 +2519,8 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
{
struct vring_virtqueue *vq = to_vvq(_vq);
- return vq->packed_ring ? virtqueue_get_buf_ctx_packed(_vq, len, ctx) :
- virtqueue_get_buf_ctx_split(_vq, len, ctx);
+ return vq->packed_ring ? virtqueue_get_buf_ctx_packed(vq, len, ctx) :
+ virtqueue_get_buf_ctx_split(vq, len, ctx);
}
EXPORT_SYMBOL_GPL(virtqueue_get_buf_ctx);
--
2.34.1
* [PATCH V3 08/19] virtio_ring: switch to use vring_virtqueue for enable_cb_prepare variants
2025-06-16 8:24 [PATCH V3 00/19] virtio_ring in order support Jason Wang
` (6 preceding siblings ...)
2025-06-16 8:25 ` [PATCH V3 07/19] virtio: " Jason Wang
@ 2025-06-16 8:25 ` Jason Wang
2025-06-16 8:25 ` [PATCH V3 09/19] virtio_ring: use vring_virtqueue for enable_cb_delayed variants Jason Wang
` (12 subsequent siblings)
20 siblings, 0 replies; 35+ messages in thread
From: Jason Wang @ 2025-06-16 8:25 UTC (permalink / raw)
To: mst, jasowang; +Cc: xuanzhuo, eperezma, virtualization, linux-kernel
These variants are only used internally, so switch to using
vring_virtqueue as the parameter to be consistent with other internal
virtqueue helpers.
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/virtio/virtio_ring.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 4e70b7eb127f..8c18f3edda00 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -893,9 +893,8 @@ static void virtqueue_disable_cb_split(struct virtqueue *_vq)
}
}
-static unsigned int virtqueue_enable_cb_prepare_split(struct virtqueue *_vq)
+static unsigned int virtqueue_enable_cb_prepare_split(struct vring_virtqueue *vq)
{
- struct vring_virtqueue *vq = to_vvq(_vq);
u16 last_used_idx;
START_USE(vq);
@@ -909,10 +908,10 @@ static unsigned int virtqueue_enable_cb_prepare_split(struct virtqueue *_vq)
vq->split.avail_flags_shadow &= ~VRING_AVAIL_F_NO_INTERRUPT;
if (!vq->event)
vq->split.vring.avail->flags =
- cpu_to_virtio16(_vq->vdev,
+ cpu_to_virtio16(vq->vq.vdev,
vq->split.avail_flags_shadow);
}
- vring_used_event(&vq->split.vring) = cpu_to_virtio16(_vq->vdev,
+ vring_used_event(&vq->split.vring) = cpu_to_virtio16(vq->vq.vdev,
last_used_idx = vq->last_used_idx);
END_USE(vq);
return last_used_idx;
@@ -1807,10 +1806,8 @@ static void virtqueue_disable_cb_packed(struct virtqueue *_vq)
}
}
-static unsigned int virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
+static unsigned int virtqueue_enable_cb_prepare_packed(struct vring_virtqueue *vq)
{
- struct vring_virtqueue *vq = to_vvq(_vq);
-
START_USE(vq);
/*
@@ -2568,8 +2565,8 @@ unsigned int virtqueue_enable_cb_prepare(struct virtqueue *_vq)
if (vq->event_triggered)
vq->event_triggered = false;
- return vq->packed_ring ? virtqueue_enable_cb_prepare_packed(_vq) :
- virtqueue_enable_cb_prepare_split(_vq);
+ return vq->packed_ring ? virtqueue_enable_cb_prepare_packed(vq) :
+ virtqueue_enable_cb_prepare_split(vq);
}
EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);
--
2.34.1
* [PATCH V3 09/19] virtio_ring: use vring_virtqueue for enable_cb_delayed variants
2025-06-16 8:24 [PATCH V3 00/19] virtio_ring in order support Jason Wang
` (7 preceding siblings ...)
2025-06-16 8:25 ` [PATCH V3 08/19] virtio_ring: switch to use vring_virtqueue for enable_cb_prepare variants Jason Wang
@ 2025-06-16 8:25 ` Jason Wang
2025-06-16 8:25 ` [PATCH V3 10/19] virtio_ring: switch to use vring_virtqueue for disable_cb variants Jason Wang
` (11 subsequent siblings)
20 siblings, 0 replies; 35+ messages in thread
From: Jason Wang @ 2025-06-16 8:25 UTC (permalink / raw)
To: mst, jasowang; +Cc: xuanzhuo, eperezma, virtualization, linux-kernel
These variants are only used internally, so switch to using
vring_virtqueue as the parameter to be consistent with other internal
virtqueue helpers.
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/virtio/virtio_ring.c | 16 +++++++---------
1 file changed, 7 insertions(+), 9 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 8c18f3edda00..a85db8c8cb24 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -917,9 +917,8 @@ static unsigned int virtqueue_enable_cb_prepare_split(struct vring_virtqueue *vq
return last_used_idx;
}
-static bool virtqueue_enable_cb_delayed_split(struct virtqueue *_vq)
+static bool virtqueue_enable_cb_delayed_split(struct vring_virtqueue *vq)
{
- struct vring_virtqueue *vq = to_vvq(_vq);
u16 bufs;
START_USE(vq);
@@ -933,7 +932,7 @@ static bool virtqueue_enable_cb_delayed_split(struct virtqueue *_vq)
vq->split.avail_flags_shadow &= ~VRING_AVAIL_F_NO_INTERRUPT;
if (!vq->event)
vq->split.vring.avail->flags =
- cpu_to_virtio16(_vq->vdev,
+ cpu_to_virtio16(vq->vq.vdev,
vq->split.avail_flags_shadow);
}
/* TODO: tune this threshold */
@@ -941,9 +940,9 @@ static bool virtqueue_enable_cb_delayed_split(struct virtqueue *_vq)
virtio_store_mb(vq->weak_barriers,
&vring_used_event(&vq->split.vring),
- cpu_to_virtio16(_vq->vdev, vq->last_used_idx + bufs));
+ cpu_to_virtio16(vq->vq.vdev, vq->last_used_idx + bufs));
- if (unlikely((u16)(virtio16_to_cpu(_vq->vdev, vq->split.vring.used->idx)
+ if (unlikely((u16)(virtio16_to_cpu(vq->vq.vdev, vq->split.vring.used->idx)
- vq->last_used_idx) > bufs)) {
END_USE(vq);
return false;
@@ -1837,9 +1836,8 @@ static unsigned int virtqueue_enable_cb_prepare_packed(struct vring_virtqueue *v
return vq->last_used_idx;
}
-static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
+static bool virtqueue_enable_cb_delayed_packed(struct vring_virtqueue *vq)
{
- struct vring_virtqueue *vq = to_vvq(_vq);
u16 used_idx, wrap_counter, last_used_idx;
u16 bufs;
@@ -2631,8 +2629,8 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
if (vq->event_triggered)
data_race(vq->event_triggered = false);
- return vq->packed_ring ? virtqueue_enable_cb_delayed_packed(_vq) :
- virtqueue_enable_cb_delayed_split(_vq);
+ return vq->packed_ring ? virtqueue_enable_cb_delayed_packed(vq) :
+ virtqueue_enable_cb_delayed_split(vq);
}
EXPORT_SYMBOL_GPL(virtqueue_enable_cb_delayed);
--
2.34.1
* [PATCH V3 10/19] virtio_ring: switch to use vring_virtqueue for disable_cb variants
2025-06-16 8:24 [PATCH V3 00/19] virtio_ring in order support Jason Wang
` (8 preceding siblings ...)
2025-06-16 8:25 ` [PATCH V3 09/19] virtio_ring: use vring_virtqueue for enable_cb_delayed variants Jason Wang
@ 2025-06-16 8:25 ` Jason Wang
2025-06-16 8:25 ` [PATCH V3 11/19] virtio_ring: switch to use vring_virtqueue for detach_unused_buf variants Jason Wang
` (10 subsequent siblings)
20 siblings, 0 replies; 35+ messages in thread
From: Jason Wang @ 2025-06-16 8:25 UTC (permalink / raw)
To: mst, jasowang; +Cc: xuanzhuo, eperezma, virtualization, linux-kernel
These variants are only used internally, so switch to using
vring_virtqueue as the parameter to be consistent with other internal
virtqueue helpers.
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/virtio/virtio_ring.c | 14 +++++---------
1 file changed, 5 insertions(+), 9 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index a85db8c8cb24..4af29a6e15e8 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -869,10 +869,8 @@ static void *virtqueue_get_buf_ctx_split(struct vring_virtqueue *vq,
return ret;
}
-static void virtqueue_disable_cb_split(struct virtqueue *_vq)
+static void virtqueue_disable_cb_split(struct vring_virtqueue *vq)
{
- struct vring_virtqueue *vq = to_vvq(_vq);
-
if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
@@ -888,7 +886,7 @@ static void virtqueue_disable_cb_split(struct virtqueue *_vq)
vring_used_event(&vq->split.vring) = 0x0;
else
vq->split.vring.avail->flags =
- cpu_to_virtio16(_vq->vdev,
+ cpu_to_virtio16(vq->vq.vdev,
vq->split.avail_flags_shadow);
}
}
@@ -1786,10 +1784,8 @@ static void *virtqueue_get_buf_ctx_packed(struct vring_virtqueue *vq,
return ret;
}
-static void virtqueue_disable_cb_packed(struct virtqueue *_vq)
+static void virtqueue_disable_cb_packed(struct vring_virtqueue *vq)
{
- struct vring_virtqueue *vq = to_vvq(_vq);
-
if (vq->packed.event_flags_shadow != VRING_PACKED_EVENT_FLAG_DISABLE) {
vq->packed.event_flags_shadow = VRING_PACKED_EVENT_FLAG_DISABLE;
@@ -2538,9 +2534,9 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
struct vring_virtqueue *vq = to_vvq(_vq);
if (vq->packed_ring)
- virtqueue_disable_cb_packed(_vq);
+ virtqueue_disable_cb_packed(vq);
else
- virtqueue_disable_cb_split(_vq);
+ virtqueue_disable_cb_split(vq);
}
EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
--
2.34.1
* [PATCH V3 11/19] virtio_ring: switch to use vring_virtqueue for detach_unused_buf variants
2025-06-16 8:24 [PATCH V3 00/19] virtio_ring in order support Jason Wang
` (9 preceding siblings ...)
2025-06-16 8:25 ` [PATCH V3 10/19] virtio_ring: switch to use vring_virtqueue for disable_cb variants Jason Wang
@ 2025-06-16 8:25 ` Jason Wang
2025-06-16 8:25 ` [PATCH V3 12/19] virtio_ring: use u16 for last_used_idx in virtqueue_poll_split() Jason Wang
` (9 subsequent siblings)
20 siblings, 0 replies; 35+ messages in thread
From: Jason Wang @ 2025-06-16 8:25 UTC (permalink / raw)
To: mst, jasowang; +Cc: xuanzhuo, eperezma, virtualization, linux-kernel
These variants are only used internally, so switch to using
vring_virtqueue as the parameter to be consistent with other internal
virtqueue helpers.
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/virtio/virtio_ring.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 4af29a6e15e8..03b17bc275dd 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -950,9 +950,8 @@ static bool virtqueue_enable_cb_delayed_split(struct vring_virtqueue *vq)
return true;
}
-static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq)
+static void *virtqueue_detach_unused_buf_split(struct vring_virtqueue *vq)
{
- struct vring_virtqueue *vq = to_vvq(_vq);
unsigned int i;
void *buf;
@@ -965,7 +964,7 @@ static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq)
buf = vq->split.desc_state[i].data;
detach_buf_split(vq, i, NULL);
vq->split.avail_idx_shadow--;
- vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
+ vq->split.vring.avail->idx = cpu_to_virtio16(vq->vq.vdev,
vq->split.avail_idx_shadow);
END_USE(vq);
return buf;
@@ -1892,9 +1891,8 @@ static bool virtqueue_enable_cb_delayed_packed(struct vring_virtqueue *vq)
return true;
}
-static void *virtqueue_detach_unused_buf_packed(struct virtqueue *_vq)
+static void *virtqueue_detach_unused_buf_packed(struct vring_virtqueue *vq)
{
- struct vring_virtqueue *vq = to_vvq(_vq);
unsigned int i;
void *buf;
@@ -2642,8 +2640,8 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);
- return vq->packed_ring ? virtqueue_detach_unused_buf_packed(_vq) :
- virtqueue_detach_unused_buf_split(_vq);
+ return vq->packed_ring ? virtqueue_detach_unused_buf_packed(vq) :
+ virtqueue_detach_unused_buf_split(vq);
}
EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf);
--
2.34.1
* [PATCH V3 12/19] virtio_ring: use u16 for last_used_idx in virtqueue_poll_split()
2025-06-16 8:24 [PATCH V3 00/19] virtio_ring in order support Jason Wang
` (10 preceding siblings ...)
2025-06-16 8:25 ` [PATCH V3 11/19] virtio_ring: switch to use vring_virtqueue for detach_unused_buf variants Jason Wang
@ 2025-06-16 8:25 ` Jason Wang
2025-06-16 8:25 ` [PATCH V3 13/19] virtio_ring: introduce virtqueue ops Jason Wang
` (8 subsequent siblings)
20 siblings, 0 replies; 35+ messages in thread
From: Jason Wang @ 2025-06-16 8:25 UTC (permalink / raw)
To: mst, jasowang; +Cc: xuanzhuo, eperezma, virtualization, linux-kernel
Use u16 for last_used_idx in virtqueue_poll_split() to align with the
spec.
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/virtio/virtio_ring.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 03b17bc275dd..b14bbb4d6713 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -801,7 +801,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
}
static bool virtqueue_poll_split(const struct vring_virtqueue *vq,
- unsigned int last_used_idx)
+ u16 last_used_idx)
{
return (u16)last_used_idx != virtio16_to_cpu(vq->vq.vdev,
vq->split.vring.used->idx);
--
2.34.1
* [PATCH V3 13/19] virtio_ring: introduce virtqueue ops
2025-06-16 8:24 [PATCH V3 00/19] virtio_ring in order support Jason Wang
` (11 preceding siblings ...)
2025-06-16 8:25 ` [PATCH V3 12/19] virtio_ring: use u16 for last_used_idx in virtqueue_poll_split() Jason Wang
@ 2025-06-16 8:25 ` Jason Wang
2025-07-01 6:28 ` Michael S. Tsirkin
2025-06-16 8:25 ` [PATCH V3 14/19] virtio_ring: determine descriptor flags at one time Jason Wang
` (7 subsequent siblings)
20 siblings, 1 reply; 35+ messages in thread
From: Jason Wang @ 2025-06-16 8:25 UTC (permalink / raw)
To: mst, jasowang; +Cc: xuanzhuo, eperezma, virtualization, linux-kernel
This patch introduces virtqueue ops, a set of callbacks that are
called for different queue layouts or features. This helps avoid
branches for split/packed and will ease future implementations such
as in-order.
Note that in order to eliminate indirect calls, this patch uses a
global array of const ops to allow the compiler to avoid indirect
branches.
Tested with CONFIG_MITIGATION_RETPOLINE, no performance differences
were noticed.
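The dispatch is a switch over the layout enum in which each arm
indexes the const ops array with a compile-time constant, so the
compiler can fold the load and emit a direct call; roughly (simplified
from the VIRTQUEUE_CALL macro below):

switch (vq->layout) {
case SPLIT:
	/* all_ops[SPLIT] is a constant pointer to split_ops, so this
	 * can be emitted as a direct call to the split helper. */
	ret = all_ops[SPLIT]->poll(vq, last_used_idx);
	break;
case PACKED:
	ret = all_ops[PACKED]->poll(vq, last_used_idx);
	break;
default:
	BUG();
}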
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/virtio/virtio_ring.c | 172 ++++++++++++++++++++++++++---------
1 file changed, 129 insertions(+), 43 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index b14bbb4d6713..af32d1a1a1db 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -67,6 +67,12 @@
#define LAST_ADD_TIME_INVALID(vq)
#endif
+enum vq_layout {
+ SPLIT = 0,
+ PACKED,
+ VQ_TYPE_MAX,
+};
+
struct vring_desc_state_split {
void *data; /* Data for callback. */
@@ -159,12 +165,28 @@ struct vring_virtqueue_packed {
size_t event_size_in_bytes;
};
+struct vring_virtqueue;
+
+struct virtqueue_ops {
+ int (*add)(struct vring_virtqueue *_vq, struct scatterlist *sgs[],
+ unsigned int total_sg, unsigned int out_sgs,
+ unsigned int in_sgs, void *data,
+ void *ctx, bool premapped, gfp_t gfp);
+ void *(*get)(struct vring_virtqueue *vq, unsigned int *len, void **ctx);
+ bool (*kick_prepare)(struct vring_virtqueue *vq);
+ void (*disable_cb)(struct vring_virtqueue *vq);
+ bool (*enable_cb_delayed)(struct vring_virtqueue *vq);
+ unsigned int (*enable_cb_prepare)(struct vring_virtqueue *vq);
+ bool (*poll)(const struct vring_virtqueue *vq, u16 last_used_idx);
+ void *(*detach_unused_buf)(struct vring_virtqueue *vq);
+ bool (*more_used)(const struct vring_virtqueue *vq);
+ int (*resize)(struct vring_virtqueue *vq, u32 num);
+ void (*reset)(struct vring_virtqueue *vq);
+};
+
struct vring_virtqueue {
struct virtqueue vq;
- /* Is this a packed ring? */
- bool packed_ring;
-
/* Is DMA API used? */
bool use_dma_api;
@@ -180,6 +202,8 @@ struct vring_virtqueue {
/* Host publishes avail event idx */
bool event;
+ enum vq_layout layout;
+
/* Head of free buffer list. */
unsigned int free_head;
/* Number we've added since last sync. */
@@ -232,6 +256,12 @@ static void vring_free(struct virtqueue *_vq);
#define to_vvq(_vq) container_of_const(_vq, struct vring_virtqueue, vq)
+
+static inline bool virtqueue_is_packed(const struct vring_virtqueue *vq)
+{
+ return vq->layout == PACKED;
+}
+
static bool virtqueue_use_indirect(const struct vring_virtqueue *vq,
unsigned int total_sg)
{
@@ -422,7 +452,7 @@ static void virtqueue_init(struct vring_virtqueue *vq, u32 num)
{
vq->vq.num_free = num;
- if (vq->packed_ring)
+ if (virtqueue_is_packed(vq))
vq->last_used_idx = 0 | (1 << VRING_PACKED_EVENT_F_WRAP_CTR);
else
vq->last_used_idx = 0;
@@ -1116,6 +1146,8 @@ static int vring_alloc_queue_split(struct vring_virtqueue_split *vring_split,
return 0;
}
+static const struct virtqueue_ops split_ops;
+
static struct virtqueue *__vring_new_virtqueue_split(unsigned int index,
struct vring_virtqueue_split *vring_split,
struct virtio_device *vdev,
@@ -1133,7 +1165,7 @@ static struct virtqueue *__vring_new_virtqueue_split(unsigned int index,
if (!vq)
return NULL;
- vq->packed_ring = false;
+ vq->layout = SPLIT;
vq->vq.callback = callback;
vq->vq.vdev = vdev;
vq->vq.name = name;
@@ -2076,6 +2108,8 @@ static void virtqueue_reset_packed(struct vring_virtqueue *vq)
virtqueue_vring_init_packed(&vq->packed, !!vq->vq.callback);
}
+static const struct virtqueue_ops packed_ops;
+
static struct virtqueue *__vring_new_virtqueue_packed(unsigned int index,
struct vring_virtqueue_packed *vring_packed,
struct virtio_device *vdev,
@@ -2106,7 +2140,7 @@ static struct virtqueue *__vring_new_virtqueue_packed(unsigned int index,
#else
vq->broken = false;
#endif
- vq->packed_ring = true;
+ vq->layout = PACKED;
vq->dma_dev = dma_dev;
vq->use_dma_api = vring_use_dma_api(vdev);
@@ -2194,6 +2228,39 @@ static int virtqueue_resize_packed(struct vring_virtqueue *vq, u32 num)
return -ENOMEM;
}
+static const struct virtqueue_ops split_ops = {
+ .add = virtqueue_add_split,
+ .get = virtqueue_get_buf_ctx_split,
+ .kick_prepare = virtqueue_kick_prepare_split,
+ .disable_cb = virtqueue_disable_cb_split,
+ .enable_cb_delayed = virtqueue_enable_cb_delayed_split,
+ .enable_cb_prepare = virtqueue_enable_cb_prepare_split,
+ .poll = virtqueue_poll_split,
+ .detach_unused_buf = virtqueue_detach_unused_buf_split,
+ .more_used = more_used_split,
+ .resize = virtqueue_resize_split,
+ .reset = virtqueue_reset_split,
+};
+
+static const struct virtqueue_ops packed_ops = {
+ .add = virtqueue_add_packed,
+ .get = virtqueue_get_buf_ctx_packed,
+ .kick_prepare = virtqueue_kick_prepare_packed,
+ .disable_cb = virtqueue_disable_cb_packed,
+ .enable_cb_delayed = virtqueue_enable_cb_delayed_packed,
+ .enable_cb_prepare = virtqueue_enable_cb_prepare_packed,
+ .poll = virtqueue_poll_packed,
+ .detach_unused_buf = virtqueue_detach_unused_buf_packed,
+ .more_used = more_used_packed,
+ .resize = virtqueue_resize_packed,
+ .reset = virtqueue_reset_packed,
+};
+
+static const struct virtqueue_ops *const all_ops[VQ_TYPE_MAX] = {
+ [SPLIT] = &split_ops,
+ [PACKED] = &packed_ops
+};
+
static int virtqueue_disable_and_recycle(struct virtqueue *_vq,
void (*recycle)(struct virtqueue *vq, void *buf))
{
@@ -2236,6 +2303,38 @@ static int virtqueue_enable_after_reset(struct virtqueue *_vq)
* Generic functions and exported symbols.
*/
+#define VIRTQUEUE_CALL(vq, op, ...) \
+ ({ \
+ typeof(all_ops[SPLIT]->op(vq, ##__VA_ARGS__)) ret; \
+ switch (vq->layout) { \
+ case SPLIT: \
+ ret = all_ops[SPLIT]->op(vq, ##__VA_ARGS__); \
+ break; \
+ case PACKED: \
+ ret = all_ops[PACKED]->op(vq, ##__VA_ARGS__); \
+ break; \
+ default: \
+ BUG(); \
+ break; \
+ } \
+ ret; \
+})
+
+#define VOID_VIRTQUEUE_CALL(vq, op, ...) \
+ ({ \
+ switch ((vq)->layout) { \
+ case SPLIT: \
+ all_ops[SPLIT]->op(vq, ##__VA_ARGS__); \
+ break; \
+ case PACKED: \
+ all_ops[PACKED]->op(vq, ##__VA_ARGS__); \
+ break; \
+ default: \
+ BUG(); \
+ break; \
+ } \
+})
+
static inline int virtqueue_add(struct virtqueue *_vq,
struct scatterlist *sgs[],
unsigned int total_sg,
@@ -2248,10 +2347,9 @@ static inline int virtqueue_add(struct virtqueue *_vq,
{
struct vring_virtqueue *vq = to_vvq(_vq);
- return vq->packed_ring ? virtqueue_add_packed(vq, sgs, total_sg,
- out_sgs, in_sgs, data, ctx, premapped, gfp) :
- virtqueue_add_split(vq, sgs, total_sg,
- out_sgs, in_sgs, data, ctx, premapped, gfp);
+ return VIRTQUEUE_CALL(vq, add, sgs, total_sg,
+ out_sgs, in_sgs, data,
+ ctx, premapped, gfp);
}
/**
@@ -2437,8 +2535,7 @@ bool virtqueue_kick_prepare(struct virtqueue *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);
- return vq->packed_ring ? virtqueue_kick_prepare_packed(vq) :
- virtqueue_kick_prepare_split(vq);
+ return VIRTQUEUE_CALL(vq, kick_prepare);
}
EXPORT_SYMBOL_GPL(virtqueue_kick_prepare);
@@ -2508,8 +2605,7 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
{
struct vring_virtqueue *vq = to_vvq(_vq);
- return vq->packed_ring ? virtqueue_get_buf_ctx_packed(vq, len, ctx) :
- virtqueue_get_buf_ctx_split(vq, len, ctx);
+ return VIRTQUEUE_CALL(vq, get, len, ctx);
}
EXPORT_SYMBOL_GPL(virtqueue_get_buf_ctx);
@@ -2531,10 +2627,7 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);
- if (vq->packed_ring)
- virtqueue_disable_cb_packed(vq);
- else
- virtqueue_disable_cb_split(vq);
+ VOID_VIRTQUEUE_CALL(vq, disable_cb);
}
EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
@@ -2557,8 +2650,7 @@ unsigned int virtqueue_enable_cb_prepare(struct virtqueue *_vq)
if (vq->event_triggered)
vq->event_triggered = false;
- return vq->packed_ring ? virtqueue_enable_cb_prepare_packed(vq) :
- virtqueue_enable_cb_prepare_split(vq);
+ return VIRTQUEUE_CALL(vq, enable_cb_prepare);
}
EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);
@@ -2579,8 +2671,8 @@ bool virtqueue_poll(struct virtqueue *_vq, unsigned int last_used_idx)
return false;
virtio_mb(vq->weak_barriers);
- return vq->packed_ring ? virtqueue_poll_packed(vq, last_used_idx) :
- virtqueue_poll_split(vq, last_used_idx);
+
+ return VIRTQUEUE_CALL(vq, poll, last_used_idx);
}
EXPORT_SYMBOL_GPL(virtqueue_poll);
@@ -2623,8 +2715,7 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
if (vq->event_triggered)
data_race(vq->event_triggered = false);
- return vq->packed_ring ? virtqueue_enable_cb_delayed_packed(vq) :
- virtqueue_enable_cb_delayed_split(vq);
+ return VIRTQUEUE_CALL(vq, enable_cb_delayed);
}
EXPORT_SYMBOL_GPL(virtqueue_enable_cb_delayed);
@@ -2640,14 +2731,13 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);
- return vq->packed_ring ? virtqueue_detach_unused_buf_packed(vq) :
- virtqueue_detach_unused_buf_split(vq);
+ return VIRTQUEUE_CALL(vq, detach_unused_buf);
}
EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf);
static inline bool more_used(const struct vring_virtqueue *vq)
{
- return vq->packed_ring ? more_used_packed(vq) : more_used_split(vq);
+ return VIRTQUEUE_CALL(vq, more_used);
}
/**
@@ -2776,7 +2866,8 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
if (!num)
return -EINVAL;
- if ((vq->packed_ring ? vq->packed.vring.num : vq->split.vring.num) == num)
+ if ((virtqueue_is_packed(vq) ? vq->packed.vring.num :
+ vq->split.vring.num) == num)
return 0;
err = virtqueue_disable_and_recycle(_vq, recycle);
@@ -2785,10 +2876,7 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
if (recycle_done)
recycle_done(_vq);
- if (vq->packed_ring)
- err = virtqueue_resize_packed(vq, num);
- else
- err = virtqueue_resize_split(vq, num);
+ err = VIRTQUEUE_CALL(vq, resize, num);
return virtqueue_enable_after_reset(_vq);
}
@@ -2822,10 +2910,7 @@ int virtqueue_reset(struct virtqueue *_vq,
if (recycle_done)
recycle_done(_vq);
- if (vq->packed_ring)
- virtqueue_reset_packed(vq);
- else
- virtqueue_reset_split(vq);
+ VOID_VIRTQUEUE_CALL(vq, reset);
return virtqueue_enable_after_reset(_vq);
}
@@ -2867,7 +2952,7 @@ static void vring_free(struct virtqueue *_vq)
struct vring_virtqueue *vq = to_vvq(_vq);
if (vq->we_own_ring) {
- if (vq->packed_ring) {
+ if (virtqueue_is_packed(vq)) {
vring_free_queue(vq->vq.vdev,
vq->packed.ring_size_in_bytes,
vq->packed.vring.desc,
@@ -2896,7 +2981,7 @@ static void vring_free(struct virtqueue *_vq)
vring_dma_dev(vq));
}
}
- if (!vq->packed_ring) {
+ if (!virtqueue_is_packed(vq)) {
kfree(vq->split.desc_state);
kfree(vq->split.desc_extra);
}
@@ -2921,7 +3006,7 @@ u32 vring_notification_data(struct virtqueue *_vq)
struct vring_virtqueue *vq = to_vvq(_vq);
u16 next;
- if (vq->packed_ring)
+ if (virtqueue_is_packed(vq))
next = (vq->packed.next_avail_idx &
~(-(1 << VRING_PACKED_EVENT_F_WRAP_CTR))) |
vq->packed.avail_wrap_counter <<
@@ -2974,7 +3059,8 @@ unsigned int virtqueue_get_vring_size(const struct virtqueue *_vq)
const struct vring_virtqueue *vq = to_vvq(_vq);
- return vq->packed_ring ? vq->packed.vring.num : vq->split.vring.num;
+ return virtqueue_is_packed(vq) ? vq->packed.vring.num :
+ vq->split.vring.num;
}
EXPORT_SYMBOL_GPL(virtqueue_get_vring_size);
@@ -3057,7 +3143,7 @@ dma_addr_t virtqueue_get_desc_addr(const struct virtqueue *_vq)
BUG_ON(!vq->we_own_ring);
- if (vq->packed_ring)
+ if (virtqueue_is_packed(vq))
return vq->packed.ring_dma_addr;
return vq->split.queue_dma_addr;
@@ -3070,7 +3156,7 @@ dma_addr_t virtqueue_get_avail_addr(const struct virtqueue *_vq)
BUG_ON(!vq->we_own_ring);
- if (vq->packed_ring)
+ if (virtqueue_is_packed(vq))
return vq->packed.driver_event_dma_addr;
return vq->split.queue_dma_addr +
@@ -3084,7 +3170,7 @@ dma_addr_t virtqueue_get_used_addr(const struct virtqueue *_vq)
BUG_ON(!vq->we_own_ring);
- if (vq->packed_ring)
+ if (virtqueue_is_packed(vq))
return vq->packed.device_event_dma_addr;
return vq->split.queue_dma_addr +
--
2.34.1
* [PATCH V3 14/19] virtio_ring: determine descriptor flags at one time
2025-06-16 8:24 [PATCH V3 00/19] virtio_ring in order support Jason Wang
` (12 preceding siblings ...)
2025-06-16 8:25 ` [PATCH V3 13/19] virtio_ring: introduce virtqueue ops Jason Wang
@ 2025-06-16 8:25 ` Jason Wang
2025-07-01 6:42 ` Michael S. Tsirkin
2025-06-16 8:25 ` [PATCH V3 15/19] virtio_ring: factor out core logic of buffer detaching Jason Wang
` (6 subsequent siblings)
20 siblings, 1 reply; 35+ messages in thread
From: Jason Wang @ 2025-06-16 8:25 UTC (permalink / raw)
To: mst, jasowang; +Cc: xuanzhuo, eperezma, virtualization, linux-kernel
Let's determine the last descriptor by counting the number of sg
entries. This is consistent with the packed virtqueue implementation
and eases the future in-order implementation. For example, with
total_sg = 3 the first two descriptors get VRING_DESC_F_NEXT and the
last one does not, so the flag no longer needs to be cleared after the
loop.
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/virtio/virtio_ring.c | 25 +++++++++++++------------
1 file changed, 13 insertions(+), 12 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index af32d1a1a1db..d5e4d4cd2487 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -570,7 +570,7 @@ static inline int virtqueue_add_split(struct vring_virtqueue *vq,
struct vring_desc_extra *extra;
struct scatterlist *sg;
struct vring_desc *desc;
- unsigned int i, n, avail, descs_used, prev, err_idx;
+ unsigned int i, n, c, avail, descs_used, err_idx;
int head;
bool indirect;
@@ -626,46 +626,47 @@ static inline int virtqueue_add_split(struct vring_virtqueue *vq,
return -ENOSPC;
}
+ c = 0;
for (n = 0; n < out_sgs; n++) {
+ sg = sgs[n];
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
dma_addr_t addr;
u32 len;
+ u16 flags = 0;
if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr, &len, premapped))
goto unmap_release;
- prev = i;
+ if (++c != total_sg)
+ flags = VRING_DESC_F_NEXT;
+
/* Note that we trust indirect descriptor
* table since it use stream DMA mapping.
*/
i = virtqueue_add_desc_split(vq, desc, extra, i, addr, len,
- VRING_DESC_F_NEXT,
+ flags,
premapped);
}
}
for (; n < (out_sgs + in_sgs); n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+ u16 flags = VRING_DESC_F_WRITE;
dma_addr_t addr;
u32 len;
if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr, &len, premapped))
goto unmap_release;
- prev = i;
+ if (++c != total_sg)
+ flags |= VRING_DESC_F_NEXT;
+
/* Note that we trust indirect descriptor
* table since it use stream DMA mapping.
*/
i = virtqueue_add_desc_split(vq, desc, extra, i, addr, len,
- VRING_DESC_F_NEXT |
- VRING_DESC_F_WRITE,
- premapped);
+ flags, premapped);
}
}
- /* Last one doesn't continue. */
- desc[prev].flags &= cpu_to_virtio16(vq->vq.vdev, ~VRING_DESC_F_NEXT);
- if (!indirect && vring_need_unmap_buffer(vq, &extra[prev]))
- vq->split.desc_extra[prev & (vq->split.vring.num - 1)].flags &=
- ~VRING_DESC_F_NEXT;
if (indirect) {
/* Now that the indirect table is filled in, map it. */
--
2.34.1
* [PATCH V3 15/19] virtio_ring: factor out core logic of buffer detaching
2025-06-16 8:24 [PATCH V3 00/19] virtio_ring in order support Jason Wang
` (13 preceding siblings ...)
2025-06-16 8:25 ` [PATCH V3 14/19] virtio_ring: determine descriptor flags at one time Jason Wang
@ 2025-06-16 8:25 ` Jason Wang
2025-06-16 8:25 ` [PATCH V3 16/19] virtio_ring: factor out core logic for updating last_used_idx Jason Wang
` (5 subsequent siblings)
20 siblings, 0 replies; 35+ messages in thread
From: Jason Wang @ 2025-06-16 8:25 UTC (permalink / raw)
To: mst, jasowang; +Cc: xuanzhuo, eperezma, virtualization, linux-kernel
Factor out the core logic of buffer detaching and leave returning the
id to the free list to the caller, so the in-order code can call just
the core logic.
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/virtio/virtio_ring.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index d5e4d4cd2487..1bdd332d515e 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1669,8 +1669,8 @@ static bool virtqueue_kick_prepare_packed(struct vring_virtqueue *vq)
return needs_kick;
}
-static void detach_buf_packed(struct vring_virtqueue *vq,
- unsigned int id, void **ctx)
+static void detach_buf_packed_in_order(struct vring_virtqueue *vq,
+ unsigned int id, void **ctx)
{
struct vring_desc_state_packed *state = NULL;
struct vring_packed_desc *desc;
@@ -1681,8 +1681,6 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
/* Clear data ptr. */
state->data = NULL;
- vq->packed.desc_extra[state->last].next = vq->free_head;
- vq->free_head = id;
vq->vq.num_free += state->num;
if (unlikely(vq->use_dma_api)) {
@@ -1719,6 +1717,17 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
}
}
+static void detach_buf_packed(struct vring_virtqueue *vq,
+ unsigned int id, void **ctx)
+{
+ struct vring_desc_state_packed *state = &vq->packed.desc_state[id];
+
+ vq->packed.desc_extra[state->last].next = vq->free_head;
+ vq->free_head = id;
+
+ return detach_buf_packed_in_order(vq, id, ctx);
+}
+
static inline bool is_used_desc_packed(const struct vring_virtqueue *vq,
u16 idx, bool used_wrap_counter)
{
--
2.34.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH V3 16/19] virtio_ring: factor out core logic for updating last_used_idx
2025-06-16 8:24 [PATCH V3 00/19] virtio_ring in order support Jason Wang
` (14 preceding siblings ...)
2025-06-16 8:25 ` [PATCH V3 15/19] virtio_ring: factor out core logic of buffer detaching Jason Wang
@ 2025-06-16 8:25 ` Jason Wang
2025-06-16 8:25 ` [PATCH V3 17/19] virtio_ring: factor out split indirect detaching logic Jason Wang
` (4 subsequent siblings)
20 siblings, 0 replies; 35+ messages in thread
From: Jason Wang @ 2025-06-16 8:25 UTC (permalink / raw)
To: mst, jasowang; +Cc: xuanzhuo, eperezma, virtualization, linux-kernel
Factor out the core logic for updating last_used_idx so that it can be
reused by the packed in order implementation.
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/virtio/virtio_ring.c | 43 +++++++++++++++++++++---------------
1 file changed, 25 insertions(+), 18 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 1bdd332d515e..e8e0d1204f52 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1757,6 +1757,30 @@ static bool more_used_packed(const struct vring_virtqueue *vq)
return virtqueue_poll_packed(vq, READ_ONCE(vq->last_used_idx));
}
+static void update_last_used_idx_packed(struct vring_virtqueue *vq,
+ u16 id, u16 last_used,
+ u16 used_wrap_counter)
+{
+ last_used += vq->packed.desc_state[id].num;
+ if (unlikely(last_used >= vq->packed.vring.num)) {
+ last_used -= vq->packed.vring.num;
+ used_wrap_counter ^= 1;
+ }
+
+ last_used = (last_used | (used_wrap_counter << VRING_PACKED_EVENT_F_WRAP_CTR));
+ WRITE_ONCE(vq->last_used_idx, last_used);
+
+ /*
+ * If we expect an interrupt for the next entry, tell host
+ * by writing event index and flush out the write before
+ * the read in the next get_buf call.
+ */
+ if (vq->packed.event_flags_shadow == VRING_PACKED_EVENT_FLAG_DESC)
+ virtio_store_mb(vq->weak_barriers,
+ &vq->packed.vring.driver->off_wrap,
+ cpu_to_le16(vq->last_used_idx));
+}
+
static void *virtqueue_get_buf_ctx_packed(struct vring_virtqueue *vq,
unsigned int *len,
void **ctx)
@@ -1800,24 +1824,7 @@ static void *virtqueue_get_buf_ctx_packed(struct vring_virtqueue *vq,
ret = vq->packed.desc_state[id].data;
detach_buf_packed(vq, id, ctx);
- last_used += vq->packed.desc_state[id].num;
- if (unlikely(last_used >= vq->packed.vring.num)) {
- last_used -= vq->packed.vring.num;
- used_wrap_counter ^= 1;
- }
-
- last_used = (last_used | (used_wrap_counter << VRING_PACKED_EVENT_F_WRAP_CTR));
- WRITE_ONCE(vq->last_used_idx, last_used);
-
- /*
- * If we expect an interrupt for the next entry, tell host
- * by writing event index and flush out the write before
- * the read in the next get_buf call.
- */
- if (vq->packed.event_flags_shadow == VRING_PACKED_EVENT_FLAG_DESC)
- virtio_store_mb(vq->weak_barriers,
- &vq->packed.vring.driver->off_wrap,
- cpu_to_le16(vq->last_used_idx));
+ update_last_used_idx_packed(vq, id, last_used, used_wrap_counter);
LAST_ADD_TIME_INVALID(vq);
--
2.34.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH V3 17/19] virtio_ring: factor out split indirect detaching logic
2025-06-16 8:24 [PATCH V3 00/19] virtio_ring in order support Jason Wang
` (15 preceding siblings ...)
2025-06-16 8:25 ` [PATCH V3 16/19] virtio_ring: factor out core logic for updating last_used_idx Jason Wang
@ 2025-06-16 8:25 ` Jason Wang
2025-07-01 6:44 ` Michael S. Tsirkin
2025-06-16 8:25 ` [PATCH V3 18/19] virtio_ring: factor out split " Jason Wang
` (3 subsequent siblings)
20 siblings, 1 reply; 35+ messages in thread
From: Jason Wang @ 2025-06-16 8:25 UTC (permalink / raw)
To: mst, jasowang; +Cc: xuanzhuo, eperezma, virtualization, linux-kernel
Factor out the split indirect descriptor detaching logic in order to
make it be reused by the in order support.
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/virtio/virtio_ring.c | 63 ++++++++++++++++++++----------------
1 file changed, 35 insertions(+), 28 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index e8e0d1204f52..259380797ec4 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -773,11 +773,42 @@ static bool virtqueue_kick_prepare_split(struct vring_virtqueue *vq)
return needs_kick;
}
+static void detach_indirect_split(struct vring_virtqueue *vq,
+ unsigned int head)
+{
+ struct vring_desc_extra *extra = vq->split.desc_extra;
+ struct vring_desc *indir_desc =
+ vq->split.desc_state[head].indir_desc;
+ unsigned int j;
+ u32 len, num;
+
+ /* Free the indirect table, if any, now that it's unmapped. */
+ if (!indir_desc)
+ return;
+ len = vq->split.desc_extra[head].len;
+
+ BUG_ON(!(vq->split.desc_extra[head].flags &
+ VRING_DESC_F_INDIRECT));
+ BUG_ON(len == 0 || len % sizeof(struct vring_desc));
+
+ num = len / sizeof(struct vring_desc);
+
+ extra = (struct vring_desc_extra *)&indir_desc[num];
+
+ if (vq->use_dma_api) {
+ for (j = 0; j < num; j++)
+ vring_unmap_one_split(vq, &extra[j]);
+ }
+
+ kfree(indir_desc);
+ vq->split.desc_state[head].indir_desc = NULL;
+}
+
static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
void **ctx)
{
struct vring_desc_extra *extra;
- unsigned int i, j;
+ unsigned int i;
__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
/* Clear data ptr. */
@@ -801,34 +832,10 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
/* Plus final descriptor */
vq->vq.num_free++;
- if (vq->indirect) {
- struct vring_desc *indir_desc =
- vq->split.desc_state[head].indir_desc;
- u32 len, num;
-
- /* Free the indirect table, if any, now that it's unmapped. */
- if (!indir_desc)
- return;
- len = vq->split.desc_extra[head].len;
-
- BUG_ON(!(vq->split.desc_extra[head].flags &
- VRING_DESC_F_INDIRECT));
- BUG_ON(len == 0 || len % sizeof(struct vring_desc));
-
- num = len / sizeof(struct vring_desc);
-
- extra = (struct vring_desc_extra *)&indir_desc[num];
-
- if (vq->use_dma_api) {
- for (j = 0; j < num; j++)
- vring_unmap_one_split(vq, &extra[j]);
- }
-
- kfree(indir_desc);
- vq->split.desc_state[head].indir_desc = NULL;
- } else if (ctx) {
+ if (vq->indirect)
+ detach_indirect_split(vq, head);
+ else if (ctx)
*ctx = vq->split.desc_state[head].indir_desc;
- }
}
static bool virtqueue_poll_split(const struct vring_virtqueue *vq,
--
2.34.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH V3 18/19] virtio_ring: factor out split detaching logic
2025-06-16 8:24 [PATCH V3 00/19] virtio_ring in order support Jason Wang
` (16 preceding siblings ...)
2025-06-16 8:25 ` [PATCH V3 17/19] virtio_ring: factor out split indirect detaching logic Jason Wang
@ 2025-06-16 8:25 ` Jason Wang
2025-07-01 6:45 ` Michael S. Tsirkin
2025-06-16 8:25 ` [PATCH V3 19/19] virtio_ring: add in order support Jason Wang
` (2 subsequent siblings)
20 siblings, 1 reply; 35+ messages in thread
From: Jason Wang @ 2025-06-16 8:25 UTC (permalink / raw)
To: mst, jasowang; +Cc: xuanzhuo, eperezma, virtualization, linux-kernel
This patch factors out the split core detaching logic that could be
reused by in order feature into a dedicated function.
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/virtio/virtio_ring.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 259380797ec4..27a9459a0555 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -804,8 +804,9 @@ static void detach_indirect_split(struct vring_virtqueue *vq,
vq->split.desc_state[head].indir_desc = NULL;
}
-static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
- void **ctx)
+static unsigned detach_buf_split_in_order(struct vring_virtqueue *vq,
+ unsigned int head,
+ void **ctx)
{
struct vring_desc_extra *extra;
unsigned int i;
@@ -826,8 +827,6 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
}
vring_unmap_one_split(vq, &extra[i]);
- vq->split.desc_extra[i].next = vq->free_head;
- vq->free_head = head;
/* Plus final descriptor */
vq->vq.num_free++;
@@ -836,6 +835,17 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
detach_indirect_split(vq, head);
else if (ctx)
*ctx = vq->split.desc_state[head].indir_desc;
+
+ return i;
+}
+
+static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
+ void **ctx)
+{
+ unsigned int i = detach_buf_split_in_order(vq, head, ctx);
+
+ vq->split.desc_extra[i].next = vq->free_head;
+ vq->free_head = head;
}
static bool virtqueue_poll_split(const struct vring_virtqueue *vq,
--
2.34.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH V3 19/19] virtio_ring: add in order support
2025-06-16 8:24 [PATCH V3 00/19] virtio_ring in order support Jason Wang
` (17 preceding siblings ...)
2025-06-16 8:25 ` [PATCH V3 18/19] virtio_ring: factor out split " Jason Wang
@ 2025-06-16 8:25 ` Jason Wang
2025-07-01 6:56 ` Michael S. Tsirkin
2025-06-17 12:29 ` [PATCH V3 00/19] virtio_ring " Eugenio Perez Martin
2025-07-01 6:58 ` Michael S. Tsirkin
20 siblings, 1 reply; 35+ messages in thread
From: Jason Wang @ 2025-06-16 8:25 UTC (permalink / raw)
To: mst, jasowang; +Cc: xuanzhuo, eperezma, virtualization, linux-kernel
This patch implements in order support for both split virtqueue and
packed virtqueue.
Benchmark with KVM guest + testpmd on the host shows:
For split virtqueue: no obvious differences were noticed
For packed virtqueue:
1) RX gets 3.1% PPS improvements from 6.3 Mpps to 6.5 Mpps
2) TX gets 4.6% PPS improvements from 8.6 Mpps to 9.0 Mpps
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/virtio/virtio_ring.c | 423 +++++++++++++++++++++++++++++++++--
1 file changed, 402 insertions(+), 21 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 27a9459a0555..21d456392ba0 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -70,11 +70,14 @@
enum vq_layout {
SPLIT = 0,
PACKED,
+ SPLIT_IN_ORDER,
+ PACKED_IN_ORDER,
VQ_TYPE_MAX,
};
struct vring_desc_state_split {
void *data; /* Data for callback. */
+ u32 total_len; /* Buffer Length */
/* Indirect desc table and extra table, if any. These two will be
* allocated together. So we won't stress more to the memory allocator.
@@ -84,6 +87,7 @@ struct vring_desc_state_split {
struct vring_desc_state_packed {
void *data; /* Data for callback. */
+ u32 total_len; /* Buffer Length */
/* Indirect desc table and extra table, if any. These two will be
* allocated together. So we won't stress more to the memory allocator.
@@ -206,6 +210,12 @@ struct vring_virtqueue {
/* Head of free buffer list. */
unsigned int free_head;
+
+ /* Head of the batched used buffers, vq->num means no batching */
+ unsigned int batch_head;
+
+ unsigned int batch_len;
+
/* Number we've added since last sync. */
unsigned int num_added;
@@ -256,10 +266,14 @@ static void vring_free(struct virtqueue *_vq);
#define to_vvq(_vq) container_of_const(_vq, struct vring_virtqueue, vq)
-
static inline bool virtqueue_is_packed(const struct vring_virtqueue *vq)
{
- return vq->layout == PACKED;
+ return vq->layout == PACKED || vq->layout == PACKED_IN_ORDER;
+}
+
+static inline bool virtqueue_is_in_order(const struct vring_virtqueue *vq)
+{
+ return vq->layout == SPLIT_IN_ORDER || vq->layout == PACKED_IN_ORDER;
}
static bool virtqueue_use_indirect(const struct vring_virtqueue *vq,
@@ -570,7 +584,7 @@ static inline int virtqueue_add_split(struct vring_virtqueue *vq,
struct vring_desc_extra *extra;
struct scatterlist *sg;
struct vring_desc *desc;
- unsigned int i, n, c, avail, descs_used, err_idx;
+ unsigned int i, n, c, avail, descs_used, err_idx, total_len = 0;
int head;
bool indirect;
@@ -646,6 +660,7 @@ static inline int virtqueue_add_split(struct vring_virtqueue *vq,
i = virtqueue_add_desc_split(vq, desc, extra, i, addr, len,
flags,
premapped);
+ total_len += len;
}
}
for (; n < (out_sgs + in_sgs); n++) {
@@ -665,6 +680,7 @@ static inline int virtqueue_add_split(struct vring_virtqueue *vq,
*/
i = virtqueue_add_desc_split(vq, desc, extra, i, addr, len,
flags, premapped);
+ total_len += len;
}
}
@@ -687,7 +703,12 @@ static inline int virtqueue_add_split(struct vring_virtqueue *vq,
vq->vq.num_free -= descs_used;
/* Update free pointer */
- if (indirect)
+ if (virtqueue_is_in_order(vq)) {
+ vq->free_head += descs_used;
+ if (vq->free_head >= vq->split.vring.num)
+ vq->free_head -= vq->split.vring.num;
+ vq->split.desc_state[head].total_len = total_len;
+ } else if (indirect)
vq->free_head = vq->split.desc_extra[head].next;
else
vq->free_head = i;
@@ -860,6 +881,14 @@ static bool more_used_split(const struct vring_virtqueue *vq)
return virtqueue_poll_split(vq, vq->last_used_idx);
}
+static bool more_used_split_in_order(const struct vring_virtqueue *vq)
+{
+ if (vq->batch_head != vq->split.vring.num)
+ return true;
+
+ return virtqueue_poll_split(vq, vq->last_used_idx);
+}
+
static void *virtqueue_get_buf_ctx_split(struct vring_virtqueue *vq,
unsigned int *len,
void **ctx)
@@ -917,6 +946,73 @@ static void *virtqueue_get_buf_ctx_split(struct vring_virtqueue *vq,
return ret;
}
+static void *virtqueue_get_buf_ctx_split_in_order(struct vring_virtqueue *vq,
+ unsigned int *len,
+ void **ctx)
+{
+ void *ret;
+ unsigned int num = vq->split.vring.num;
+ u16 last_used;
+
+ START_USE(vq);
+
+ if (unlikely(vq->broken)) {
+ END_USE(vq);
+ return NULL;
+ }
+
+ last_used = (vq->last_used_idx & (vq->split.vring.num - 1));
+
+ if (vq->batch_head == num) {
+ if (!more_used_split(vq)) {
+ pr_debug("No more buffers in queue\n");
+ END_USE(vq);
+ return NULL;
+ }
+
+ /* Only get used array entries after they have been
+ * exposed by host. */
+ virtio_rmb(vq->weak_barriers);
+ vq->batch_head = virtio32_to_cpu(vq->vq.vdev,
+ vq->split.vring.used->ring[last_used].id);
+ vq->batch_len = virtio32_to_cpu(vq->vq.vdev,
+ vq->split.vring.used->ring[last_used].len);
+ }
+
+ if (vq->batch_head == last_used) {
+ vq->batch_head = num;
+ *len = vq->batch_len;
+ } else
+ *len = vq->split.desc_state[last_used].total_len;
+
+ if (unlikely(last_used >= num)) {
+ BAD_RING(vq, "id %u out of range\n", last_used);
+ return NULL;
+ }
+ if (unlikely(!vq->split.desc_state[last_used].data)) {
+ BAD_RING(vq, "id %u is not a head!\n", last_used);
+ return NULL;
+ }
+
+ /* detach_buf_split clears data, so grab it now. */
+ ret = vq->split.desc_state[last_used].data;
+ detach_buf_split_in_order(vq, last_used, ctx);
+
+ vq->last_used_idx++;
+ /* If we expect an interrupt for the next entry, tell host
+ * by writing event index and flush out the write before
+ * the read in the next get_buf call. */
+ if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT))
+ virtio_store_mb(vq->weak_barriers,
+ &vring_used_event(&vq->split.vring),
+ cpu_to_virtio16(vq->vq.vdev, vq->last_used_idx));
+
+ LAST_ADD_TIME_INVALID(vq);
+
+ END_USE(vq);
+ return ret;
+}
+
static void virtqueue_disable_cb_split(struct vring_virtqueue *vq)
{
if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
@@ -1010,7 +1106,10 @@ static void *virtqueue_detach_unused_buf_split(struct vring_virtqueue *vq)
continue;
/* detach_buf_split clears data, so grab it now. */
buf = vq->split.desc_state[i].data;
- detach_buf_split(vq, i, NULL);
+ if (virtqueue_is_in_order(vq))
+ detach_buf_split_in_order(vq, i, NULL);
+ else
+ detach_buf_split(vq, i, NULL);
vq->split.avail_idx_shadow--;
vq->split.vring.avail->idx = cpu_to_virtio16(vq->vq.vdev,
vq->split.avail_idx_shadow);
@@ -1073,6 +1172,7 @@ static void virtqueue_vring_attach_split(struct vring_virtqueue *vq,
/* Put everything in free lists. */
vq->free_head = 0;
+ vq->batch_head = vq->split.vring.num;
}
static int vring_alloc_state_extra_split(struct vring_virtqueue_split *vring_split)
@@ -1183,7 +1283,6 @@ static struct virtqueue *__vring_new_virtqueue_split(unsigned int index,
if (!vq)
return NULL;
- vq->layout = SPLIT;
vq->vq.callback = callback;
vq->vq.vdev = vdev;
vq->vq.name = name;
@@ -1203,6 +1302,8 @@ static struct virtqueue *__vring_new_virtqueue_split(unsigned int index,
vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
!context;
vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
+ vq->layout = virtio_has_feature(vdev, VIRTIO_F_IN_ORDER) ?
+ SPLIT_IN_ORDER : SPLIT;
if (virtio_has_feature(vdev, VIRTIO_F_ORDER_PLATFORM))
vq->weak_barriers = false;
@@ -1366,13 +1467,14 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
unsigned int in_sgs,
void *data,
bool premapped,
- gfp_t gfp)
+ gfp_t gfp,
+ u16 id)
{
struct vring_desc_extra *extra;
struct vring_packed_desc *desc;
struct scatterlist *sg;
- unsigned int i, n, err_idx, len;
- u16 head, id;
+ unsigned int i, n, err_idx, len, total_len = 0;
+ u16 head;
dma_addr_t addr;
head = vq->packed.next_avail_idx;
@@ -1390,8 +1492,6 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
}
i = 0;
- id = vq->free_head;
- BUG_ON(id == vq->packed.vring.num);
for (n = 0; n < out_sgs + in_sgs; n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
@@ -1411,6 +1511,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
extra[i].flags = n < out_sgs ? 0 : VRING_DESC_F_WRITE;
}
+ total_len += len;
i++;
}
}
@@ -1464,6 +1565,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
vq->packed.desc_state[id].data = data;
vq->packed.desc_state[id].indir_desc = desc;
vq->packed.desc_state[id].last = id;
+ vq->packed.desc_state[id].total_len = total_len;
vq->num_added += 1;
@@ -1516,8 +1618,11 @@ static inline int virtqueue_add_packed(struct vring_virtqueue *vq,
BUG_ON(total_sg == 0);
if (virtqueue_use_indirect(vq, total_sg)) {
+ id = vq->free_head;
+ BUG_ON(id == vq->packed.vring.num);
err = virtqueue_add_indirect_packed(vq, sgs, total_sg, out_sgs,
- in_sgs, data, premapped, gfp);
+ in_sgs, data, premapped,
+ gfp, id);
if (err != -ENOMEM) {
END_USE(vq);
return err;
@@ -1638,6 +1743,152 @@ static inline int virtqueue_add_packed(struct vring_virtqueue *vq,
return -EIO;
}
+static inline int virtqueue_add_packed_in_order(struct vring_virtqueue *vq,
+ struct scatterlist *sgs[],
+ unsigned int total_sg,
+ unsigned int out_sgs,
+ unsigned int in_sgs,
+ void *data,
+ void *ctx,
+ bool premapped,
+ gfp_t gfp)
+{
+ struct vring_packed_desc *desc;
+ struct scatterlist *sg;
+ unsigned int i, n, c, err_idx, total_len = 0;
+ __le16 head_flags, flags;
+ u16 head, avail_used_flags;
+ int err;
+
+ START_USE(vq);
+
+ BUG_ON(data == NULL);
+ BUG_ON(ctx && vq->indirect);
+
+ if (unlikely(vq->broken)) {
+ END_USE(vq);
+ return -EIO;
+ }
+
+ LAST_ADD_TIME_UPDATE(vq);
+
+ BUG_ON(total_sg == 0);
+
+ if (virtqueue_use_indirect(vq, total_sg)) {
+ err = virtqueue_add_indirect_packed(vq, sgs, total_sg, out_sgs,
+ in_sgs, data, premapped, gfp,
+ vq->packed.next_avail_idx);
+ if (err != -ENOMEM) {
+ END_USE(vq);
+ return err;
+ }
+
+ /* fall back on direct */
+ }
+
+ head = vq->packed.next_avail_idx;
+ avail_used_flags = vq->packed.avail_used_flags;
+
+ WARN_ON_ONCE(total_sg > vq->packed.vring.num && !vq->indirect);
+
+ desc = vq->packed.vring.desc;
+ i = head;
+
+ if (unlikely(vq->vq.num_free < total_sg)) {
+ pr_debug("Can't add buf len %i - avail = %i\n",
+ total_sg, vq->vq.num_free);
+ END_USE(vq);
+ return -ENOSPC;
+ }
+
+ c = 0;
+ for (n = 0; n < out_sgs + in_sgs; n++) {
+ for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+ dma_addr_t addr;
+ u32 len;
+
+ if (vring_map_one_sg(vq, sg, n < out_sgs ?
+ DMA_TO_DEVICE : DMA_FROM_DEVICE,
+ &addr, &len, premapped))
+ goto unmap_release;
+
+ flags = cpu_to_le16(vq->packed.avail_used_flags |
+ (++c == total_sg ? 0 : VRING_DESC_F_NEXT) |
+ (n < out_sgs ? 0 : VRING_DESC_F_WRITE));
+ if (i == head)
+ head_flags = flags;
+ else
+ desc[i].flags = flags;
+
+ desc[i].addr = cpu_to_le64(addr);
+ desc[i].len = cpu_to_le32(len);
+ desc[i].id = cpu_to_le16(head);
+
+ if (unlikely(vq->use_dma_api)) {
+ vq->packed.desc_extra[i].addr = premapped ?
+ DMA_MAPPING_ERROR : addr;
+ vq->packed.desc_extra[i].len = len;
+ vq->packed.desc_extra[i].flags =
+ le16_to_cpu(flags);
+ }
+
+ if ((unlikely(++i >= vq->packed.vring.num))) {
+ i = 0;
+ vq->packed.avail_used_flags ^=
+ 1 << VRING_PACKED_DESC_F_AVAIL |
+ 1 << VRING_PACKED_DESC_F_USED;
+ vq->packed.avail_wrap_counter ^= 1;
+ }
+
+ total_len += len;
+ }
+ }
+
+ /* We're using some buffers from the free list. */
+ vq->vq.num_free -= total_sg;
+
+ /* Update free pointer */
+ vq->packed.next_avail_idx = i;
+
+ /* Store token. */
+ vq->packed.desc_state[head].num = total_sg;
+ vq->packed.desc_state[head].data = data;
+ vq->packed.desc_state[head].indir_desc = ctx;
+ vq->packed.desc_state[head].total_len = total_len;
+
+ /*
+ * A driver MUST NOT make the first descriptor in the list
+ * available before all subsequent descriptors comprising
+ * the list are made available.
+ */
+ virtio_wmb(vq->weak_barriers);
+ vq->packed.vring.desc[head].flags = head_flags;
+ vq->num_added += total_sg;
+
+ pr_debug("Added buffer head %i to %p\n", head, vq);
+ END_USE(vq);
+
+ return 0;
+
+unmap_release:
+ err_idx = i;
+ i = head;
+ vq->packed.avail_used_flags = avail_used_flags;
+
+ for (n = 0; n < total_sg; n++) {
+ if (i == err_idx)
+ break;
+ vring_unmap_extra_packed(vq, &vq->packed.desc_extra[i]);
+ i++;
+ if (i >= vq->packed.vring.num)
+ i = 0;
+ }
+
+ END_USE(vq);
+ return -EIO;
+}
+
static bool virtqueue_kick_prepare_packed(struct vring_virtqueue *vq)
{
u16 new, old, off_wrap, flags, wrap_counter, event_idx;
@@ -1758,7 +2009,7 @@ static inline bool is_used_desc_packed(const struct vring_virtqueue *vq,
return avail == used && used == used_wrap_counter;
}
-static bool virtqueue_poll_packed(const struct vring_virtqueue *vq, u16 off_wrap)
+static bool __virtqueue_poll_packed(const struct vring_virtqueue *vq, u16 off_wrap)
{
bool wrap_counter;
u16 used_idx;
@@ -1769,6 +2020,11 @@ static bool virtqueue_poll_packed(const struct vring_virtqueue *vq, u16 off_wrap
return is_used_desc_packed(vq, used_idx, wrap_counter);
}
+static bool virtqueue_poll_packed(const struct vring_virtqueue *vq, u16 off_wrap)
+{
+ return __virtqueue_poll_packed(vq, off_wrap);
+}
+
static bool more_used_packed(const struct vring_virtqueue *vq)
{
return virtqueue_poll_packed(vq, READ_ONCE(vq->last_used_idx));
@@ -1798,10 +2054,84 @@ static void update_last_used_idx_packed(struct vring_virtqueue *vq,
cpu_to_le16(vq->last_used_idx));
}
+static bool more_used_packed_in_order(const struct vring_virtqueue *vq)
+{
+ if (vq->batch_head != vq->packed.vring.num)
+ return true;
+
+ return virtqueue_poll_packed(vq, READ_ONCE(vq->last_used_idx));
+}
+
+static bool __more_used_packed(const struct vring_virtqueue *vq)
+{
+ return __virtqueue_poll_packed(vq, READ_ONCE(vq->last_used_idx));
+}
+
+static void *virtqueue_get_buf_ctx_packed_in_order(struct vring_virtqueue *vq,
+ unsigned int *len,
+ void **ctx)
+{
+ unsigned int num = vq->packed.vring.num;
+ u16 last_used, last_used_idx;
+ bool used_wrap_counter;
+ void *ret;
+
+ START_USE(vq);
+
+ if (unlikely(vq->broken)) {
+ END_USE(vq);
+ return NULL;
+ }
+
+ last_used_idx = vq->last_used_idx;
+ used_wrap_counter = packed_used_wrap_counter(last_used_idx);
+ last_used = packed_last_used(last_used_idx);
+
+ if (vq->batch_head == num) {
+ if (!__more_used_packed(vq)) {
+ pr_debug("No more buffers in queue\n");
+ END_USE(vq);
+ return NULL;
+ }
+ /* Only get used elements after they have been exposed by host. */
+ virtio_rmb(vq->weak_barriers);
+ vq->batch_head = le16_to_cpu(vq->packed.vring.desc[last_used].id);
+ vq->batch_len = le32_to_cpu(vq->packed.vring.desc[last_used].len);
+ }
+
+ if (vq->batch_head == last_used) {
+ vq->batch_head = num;
+ *len = vq->batch_len;
+ } else
+ *len = vq->packed.desc_state[last_used].total_len;
+
+ if (unlikely(last_used >= num)) {
+ BAD_RING(vq, "id %u out of range\n", id);
+ return NULL;
+ }
+ if (unlikely(!vq->packed.desc_state[last_used].data)) {
+ BAD_RING(vq, "id %u is not a head!\n", id);
+ return NULL;
+ }
+
+ /* detach_buf_packed clears data, so grab it now. */
+ ret = vq->packed.desc_state[last_used].data;
+ detach_buf_packed_in_order(vq, last_used, ctx);
+
+ update_last_used_idx_packed(vq, last_used, last_used,
+ used_wrap_counter);
+
+ LAST_ADD_TIME_INVALID(vq);
+
+ END_USE(vq);
+ return ret;
+}
+
static void *virtqueue_get_buf_ctx_packed(struct vring_virtqueue *vq,
unsigned int *len,
void **ctx)
{
+ unsigned int num = vq->packed.vring.num;
u16 last_used, id, last_used_idx;
bool used_wrap_counter;
void *ret;
@@ -1813,7 +2143,7 @@ static void *virtqueue_get_buf_ctx_packed(struct vring_virtqueue *vq,
return NULL;
}
- if (!more_used_packed(vq)) {
+ if (!__more_used_packed(vq)) {
pr_debug("No more buffers in queue\n");
END_USE(vq);
return NULL;
@@ -1828,7 +2158,7 @@ static void *virtqueue_get_buf_ctx_packed(struct vring_virtqueue *vq,
id = le16_to_cpu(vq->packed.vring.desc[last_used].id);
*len = le32_to_cpu(vq->packed.vring.desc[last_used].len);
- if (unlikely(id >= vq->packed.vring.num)) {
+ if (unlikely(id >= num)) {
BAD_RING(vq, "id %u out of range\n", id);
return NULL;
}
@@ -1948,6 +2278,7 @@ static bool virtqueue_enable_cb_delayed_packed(struct vring_virtqueue *vq)
last_used_idx = READ_ONCE(vq->last_used_idx);
wrap_counter = packed_used_wrap_counter(last_used_idx);
used_idx = packed_last_used(last_used_idx);
+
if (is_used_desc_packed(vq, used_idx, wrap_counter)) {
END_USE(vq);
return false;
@@ -1969,7 +2300,7 @@ static void *virtqueue_detach_unused_buf_packed(struct vring_virtqueue *vq)
continue;
/* detach_buf clears data, so grab it now. */
buf = vq->packed.desc_state[i].data;
- detach_buf_packed(vq, i, NULL);
+ detach_buf_packed_in_order(vq, i, NULL);
END_USE(vq);
return buf;
}
@@ -1995,6 +2326,8 @@ static struct vring_desc_extra *vring_alloc_desc_extra(unsigned int num)
for (i = 0; i < num - 1; i++)
desc_extra[i].next = i + 1;
+ desc_extra[num - 1].next = 0;
+
return desc_extra;
}
@@ -2126,8 +2459,12 @@ static void virtqueue_vring_attach_packed(struct vring_virtqueue *vq,
{
vq->packed = *vring_packed;
- /* Put everything in free lists. */
- vq->free_head = 0;
+ if (virtqueue_is_in_order(vq)) {
+ vq->batch_head = vq->packed.vring.num;
+ } else {
+ /* Put everything in free lists. */
+ vq->free_head = 0;
+ }
}
static void virtqueue_reset_packed(struct vring_virtqueue *vq)
@@ -2174,13 +2511,14 @@ static struct virtqueue *__vring_new_virtqueue_packed(unsigned int index,
#else
vq->broken = false;
#endif
- vq->layout = PACKED;
vq->dma_dev = dma_dev;
vq->use_dma_api = vring_use_dma_api(vdev);
vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
!context;
vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
+ vq->layout = virtio_has_feature(vdev, VIRTIO_F_IN_ORDER) ?
+ PACKED_IN_ORDER : PACKED;
if (virtio_has_feature(vdev, VIRTIO_F_ORDER_PLATFORM))
vq->weak_barriers = false;
@@ -2290,9 +2628,39 @@ static const struct virtqueue_ops packed_ops = {
.reset = virtqueue_reset_packed,
};
+static const struct virtqueue_ops split_in_order_ops = {
+ .add = virtqueue_add_split,
+ .get = virtqueue_get_buf_ctx_split_in_order,
+ .kick_prepare = virtqueue_kick_prepare_split,
+ .disable_cb = virtqueue_disable_cb_split,
+ .enable_cb_delayed = virtqueue_enable_cb_delayed_split,
+ .enable_cb_prepare = virtqueue_enable_cb_prepare_split,
+ .poll = virtqueue_poll_split,
+ .detach_unused_buf = virtqueue_detach_unused_buf_split,
+ .more_used = more_used_split_in_order,
+ .resize = virtqueue_resize_split,
+ .reset = virtqueue_reset_split,
+};
+
+static const struct virtqueue_ops packed_in_order_ops = {
+ .add = virtqueue_add_packed_in_order,
+ .get = virtqueue_get_buf_ctx_packed_in_order,
+ .kick_prepare = virtqueue_kick_prepare_packed,
+ .disable_cb = virtqueue_disable_cb_packed,
+ .enable_cb_delayed = virtqueue_enable_cb_delayed_packed,
+ .enable_cb_prepare = virtqueue_enable_cb_prepare_packed,
+ .poll = virtqueue_poll_packed,
+ .detach_unused_buf = virtqueue_detach_unused_buf_packed,
+ .more_used = more_used_packed_in_order,
+ .resize = virtqueue_resize_packed,
+ .reset = virtqueue_reset_packed,
+};
+
static const struct virtqueue_ops *const all_ops[VQ_TYPE_MAX] = {
[SPLIT] = &split_ops,
- [PACKED] = &packed_ops
+ [PACKED] = &packed_ops,
+ [SPLIT_IN_ORDER] = &split_in_order_ops,
+ [PACKED_IN_ORDER] = &packed_in_order_ops,
};
static int virtqueue_disable_and_recycle(struct virtqueue *_vq,
@@ -2336,7 +2704,6 @@ static int virtqueue_enable_after_reset(struct virtqueue *_vq)
/*
* Generic functions and exported symbols.
*/
-
#define VIRTQUEUE_CALL(vq, op, ...) \
({ \
typeof(all_ops[SPLIT]->op(vq, ##__VA_ARGS__)) ret; \
@@ -2347,6 +2714,12 @@ static int virtqueue_enable_after_reset(struct virtqueue *_vq)
case PACKED: \
ret = all_ops[PACKED]->op(vq, ##__VA_ARGS__); \
break; \
+ case SPLIT_IN_ORDER: \
+ ret = all_ops[SPLIT_IN_ORDER]->op(vq, ##__VA_ARGS__); \
+ break; \
+ case PACKED_IN_ORDER: \
+ ret = all_ops[PACKED_IN_ORDER]->op(vq, ##__VA_ARGS__); \
+ break; \
default: \
BUG(); \
break; \
@@ -2363,6 +2736,12 @@ static int virtqueue_enable_after_reset(struct virtqueue *_vq)
case PACKED: \
all_ops[PACKED]->op(vq, ##__VA_ARGS__); \
break; \
+ case SPLIT_IN_ORDER: \
+ all_ops[SPLIT_IN_ORDER]->op(vq, ##__VA_ARGS__); \
+ break; \
+ case PACKED_IN_ORDER: \
+ all_ops[PACKED_IN_ORDER]->op(vq, ##__VA_ARGS__); \
+ break; \
default: \
BUG(); \
break; \
@@ -3073,6 +3452,8 @@ void vring_transport_features(struct virtio_device *vdev)
break;
case VIRTIO_F_NOTIFICATION_DATA:
break;
+ case VIRTIO_F_IN_ORDER:
+ break;
default:
/* We don't understand this bit. */
__virtio_clear_bit(vdev, i);
--
2.34.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [PATCH V3 00/19] virtio_ring in order support
2025-06-16 8:24 [PATCH V3 00/19] virtio_ring in order support Jason Wang
` (18 preceding siblings ...)
2025-06-16 8:25 ` [PATCH V3 19/19] virtio_ring: add in order support Jason Wang
@ 2025-06-17 12:29 ` Eugenio Perez Martin
2025-07-01 6:58 ` Michael S. Tsirkin
20 siblings, 0 replies; 35+ messages in thread
From: Eugenio Perez Martin @ 2025-06-17 12:29 UTC (permalink / raw)
To: Jason Wang; +Cc: mst, xuanzhuo, virtualization, linux-kernel
On Mon, Jun 16, 2025 at 10:25 AM Jason Wang <jasowang@redhat.com> wrote:
>
> Hello all:
>
> This sereis tries to implement the VIRTIO_F_IN_ORDER to
> virtio_ring. This is done by introducing virtqueue ops so we can
> implement separate helpers for different virtqueue layout/features
> then the in-order were implemented on top.
>
> Tests shows 3%-5% imporvment with packed virtqueue PPS with KVM guest
> testpmd on the host.
>
> Changes since V2:
>
> - Fix build warning when DEBUG is enabled
>
> Changes since V1:
>
> - use const global array of function pointers to avoid indirect
> branches to eliminate retpoline when mitigation is enabled
> - fix used length calculation when processing used ids in a batch
> - fix sparse warnings
>
> Please review.
>
> Thanks
>
> Jason Wang (19):
> virtio_ring: rename virtqueue_reinit_xxx to virtqueue_reset_xxx()
> virtio_ring: switch to use vring_virtqueue in virtqueue_poll variants
> virtio_ring: unify logic of virtqueue_poll() and more_used()
> virtio_ring: switch to use vring_virtqueue for virtqueue resize
> variants
> virtio_ring: switch to use vring_virtqueue for virtqueue_kick_prepare
> variants
> virtio_ring: switch to use vring_virtqueue for virtqueue_add variants
> virtio: switch to use vring_virtqueue for virtqueue_add variants
> virtio_ring: switch to use vring_virtqueue for enable_cb_prepare
> variants
> virtio_ring: use vring_virtqueue for enable_cb_delayed variants
> virtio_ring: switch to use vring_virtqueue for disable_cb variants
> virtio_ring: switch to use vring_virtqueue for detach_unused_buf
> variants
> virtio_ring: use u16 for last_used_idx in virtqueue_poll_split()
> virtio_ring: introduce virtqueue ops
> virtio_ring: determine descriptor flags at one time
> virtio_ring: factor out core logic of buffer detaching
> virtio_ring: factor out core logic for updating last_used_idx
> virtio_ring: factor out split indirect detaching logic
> virtio_ring: factor out split detaching logic
> virtio_ring: add in order support
>
> drivers/virtio/virtio_ring.c | 896 ++++++++++++++++++++++++++---------
> 1 file changed, 684 insertions(+), 212 deletions(-)
>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Thanks!
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH V3 01/19] virtio_ring: rename virtqueue_reinit_xxx to virtqueue_reset_xxx()
2025-06-16 8:24 ` [PATCH V3 01/19] virtio_ring: rename virtqueue_reinit_xxx to virtqueue_reset_xxx() Jason Wang
@ 2025-06-24 10:35 ` Lei Yang
0 siblings, 0 replies; 35+ messages in thread
From: Lei Yang @ 2025-06-24 10:35 UTC (permalink / raw)
To: Jason Wang; +Cc: mst, xuanzhuo, eperezma, virtualization, linux-kernel
I used the "virtio-net-pci,...,in_order=on" to test this series of
patches v3 with regression tests, everything works fine.
Tested-by: Lei Yang <leiyang@redhat.com>
On Mon, Jun 16, 2025 at 4:27 PM Jason Wang <jasowang@redhat.com> wrote:
>
> To be consistent with virtqueue_reset().
>
> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/virtio/virtio_ring.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index b784aab66867..afdd51fc3c9c 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -1005,7 +1005,7 @@ static void virtqueue_vring_init_split(struct vring_virtqueue_split *vring_split
> }
> }
>
> -static void virtqueue_reinit_split(struct vring_virtqueue *vq)
> +static void virtqueue_reset_split(struct vring_virtqueue *vq)
> {
> int num;
>
> @@ -1248,7 +1248,7 @@ static int virtqueue_resize_split(struct virtqueue *_vq, u32 num)
> err_state_extra:
> vring_free_split(&vring_split, vdev, vring_dma_dev(vq));
> err:
> - virtqueue_reinit_split(vq);
> + virtqueue_reset_split(vq);
> return -ENOMEM;
> }
>
> @@ -2092,7 +2092,7 @@ static void virtqueue_vring_attach_packed(struct vring_virtqueue *vq,
> vq->free_head = 0;
> }
>
> -static void virtqueue_reinit_packed(struct vring_virtqueue *vq)
> +static void virtqueue_reset_packed(struct vring_virtqueue *vq)
> {
> memset(vq->packed.vring.device, 0, vq->packed.event_size_in_bytes);
> memset(vq->packed.vring.driver, 0, vq->packed.event_size_in_bytes);
> @@ -2219,7 +2219,7 @@ static int virtqueue_resize_packed(struct virtqueue *_vq, u32 num)
> err_state_extra:
> vring_free_packed(&vring_packed, vdev, vring_dma_dev(vq));
> err_ring:
> - virtqueue_reinit_packed(vq);
> + virtqueue_reset_packed(vq);
> return -ENOMEM;
> }
>
> @@ -2852,9 +2852,9 @@ int virtqueue_reset(struct virtqueue *_vq,
> recycle_done(_vq);
>
> if (vq->packed_ring)
> - virtqueue_reinit_packed(vq);
> + virtqueue_reset_packed(vq);
> else
> - virtqueue_reinit_split(vq);
> + virtqueue_reset_split(vq);
>
> return virtqueue_enable_after_reset(_vq);
> }
> --
> 2.34.1
>
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH V3 13/19] virtio_ring: introduce virtqueue ops
2025-06-16 8:25 ` [PATCH V3 13/19] virtio_ring: introduce virtqueue ops Jason Wang
@ 2025-07-01 6:28 ` Michael S. Tsirkin
2025-07-03 3:20 ` Jason Wang
0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2025-07-01 6:28 UTC (permalink / raw)
To: Jason Wang; +Cc: xuanzhuo, eperezma, virtualization, linux-kernel
On Mon, Jun 16, 2025 at 04:25:11PM +0800, Jason Wang wrote:
> This patch introduces virtqueue ops which is a set of the callbacks
> that will be called for different queue layout or features. This would
> help to avoid branches for split/packed and will ease the future
> implementation like in order.
>
> Note that in order to eliminate the indirect calls this patch uses
> global array of const ops to allow compiler to avoid indirect
> branches.
>
> Tested with CONFIG_MITIGATION_RETPOLINE, no performance differences
> were noticed.
>
> Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/virtio/virtio_ring.c | 172 ++++++++++++++++++++++++++---------
> 1 file changed, 129 insertions(+), 43 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index b14bbb4d6713..af32d1a1a1db 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -67,6 +67,12 @@
> #define LAST_ADD_TIME_INVALID(vq)
> #endif
>
> +enum vq_layout {
> + SPLIT = 0,
> + PACKED,
> + VQ_TYPE_MAX,
> +};
> +
> struct vring_desc_state_split {
> void *data; /* Data for callback. */
>
> @@ -159,12 +165,28 @@ struct vring_virtqueue_packed {
> size_t event_size_in_bytes;
> };
>
> +struct vring_virtqueue;
> +
> +struct virtqueue_ops {
> + int (*add)(struct vring_virtqueue *_vq, struct scatterlist *sgs[],
> + unsigned int total_sg, unsigned int out_sgs,
> + unsigned int in_sgs, void *data,
> + void *ctx, bool premapped, gfp_t gfp);
> + void *(*get)(struct vring_virtqueue *vq, unsigned int *len, void **ctx);
> + bool (*kick_prepare)(struct vring_virtqueue *vq);
> + void (*disable_cb)(struct vring_virtqueue *vq);
> + bool (*enable_cb_delayed)(struct vring_virtqueue *vq);
> + unsigned int (*enable_cb_prepare)(struct vring_virtqueue *vq);
> + bool (*poll)(const struct vring_virtqueue *vq, u16 last_used_idx);
> + void *(*detach_unused_buf)(struct vring_virtqueue *vq);
> + bool (*more_used)(const struct vring_virtqueue *vq);
> + int (*resize)(struct vring_virtqueue *vq, u32 num);
> + void (*reset)(struct vring_virtqueue *vq);
> +};
> +
> struct vring_virtqueue {
> struct virtqueue vq;
>
> - /* Is this a packed ring? */
> - bool packed_ring;
> -
> /* Is DMA API used? */
> bool use_dma_api;
>
> @@ -180,6 +202,8 @@ struct vring_virtqueue {
> /* Host publishes avail event idx */
> bool event;
>
> + enum vq_layout layout;
> +
> /* Head of free buffer list. */
> unsigned int free_head;
> /* Number we've added since last sync. */
> @@ -232,6 +256,12 @@ static void vring_free(struct virtqueue *_vq);
>
> #define to_vvq(_vq) container_of_const(_vq, struct vring_virtqueue, vq)
>
> +
Why two empty lines?
> +static inline bool virtqueue_is_packed(const struct vring_virtqueue *vq)
> +{
> + return vq->layout == PACKED;
> +}
> +
So why still have this?
Let's replace all uses with a per layout
callback. None of them seem to be on the data path.
Well, with one exception - there is
> @@ -2921,7 +3006,7 @@ u32 vring_notification_data(struct virtqueue *_vq)
> struct vring_virtqueue *vq = to_vvq(_vq);
> u16 next;
>
> - if (vq->packed_ring)
> + if (virtqueue_is_packed(vq))
> next = (vq->packed.next_avail_idx &
> ~(-(1 << VRING_PACKED_EVENT_F_WRAP_CTR))) |
> vq->packed.avail_wrap_counter <<
I think it's the only one? Are we trying to save a function call there?
I am fine to keep it as in this patch, or if we do change it,
maybe with a separate patch, so if there is a regression we can bisect
more easily.
For example, for the chunk below, we could have:
.init_last_used
having said that, this patchset is already big, so let us do it with a separate patch -
but if so, the way to split would be in patch 1 to just leave vq->packed_ring
in place, gradually replace call sites, and in patch N drop
vq->packed_ring.
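To make that concrete, a rough sketch of what such a per-layout hook could
look like (the name init_last_used is made up for illustration; it builds on
the split_ops/packed_ops tables this patch introduces):

static void virtqueue_init_last_used_split(struct vring_virtqueue *vq)
{
	/* split ring: the used index simply starts at 0 */
	vq->last_used_idx = 0;
}

static void virtqueue_init_last_used_packed(struct vring_virtqueue *vq)
{
	/* packed ring: start with the used wrap counter set */
	vq->last_used_idx = 0 | (1 << VRING_PACKED_EVENT_F_WRAP_CTR);
}

With .init_last_used wired into split_ops/packed_ops, virtqueue_init() could
then call VOID_VIRTQUEUE_CALL(vq, init_last_used) instead of testing the
layout directly.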
> static bool virtqueue_use_indirect(const struct vring_virtqueue *vq,
> unsigned int total_sg)
> {
> @@ -422,7 +452,7 @@ static void virtqueue_init(struct vring_virtqueue *vq, u32 num)
> {
> vq->vq.num_free = num;
>
> - if (vq->packed_ring)
> + if (virtqueue_is_packed(vq))
> vq->last_used_idx = 0 | (1 << VRING_PACKED_EVENT_F_WRAP_CTR);
> else
> vq->last_used_idx = 0;
> @@ -1116,6 +1146,8 @@ static int vring_alloc_queue_split(struct vring_virtqueue_split *vring_split,
> return 0;
> }
>
> +static const struct virtqueue_ops split_ops;
> +
> static struct virtqueue *__vring_new_virtqueue_split(unsigned int index,
> struct vring_virtqueue_split *vring_split,
> struct virtio_device *vdev,
> @@ -1133,7 +1165,7 @@ static struct virtqueue *__vring_new_virtqueue_split(unsigned int index,
> if (!vq)
> return NULL;
>
> - vq->packed_ring = false;
> + vq->layout = SPLIT;
> vq->vq.callback = callback;
> vq->vq.vdev = vdev;
> vq->vq.name = name;
> @@ -2076,6 +2108,8 @@ static void virtqueue_reset_packed(struct vring_virtqueue *vq)
> virtqueue_vring_init_packed(&vq->packed, !!vq->vq.callback);
> }
>
> +static const struct virtqueue_ops packed_ops;
> +
> static struct virtqueue *__vring_new_virtqueue_packed(unsigned int index,
> struct vring_virtqueue_packed *vring_packed,
> struct virtio_device *vdev,
> @@ -2106,7 +2140,7 @@ static struct virtqueue *__vring_new_virtqueue_packed(unsigned int index,
> #else
> vq->broken = false;
> #endif
> - vq->packed_ring = true;
> + vq->layout = PACKED;
> vq->dma_dev = dma_dev;
> vq->use_dma_api = vring_use_dma_api(vdev);
>
> @@ -2194,6 +2228,39 @@ static int virtqueue_resize_packed(struct vring_virtqueue *vq, u32 num)
> return -ENOMEM;
> }
>
> +static const struct virtqueue_ops split_ops = {
> + .add = virtqueue_add_split,
> + .get = virtqueue_get_buf_ctx_split,
> + .kick_prepare = virtqueue_kick_prepare_split,
> + .disable_cb = virtqueue_disable_cb_split,
> + .enable_cb_delayed = virtqueue_enable_cb_delayed_split,
> + .enable_cb_prepare = virtqueue_enable_cb_prepare_split,
> + .poll = virtqueue_poll_split,
> + .detach_unused_buf = virtqueue_detach_unused_buf_split,
> + .more_used = more_used_split,
> + .resize = virtqueue_resize_split,
> + .reset = virtqueue_reset_split,
> +};
> +
> +static const struct virtqueue_ops packed_ops = {
> + .add = virtqueue_add_packed,
> + .get = virtqueue_get_buf_ctx_packed,
> + .kick_prepare = virtqueue_kick_prepare_packed,
> + .disable_cb = virtqueue_disable_cb_packed,
> + .enable_cb_delayed = virtqueue_enable_cb_delayed_packed,
> + .enable_cb_prepare = virtqueue_enable_cb_prepare_packed,
> + .poll = virtqueue_poll_packed,
> + .detach_unused_buf = virtqueue_detach_unused_buf_packed,
> + .more_used = more_used_packed,
> + .resize = virtqueue_resize_packed,
> + .reset = virtqueue_reset_packed,
> +};
> +
> +static const struct virtqueue_ops *const all_ops[VQ_TYPE_MAX] = {
> + [SPLIT] = &split_ops,
> + [PACKED] = &packed_ops
> +};
> +
> static int virtqueue_disable_and_recycle(struct virtqueue *_vq,
> void (*recycle)(struct virtqueue *vq, void *buf))
> {
> @@ -2236,6 +2303,38 @@ static int virtqueue_enable_after_reset(struct virtqueue *_vq)
> * Generic functions and exported symbols.
> */
>
> +#define VIRTQUEUE_CALL(vq, op, ...) \
> + ({ \
> + typeof(all_ops[SPLIT]->op(vq, ##__VA_ARGS__)) ret; \
we need an empty line here, after the declaration.
> + switch (vq->layout) { \
> + case SPLIT: \
> + ret = all_ops[SPLIT]->op(vq, ##__VA_ARGS__); \
> + break; \
> + case PACKED: \
> + ret = all_ops[PACKED]->op(vq, ##__VA_ARGS__); \
> + break; \
> + default: \
> + BUG(); \
> + break; \
> + } \
> + ret; \
> +})
> +
> +#define VOID_VIRTQUEUE_CALL(vq, op, ...) \
> + ({ \
> + switch ((vq)->layout) { \
> + case SPLIT: \
> + all_ops[SPLIT]->op(vq, ##__VA_ARGS__); \
> + break; \
> + case PACKED: \
> + all_ops[PACKED]->op(vq, ##__VA_ARGS__); \
> + break; \
> + default: \
> + BUG(); \
> + break; \
> + } \
> +})
> +
> static inline int virtqueue_add(struct virtqueue *_vq,
> struct scatterlist *sgs[],
> unsigned int total_sg,
> @@ -2248,10 +2347,9 @@ static inline int virtqueue_add(struct virtqueue *_vq,
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
>
> - return vq->packed_ring ? virtqueue_add_packed(vq, sgs, total_sg,
> - out_sgs, in_sgs, data, ctx, premapped, gfp) :
> - virtqueue_add_split(vq, sgs, total_sg,
> - out_sgs, in_sgs, data, ctx, premapped, gfp);
> + return VIRTQUEUE_CALL(vq, add, sgs, total_sg,
> + out_sgs, in_sgs, data,
> + ctx, premapped, gfp);
> }
>
> /**
> @@ -2437,8 +2535,7 @@ bool virtqueue_kick_prepare(struct virtqueue *_vq)
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
>
> - return vq->packed_ring ? virtqueue_kick_prepare_packed(vq) :
> - virtqueue_kick_prepare_split(vq);
> + return VIRTQUEUE_CALL(vq, kick_prepare);
> }
> EXPORT_SYMBOL_GPL(virtqueue_kick_prepare);
>
> @@ -2508,8 +2605,7 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
>
> - return vq->packed_ring ? virtqueue_get_buf_ctx_packed(vq, len, ctx) :
> - virtqueue_get_buf_ctx_split(vq, len, ctx);
> + return VIRTQUEUE_CALL(vq, get, len, ctx);
> }
> EXPORT_SYMBOL_GPL(virtqueue_get_buf_ctx);
>
> @@ -2531,10 +2627,7 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
>
> - if (vq->packed_ring)
> - virtqueue_disable_cb_packed(vq);
> - else
> - virtqueue_disable_cb_split(vq);
> + VOID_VIRTQUEUE_CALL(vq, disable_cb);
> }
> EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
>
> @@ -2557,8 +2650,7 @@ unsigned int virtqueue_enable_cb_prepare(struct virtqueue *_vq)
> if (vq->event_triggered)
> vq->event_triggered = false;
>
> - return vq->packed_ring ? virtqueue_enable_cb_prepare_packed(vq) :
> - virtqueue_enable_cb_prepare_split(vq);
> + return VIRTQUEUE_CALL(vq, enable_cb_prepare);
> }
> EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);
>
> @@ -2579,8 +2671,8 @@ bool virtqueue_poll(struct virtqueue *_vq, unsigned int last_used_idx)
> return false;
>
> virtio_mb(vq->weak_barriers);
> - return vq->packed_ring ? virtqueue_poll_packed(vq, last_used_idx) :
> - virtqueue_poll_split(vq, last_used_idx);
> +
> + return VIRTQUEUE_CALL(vq, poll, last_used_idx);
> }
> EXPORT_SYMBOL_GPL(virtqueue_poll);
>
> @@ -2623,8 +2715,7 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
> if (vq->event_triggered)
> data_race(vq->event_triggered = false);
>
> - return vq->packed_ring ? virtqueue_enable_cb_delayed_packed(vq) :
> - virtqueue_enable_cb_delayed_split(vq);
> + return VIRTQUEUE_CALL(vq, enable_cb_delayed);
> }
> EXPORT_SYMBOL_GPL(virtqueue_enable_cb_delayed);
>
> @@ -2640,14 +2731,13 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
>
> - return vq->packed_ring ? virtqueue_detach_unused_buf_packed(vq) :
> - virtqueue_detach_unused_buf_split(vq);
> + return VIRTQUEUE_CALL(vq, detach_unused_buf);
> }
> EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf);
>
> static inline bool more_used(const struct vring_virtqueue *vq)
> {
> - return vq->packed_ring ? more_used_packed(vq) : more_used_split(vq);
> + return VIRTQUEUE_CALL(vq, more_used);
> }
>
> /**
> @@ -2776,7 +2866,8 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
> if (!num)
> return -EINVAL;
>
> - if ((vq->packed_ring ? vq->packed.vring.num : vq->split.vring.num) == num)
> + if ((virtqueue_is_packed(vq) ? vq->packed.vring.num :
> + vq->split.vring.num) == num)
> return 0;
>
> err = virtqueue_disable_and_recycle(_vq, recycle);
> @@ -2785,10 +2876,7 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
> if (recycle_done)
> recycle_done(_vq);
>
> - if (vq->packed_ring)
> - err = virtqueue_resize_packed(vq, num);
> - else
> - err = virtqueue_resize_split(vq, num);
> + err = VIRTQUEUE_CALL(vq, resize, num);
>
> return virtqueue_enable_after_reset(_vq);
> }
> @@ -2822,10 +2910,7 @@ int virtqueue_reset(struct virtqueue *_vq,
> if (recycle_done)
> recycle_done(_vq);
>
> - if (vq->packed_ring)
> - virtqueue_reset_packed(vq);
> - else
> - virtqueue_reset_split(vq);
> + VOID_VIRTQUEUE_CALL(vq, reset);
>
> return virtqueue_enable_after_reset(_vq);
> }
> @@ -2867,7 +2952,7 @@ static void vring_free(struct virtqueue *_vq)
> struct vring_virtqueue *vq = to_vvq(_vq);
>
> if (vq->we_own_ring) {
> - if (vq->packed_ring) {
> + if (virtqueue_is_packed(vq)) {
> vring_free_queue(vq->vq.vdev,
> vq->packed.ring_size_in_bytes,
> vq->packed.vring.desc,
> @@ -2896,7 +2981,7 @@ static void vring_free(struct virtqueue *_vq)
> vring_dma_dev(vq));
> }
> }
> - if (!vq->packed_ring) {
> + if (!virtqueue_is_packed(vq)) {
> kfree(vq->split.desc_state);
> kfree(vq->split.desc_extra);
> }
> @@ -2921,7 +3006,7 @@ u32 vring_notification_data(struct virtqueue *_vq)
> struct vring_virtqueue *vq = to_vvq(_vq);
> u16 next;
>
> - if (vq->packed_ring)
> + if (virtqueue_is_packed(vq))
> next = (vq->packed.next_avail_idx &
> ~(-(1 << VRING_PACKED_EVENT_F_WRAP_CTR))) |
> vq->packed.avail_wrap_counter <<
> @@ -2974,7 +3059,8 @@ unsigned int virtqueue_get_vring_size(const struct virtqueue *_vq)
>
> const struct vring_virtqueue *vq = to_vvq(_vq);
>
> - return vq->packed_ring ? vq->packed.vring.num : vq->split.vring.num;
> + return virtqueue_is_packed(vq) ? vq->packed.vring.num :
> + vq->split.vring.num;
> }
> EXPORT_SYMBOL_GPL(virtqueue_get_vring_size);
>
> @@ -3057,7 +3143,7 @@ dma_addr_t virtqueue_get_desc_addr(const struct virtqueue *_vq)
>
> BUG_ON(!vq->we_own_ring);
>
> - if (vq->packed_ring)
> + if (virtqueue_is_packed(vq))
> return vq->packed.ring_dma_addr;
>
> return vq->split.queue_dma_addr;
> @@ -3070,7 +3156,7 @@ dma_addr_t virtqueue_get_avail_addr(const struct virtqueue *_vq)
>
> BUG_ON(!vq->we_own_ring);
>
> - if (vq->packed_ring)
> + if (virtqueue_is_packed(vq))
> return vq->packed.driver_event_dma_addr;
>
> return vq->split.queue_dma_addr +
> @@ -3084,7 +3170,7 @@ dma_addr_t virtqueue_get_used_addr(const struct virtqueue *_vq)
>
> BUG_ON(!vq->we_own_ring);
>
> - if (vq->packed_ring)
> + if (virtqueue_is_packed(vq))
> return vq->packed.device_event_dma_addr;
>
> return vq->split.queue_dma_addr +
> --
> 2.34.1
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH V3 14/19] virtio_ring: determine descriptor flags at one time
2025-06-16 8:25 ` [PATCH V3 14/19] virtio_ring: determine descriptor flags at one time Jason Wang
@ 2025-07-01 6:42 ` Michael S. Tsirkin
2025-07-01 7:25 ` Jason Wang
0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2025-07-01 6:42 UTC (permalink / raw)
To: Jason Wang; +Cc: xuanzhuo, eperezma, virtualization, linux-kernel
On Mon, Jun 16, 2025 at 04:25:12PM +0800, Jason Wang wrote:
> Let's determine the last descriptor by counting the number of sg. This
> would be consistent with packed virtqueue implementation and ease the
> future in-order implementation.
>
> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/virtio/virtio_ring.c | 25 +++++++++++++------------
> 1 file changed, 13 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index af32d1a1a1db..d5e4d4cd2487 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -570,7 +570,7 @@ static inline int virtqueue_add_split(struct vring_virtqueue *vq,
> struct vring_desc_extra *extra;
> struct scatterlist *sg;
> struct vring_desc *desc;
> - unsigned int i, n, avail, descs_used, prev, err_idx;
> + unsigned int i, n, c, avail, descs_used, err_idx;
> int head;
> bool indirect;
>
> @@ -626,46 +626,47 @@ static inline int virtqueue_add_split(struct vring_virtqueue *vq,
> return -ENOSPC;
> }
>
> + c = 0;
initialize at point of declaration?
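For instance (just a sketch of the suggestion, reusing the declaration from
the hunk above):

	unsigned int i, n, c = 0, avail, descs_used, err_idx;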
> for (n = 0; n < out_sgs; n++) {
> + sg = sgs[n];
> for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> dma_addr_t addr;
> u32 len;
> + u16 flags = 0;
>
> if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr, &len, premapped))
> goto unmap_release;
>
> - prev = i;
> + if (++c != total_sg)
> + flags = VRING_DESC_F_NEXT;
> +
I don't like how the logic is split, and flags isn't used before this point.
So I prefer:
flags = ++c == total_sg ? 0 : VRING_DESC_F_NEXT;
and at that point, we do not really need flags anymore:
> /* Note that we trust indirect descriptor
> * table since it use stream DMA mapping.
> */
> i = virtqueue_add_desc_split(vq, desc, extra, i, addr, len,
> - VRING_DESC_F_NEXT,
> + flags,
So just:
i = virtqueue_add_desc_split(vq, desc, extra, i, addr, len,
++c == total_sg ? 0 : VRING_DESC_F_NEXT,
here and we are done.
> premapped);
> }
> }
> for (; n < (out_sgs + in_sgs); n++) {
> for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> + u16 flags = VRING_DESC_F_WRITE;
> dma_addr_t addr;
> u32 len;
>
> if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr, &len, premapped))
> goto unmap_release;
>
> - prev = i;
> + if (++c != total_sg)
> + flags |= VRING_DESC_F_NEXT;
> +
I don't like that above it's "=" while here it is "|=".
And flags isn't used before this point.
So I prefer:
flags = ++c == total_sg ? VRING_DESC_F_WRITE : VRING_DESC_F_WRITE | VRING_DESC_F_NEXT;
and again we don't really need the variable:
> /* Note that we trust indirect descriptor
> * table since it use stream DMA mapping.
> */
> i = virtqueue_add_desc_split(vq, desc, extra, i, addr, len,
> - VRING_DESC_F_NEXT |
> - VRING_DESC_F_WRITE,
so just:
i = virtqueue_add_desc_split(vq, desc, extra, i, addr, len,
(++c == total_sg ? 0 : VRING_DESC_F_NEXT) |
VRING_DESC_F_WRITE,
is clearer, and the patch smaller.
> - premapped);
> + flags, premapped);
> }
> }
> - /* Last one doesn't continue. */
> - desc[prev].flags &= cpu_to_virtio16(vq->vq.vdev, ~VRING_DESC_F_NEXT);
> - if (!indirect && vring_need_unmap_buffer(vq, &extra[prev]))
> - vq->split.desc_extra[prev & (vq->split.vring.num - 1)].flags &=
> - ~VRING_DESC_F_NEXT;
>
> if (indirect) {
> /* Now that the indirect table is filled in, map it. */
> --
> 2.34.1
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH V3 17/19] virtio_ring: factor out split indirect detaching logic
2025-06-16 8:25 ` [PATCH V3 17/19] virtio_ring: factor out split indirect detaching logic Jason Wang
@ 2025-07-01 6:44 ` Michael S. Tsirkin
0 siblings, 0 replies; 35+ messages in thread
From: Michael S. Tsirkin @ 2025-07-01 6:44 UTC (permalink / raw)
To: Jason Wang; +Cc: xuanzhuo, eperezma, virtualization, linux-kernel
On Mon, Jun 16, 2025 at 04:25:15PM +0800, Jason Wang wrote:
> Factor out the split indirect descriptor detaching logic in order to
> make it be reused by the in order support.
allow it to be reused
>
> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/virtio/virtio_ring.c | 63 ++++++++++++++++++++----------------
> 1 file changed, 35 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index e8e0d1204f52..259380797ec4 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -773,11 +773,42 @@ static bool virtqueue_kick_prepare_split(struct vring_virtqueue *vq)
> return needs_kick;
> }
>
> +static void detach_indirect_split(struct vring_virtqueue *vq,
> + unsigned int head)
> +{
> + struct vring_desc_extra *extra = vq->split.desc_extra;
> + struct vring_desc *indir_desc =
> + vq->split.desc_state[head].indir_desc;
> + unsigned int j;
> + u32 len, num;
> +
> + /* Free the indirect table, if any, now that it's unmapped. */
> + if (!indir_desc)
> + return;
> + len = vq->split.desc_extra[head].len;
> +
> + BUG_ON(!(vq->split.desc_extra[head].flags &
> + VRING_DESC_F_INDIRECT));
> + BUG_ON(len == 0 || len % sizeof(struct vring_desc));
> +
> + num = len / sizeof(struct vring_desc);
> +
> + extra = (struct vring_desc_extra *)&indir_desc[num];
> +
> + if (vq->use_dma_api) {
> + for (j = 0; j < num; j++)
> + vring_unmap_one_split(vq, &extra[j]);
> + }
> +
> + kfree(indir_desc);
> + vq->split.desc_state[head].indir_desc = NULL;
> +}
> +
> static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> void **ctx)
> {
> struct vring_desc_extra *extra;
> - unsigned int i, j;
> + unsigned int i;
> __virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
>
> /* Clear data ptr. */
> @@ -801,34 +832,10 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> /* Plus final descriptor */
> vq->vq.num_free++;
>
> - if (vq->indirect) {
> - struct vring_desc *indir_desc =
> - vq->split.desc_state[head].indir_desc;
> - u32 len, num;
> -
> - /* Free the indirect table, if any, now that it's unmapped. */
> - if (!indir_desc)
> - return;
> - len = vq->split.desc_extra[head].len;
> -
> - BUG_ON(!(vq->split.desc_extra[head].flags &
> - VRING_DESC_F_INDIRECT));
> - BUG_ON(len == 0 || len % sizeof(struct vring_desc));
> -
> - num = len / sizeof(struct vring_desc);
> -
> - extra = (struct vring_desc_extra *)&indir_desc[num];
> -
> - if (vq->use_dma_api) {
> - for (j = 0; j < num; j++)
> - vring_unmap_one_split(vq, &extra[j]);
> - }
> -
> - kfree(indir_desc);
> - vq->split.desc_state[head].indir_desc = NULL;
> - } else if (ctx) {
> + if (vq->indirect)
> + detach_indirect_split(vq, head);
> + else if (ctx)
> *ctx = vq->split.desc_state[head].indir_desc;
> - }
> }
>
> static bool virtqueue_poll_split(const struct vring_virtqueue *vq,
> --
> 2.34.1
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH V3 18/19] virtio_ring: factor out split detaching logic
2025-06-16 8:25 ` [PATCH V3 18/19] virtio_ring: factor out split " Jason Wang
@ 2025-07-01 6:45 ` Michael S. Tsirkin
0 siblings, 0 replies; 35+ messages in thread
From: Michael S. Tsirkin @ 2025-07-01 6:45 UTC (permalink / raw)
To: Jason Wang; +Cc: xuanzhuo, eperezma, virtualization, linux-kernel
On Mon, Jun 16, 2025 at 04:25:16PM +0800, Jason Wang wrote:
> This patch factors out the split core detaching logic that could be
> reused by in order feature into a dedicated function.
you mean
Factor out the split core detaching logic into a dedicated function
so it can be reused by the in order feature
>
> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/virtio/virtio_ring.c | 18 ++++++++++++++----
> 1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 259380797ec4..27a9459a0555 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -804,8 +804,9 @@ static void detach_indirect_split(struct vring_virtqueue *vq,
> vq->split.desc_state[head].indir_desc = NULL;
> }
>
> -static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> - void **ctx)
> +static unsigned detach_buf_split_in_order(struct vring_virtqueue *vq,
> + unsigned int head,
> + void **ctx)
> {
> struct vring_desc_extra *extra;
> unsigned int i;
> @@ -826,8 +827,6 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> }
>
> vring_unmap_one_split(vq, &extra[i]);
> - vq->split.desc_extra[i].next = vq->free_head;
> - vq->free_head = head;
>
> /* Plus final descriptor */
> vq->vq.num_free++;
> @@ -836,6 +835,17 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> detach_indirect_split(vq, head);
> else if (ctx)
> *ctx = vq->split.desc_state[head].indir_desc;
> +
> + return i;
> +}
> +
> +static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> + void **ctx)
> +{
> + unsigned int i = detach_buf_split_in_order(vq, head, ctx);
> +
> + vq->split.desc_extra[i].next = vq->free_head;
> + vq->free_head = head;
> }
>
> static bool virtqueue_poll_split(const struct vring_virtqueue *vq,
> --
> 2.34.1
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH V3 19/19] virtio_ring: add in order support
2025-06-16 8:25 ` [PATCH V3 19/19] virtio_ring: add in order support Jason Wang
@ 2025-07-01 6:56 ` Michael S. Tsirkin
2025-07-02 9:29 ` Jason Wang
0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2025-07-01 6:56 UTC (permalink / raw)
To: Jason Wang; +Cc: xuanzhuo, eperezma, virtualization, linux-kernel
On Mon, Jun 16, 2025 at 04:25:17PM +0800, Jason Wang wrote:
> This patch implements in order support for both split virtqueue and
> packed virtqueue.
I'd like to see more motivation for this work, documented.
It's not really performance, not as it stands, see below:
>
> Benchmark with KVM guest + testpmd on the host shows:
>
> For split virtqueue: no obvious differences were noticed
>
> For packed virtqueue:
>
> 1) RX gets 3.1% PPS improvements from 6.3 Mpps to 6.5 Mpps
> 2) TX gets 4.6% PPS improvements from 8.6 Mpps to 9.0 Mpps
>
That's a very modest improvement for a lot of code.
I also note you put in some batching just for in-order,
which could maybe also explain the gains.
What if you just put in a simple implementation with no
batching tricks? Do you still see a gain?
Does any hardware implement this? Maybe that can demonstrate
bigger gains.
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/virtio/virtio_ring.c | 423 +++++++++++++++++++++++++++++++++--
> 1 file changed, 402 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 27a9459a0555..21d456392ba0 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -70,11 +70,14 @@
> enum vq_layout {
> SPLIT = 0,
> PACKED,
> + SPLIT_IN_ORDER,
> + PACKED_IN_ORDER,
> VQ_TYPE_MAX,
> };
>
> struct vring_desc_state_split {
> void *data; /* Data for callback. */
> + u32 total_len; /* Buffer Length */
>
> /* Indirect desc table and extra table, if any. These two will be
> * allocated together. So we won't stress more to the memory allocator.
> @@ -84,6 +87,7 @@ struct vring_desc_state_split {
>
> struct vring_desc_state_packed {
> void *data; /* Data for callback. */
> + u32 total_len; /* Buffer Length */
>
> /* Indirect desc table and extra table, if any. These two will be
> * allocated together. So we won't stress more to the memory allocator.
We are bloating up the cache footprint for everyone,
so there's a chance of regressions.
Please include a benchmark with in-order off, to make sure we
are not regressing.
How big was the ring?
Worth trying with a biggish one, where there is more cache
pressure.
Why not have a separate state for in-order?
> @@ -206,6 +210,12 @@ struct vring_virtqueue {
>
> /* Head of free buffer list. */
> unsigned int free_head;
> +
> + /* Head of the batched used buffers, vq->num means no batching */
> + unsigned int batch_head;
> +
> + unsigned int batch_len;
> +
Are these two only used for in-order? Please document that.
I also want some documentation about the batching trickery
used please.
What is batched, when, how is batching flushed, why are we
only batching in-order ...
> /* Number we've added since last sync. */
> unsigned int num_added;
>
> @@ -256,10 +266,14 @@ static void vring_free(struct virtqueue *_vq);
>
> #define to_vvq(_vq) container_of_const(_vq, struct vring_virtqueue, vq)
>
> -
> static inline bool virtqueue_is_packed(const struct vring_virtqueue *vq)
> {
> - return vq->layout == PACKED;
> + return vq->layout == PACKED || vq->layout == PACKED_IN_ORDER;
> +}
> +
> +static inline bool virtqueue_is_in_order(const struct vring_virtqueue *vq)
> +{
> + return vq->layout == SPLIT_IN_ORDER || vq->layout == PACKED_IN_ORDER;
> }
>
> static bool virtqueue_use_indirect(const struct vring_virtqueue *vq,
> @@ -570,7 +584,7 @@ static inline int virtqueue_add_split(struct vring_virtqueue *vq,
> struct vring_desc_extra *extra;
> struct scatterlist *sg;
> struct vring_desc *desc;
> - unsigned int i, n, c, avail, descs_used, err_idx;
> + unsigned int i, n, c, avail, descs_used, err_idx, total_len = 0;
I would add a comment here:
/* Total length for in-order */
unsigned int total_len = 0;
> int head;
> bool indirect;
>
> @@ -646,6 +660,7 @@ static inline int virtqueue_add_split(struct vring_virtqueue *vq,
> i = virtqueue_add_desc_split(vq, desc, extra, i, addr, len,
> flags,
> premapped);
> + total_len += len;
> }
> }
> for (; n < (out_sgs + in_sgs); n++) {
> @@ -665,6 +680,7 @@ static inline int virtqueue_add_split(struct vring_virtqueue *vq,
> */
> i = virtqueue_add_desc_split(vq, desc, extra, i, addr, len,
> flags, premapped);
> + total_len += len;
> }
> }
>
> @@ -687,7 +703,12 @@ static inline int virtqueue_add_split(struct vring_virtqueue *vq,
> vq->vq.num_free -= descs_used;
>
> /* Update free pointer */
> - if (indirect)
> + if (virtqueue_is_in_order(vq)) {
> + vq->free_head += descs_used;
> + if (vq->free_head >= vq->split.vring.num)
> + vq->free_head -= vq->split.vring.num;
> + vq->split.desc_state[head].total_len = total_len;;
> + } else if (indirect)
> vq->free_head = vq->split.desc_extra[head].next;
> else
> vq->free_head = i;
> @@ -860,6 +881,14 @@ static bool more_used_split(const struct vring_virtqueue *vq)
> return virtqueue_poll_split(vq, vq->last_used_idx);
> }
>
> +static bool more_used_split_in_order(const struct vring_virtqueue *vq)
> +{
> + if (vq->batch_head != vq->packed.vring.num)
> + return true;
> +
> + return virtqueue_poll_split(vq, vq->last_used_idx);
> +}
> +
> static void *virtqueue_get_buf_ctx_split(struct vring_virtqueue *vq,
> unsigned int *len,
> void **ctx)
> @@ -917,6 +946,73 @@ static void *virtqueue_get_buf_ctx_split(struct vring_virtqueue *vq,
> return ret;
> }
>
> +static void *virtqueue_get_buf_ctx_split_in_order(struct vring_virtqueue *vq,
> + unsigned int *len,
> + void **ctx)
> +{
> + void *ret;
> + unsigned int num = vq->split.vring.num;
> + u16 last_used;
> +
> + START_USE(vq);
> +
> + if (unlikely(vq->broken)) {
> + END_USE(vq);
> + return NULL;
> + }
> +
> + last_used = (vq->last_used_idx & (vq->split.vring.num - 1));
> +
> + if (vq->batch_head == num) {
> + if (!more_used_split(vq)) {
> + pr_debug("No more buffers in queue\n");
> + END_USE(vq);
> + return NULL;
> + }
> +
> + /* Only get used array entries after they have been
> + * exposed by host. */
> + virtio_rmb(vq->weak_barriers);
> + vq->batch_head = virtio32_to_cpu(vq->vq.vdev,
> + vq->split.vring.used->ring[last_used].id);
> + vq->batch_len = virtio32_to_cpu(vq->vq.vdev,
> + vq->split.vring.used->ring[last_used].len);
> + }
> +
> + if (vq->batch_head == last_used) {
> + vq->batch_head = num;
> + *len = vq->batch_len;
> + } else
> + *len = vq->split.desc_state[last_used].total_len;
> +
> + if (unlikely(last_used >= num)) {
> + BAD_RING(vq, "id %u out of range\n", last_used);
> + return NULL;
> + }
> + if (unlikely(!vq->split.desc_state[last_used].data)) {
> + BAD_RING(vq, "id %u is not a head!\n", last_used);
> + return NULL;
> + }
> +
> + /* detach_buf_split clears data, so grab it now. */
> + ret = vq->split.desc_state[last_used].data;
> + detach_buf_split_in_order(vq, last_used, ctx);
> +
> + vq->last_used_idx++;
> + /* If we expect an interrupt for the next entry, tell host
> + * by writing event index and flush out the write before
> + * the read in the next get_buf call. */
> + if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT))
> + virtio_store_mb(vq->weak_barriers,
> + &vring_used_event(&vq->split.vring),
> + cpu_to_virtio16(vq->vq.vdev, vq->last_used_idx));
> +
> + LAST_ADD_TIME_INVALID(vq);
> +
> + END_USE(vq);
> + return ret;
> +}
> +
> static void virtqueue_disable_cb_split(struct vring_virtqueue *vq)
> {
> if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
> @@ -1010,7 +1106,10 @@ static void *virtqueue_detach_unused_buf_split(struct vring_virtqueue *vq)
> continue;
> /* detach_buf_split clears data, so grab it now. */
> buf = vq->split.desc_state[i].data;
> - detach_buf_split(vq, i, NULL);
> + if (virtqueue_is_in_order(vq))
> + detach_buf_split_in_order(vq, i, NULL);
> + else
> + detach_buf_split(vq, i, NULL);
> vq->split.avail_idx_shadow--;
> vq->split.vring.avail->idx = cpu_to_virtio16(vq->vq.vdev,
> vq->split.avail_idx_shadow);
> @@ -1073,6 +1172,7 @@ static void virtqueue_vring_attach_split(struct vring_virtqueue *vq,
>
> /* Put everything in free lists. */
> vq->free_head = 0;
> + vq->batch_head = vq->split.vring.num;
> }
>
> static int vring_alloc_state_extra_split(struct vring_virtqueue_split *vring_split)
> @@ -1183,7 +1283,6 @@ static struct virtqueue *__vring_new_virtqueue_split(unsigned int index,
> if (!vq)
> return NULL;
>
> - vq->layout = SPLIT;
> vq->vq.callback = callback;
> vq->vq.vdev = vdev;
> vq->vq.name = name;
> @@ -1203,6 +1302,8 @@ static struct virtqueue *__vring_new_virtqueue_split(unsigned int index,
> vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
> !context;
> vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
> + vq->layout = virtio_has_feature(vdev, VIRTIO_F_IN_ORDER) ?
> + SPLIT_IN_ORDER : SPLIT;
>
> if (virtio_has_feature(vdev, VIRTIO_F_ORDER_PLATFORM))
> vq->weak_barriers = false;
> @@ -1366,13 +1467,14 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> unsigned int in_sgs,
> void *data,
> bool premapped,
> - gfp_t gfp)
> + gfp_t gfp,
> + u16 id)
> {
> struct vring_desc_extra *extra;
> struct vring_packed_desc *desc;
> struct scatterlist *sg;
> - unsigned int i, n, err_idx, len;
> - u16 head, id;
> + unsigned int i, n, err_idx, len, total_len = 0;
> + u16 head;
> dma_addr_t addr;
>
> head = vq->packed.next_avail_idx;
> @@ -1390,8 +1492,6 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> }
>
> i = 0;
> - id = vq->free_head;
> - BUG_ON(id == vq->packed.vring.num);
>
> for (n = 0; n < out_sgs + in_sgs; n++) {
> for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> @@ -1411,6 +1511,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> extra[i].flags = n < out_sgs ? 0 : VRING_DESC_F_WRITE;
> }
>
> + total_len += len;
> i++;
> }
> }
> @@ -1464,6 +1565,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> vq->packed.desc_state[id].data = data;
> vq->packed.desc_state[id].indir_desc = desc;
> vq->packed.desc_state[id].last = id;
> + vq->packed.desc_state[id].total_len = total_len;
>
> vq->num_added += 1;
>
> @@ -1516,8 +1618,11 @@ static inline int virtqueue_add_packed(struct vring_virtqueue *vq,
> BUG_ON(total_sg == 0);
>
> if (virtqueue_use_indirect(vq, total_sg)) {
> + id = vq->free_head;
> + BUG_ON(id == vq->packed.vring.num);
> err = virtqueue_add_indirect_packed(vq, sgs, total_sg, out_sgs,
> - in_sgs, data, premapped, gfp);
> + in_sgs, data, premapped,
> + gfp, id);
> if (err != -ENOMEM) {
> END_USE(vq);
> return err;
> @@ -1638,6 +1743,152 @@ static inline int virtqueue_add_packed(struct vring_virtqueue *vq,
> return -EIO;
> }
>
> +static inline int virtqueue_add_packed_in_order(struct vring_virtqueue *vq,
> + struct scatterlist *sgs[],
> + unsigned int total_sg,
> + unsigned int out_sgs,
> + unsigned int in_sgs,
> + void *data,
> + void *ctx,
> + bool premapped,
> + gfp_t gfp)
> +{
> + struct vring_packed_desc *desc;
> + struct scatterlist *sg;
> + unsigned int i, n, c, err_idx, total_len = 0;
> + __le16 head_flags, flags;
> + u16 head, avail_used_flags;
> + int err;
> +
> + START_USE(vq);
> +
> + BUG_ON(data == NULL);
> + BUG_ON(ctx && vq->indirect);
> +
> + if (unlikely(vq->broken)) {
> + END_USE(vq);
> + return -EIO;
> + }
> +
> + LAST_ADD_TIME_UPDATE(vq);
> +
> + BUG_ON(total_sg == 0);
> +
> + if (virtqueue_use_indirect(vq, total_sg)) {
> + err = virtqueue_add_indirect_packed(vq, sgs, total_sg, out_sgs,
> + in_sgs, data, premapped, gfp,
> + vq->packed.next_avail_idx);
> + if (err != -ENOMEM) {
> + END_USE(vq);
> + return err;
> + }
> +
> + /* fall back on direct */
> + }
> +
> + head = vq->packed.next_avail_idx;
> + avail_used_flags = vq->packed.avail_used_flags;
> +
> + WARN_ON_ONCE(total_sg > vq->packed.vring.num && !vq->indirect);
> +
> + desc = vq->packed.vring.desc;
> + i = head;
> +
> + if (unlikely(vq->vq.num_free < total_sg)) {
> + pr_debug("Can't add buf len %i - avail = %i\n",
> + total_sg, vq->vq.num_free);
> + END_USE(vq);
> + return -ENOSPC;
> + }
> +
> + c = 0;
> + for (n = 0; n < out_sgs + in_sgs; n++) {
> + for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> + dma_addr_t addr;
> + u32 len;
> +
> + if (vring_map_one_sg(vq, sg, n < out_sgs ?
> + DMA_TO_DEVICE : DMA_FROM_DEVICE,
> + &addr, &len, premapped))
> + goto unmap_release;
> +
> + flags = cpu_to_le16(vq->packed.avail_used_flags |
> + (++c == total_sg ? 0 : VRING_DESC_F_NEXT) |
> + (n < out_sgs ? 0 : VRING_DESC_F_WRITE));
> + if (i == head)
> + head_flags = flags;
> + else
> + desc[i].flags = flags;
> +
> +
> + desc[i].addr = cpu_to_le64(addr);
> + desc[i].len = cpu_to_le32(len);
> + desc[i].id = cpu_to_le16(head);
> +
> + if (unlikely(vq->use_dma_api)) {
> + vq->packed.desc_extra[i].addr = premapped ?
> + DMA_MAPPING_ERROR: addr;
> + vq->packed.desc_extra[i].len = len;
> + vq->packed.desc_extra[i].flags =
> + le16_to_cpu(flags);
> + }
> +
> + if ((unlikely(++i >= vq->packed.vring.num))) {
> + i = 0;
> + vq->packed.avail_used_flags ^=
> + 1 << VRING_PACKED_DESC_F_AVAIL |
> + 1 << VRING_PACKED_DESC_F_USED;
> + vq->packed.avail_wrap_counter ^= 1;
> + }
> +
> + total_len += len;
> + }
> + }
> +
> + /* We're using some buffers from the free list. */
> + vq->vq.num_free -= total_sg;
> +
> + /* Update free pointer */
> + vq->packed.next_avail_idx = i;
> +
> + /* Store token. */
> + vq->packed.desc_state[head].num = total_sg;
> + vq->packed.desc_state[head].data = data;
> + vq->packed.desc_state[head].indir_desc = ctx;
> + vq->packed.desc_state[head].total_len = total_len;
> +
> + /*
> + * A driver MUST NOT make the first descriptor in the list
> + * available before all subsequent descriptors comprising
> + * the list are made available.
> + */
> + virtio_wmb(vq->weak_barriers);
> + vq->packed.vring.desc[head].flags = head_flags;
> + vq->num_added += total_sg;
> +
> + pr_debug("Added buffer head %i to %p\n", head, vq);
> + END_USE(vq);
> +
> + return 0;
> +
> +unmap_release:
> + err_idx = i;
> + i = head;
> + vq->packed.avail_used_flags = avail_used_flags;
> +
> + for (n = 0; n < total_sg; n++) {
> + if (i == err_idx)
> + break;
> + vring_unmap_extra_packed(vq, &vq->packed.desc_extra[i]);
> + i++;
> + if (i >= vq->packed.vring.num)
> + i = 0;
> + }
> +
> + END_USE(vq);
> + return -EIO;
> +}
> +
> static bool virtqueue_kick_prepare_packed(struct vring_virtqueue *vq)
> {
> u16 new, old, off_wrap, flags, wrap_counter, event_idx;
> @@ -1758,7 +2009,7 @@ static inline bool is_used_desc_packed(const struct vring_virtqueue *vq,
> return avail == used && used == used_wrap_counter;
> }
>
> -static bool virtqueue_poll_packed(const struct vring_virtqueue *vq, u16 off_wrap)
> +static bool __virtqueue_poll_packed(const struct vring_virtqueue *vq, u16 off_wrap)
> {
> bool wrap_counter;
> u16 used_idx;
> @@ -1769,6 +2020,11 @@ static bool virtqueue_poll_packed(const struct vring_virtqueue *vq, u16 off_wrap
> return is_used_desc_packed(vq, used_idx, wrap_counter);
> }
>
> +static bool virtqueue_poll_packed(const struct vring_virtqueue *vq, u16 off_wrap)
> +{
> + return __virtqueue_poll_packed(vq, off_wrap);
> +}
> +
> static bool more_used_packed(const struct vring_virtqueue *vq)
> {
> return virtqueue_poll_packed(vq, READ_ONCE(vq->last_used_idx));
> @@ -1798,10 +2054,84 @@ static void update_last_used_idx_packed(struct vring_virtqueue *vq,
> cpu_to_le16(vq->last_used_idx));
> }
>
> +static bool more_used_packed_in_order(const struct vring_virtqueue *vq)
> +{
> + if (vq->batch_head != vq->packed.vring.num)
> + return true;
> +
> + return virtqueue_poll_packed(vq, READ_ONCE(vq->last_used_idx));
> +}
> +
> +static bool __more_used_packed(const struct vring_virtqueue *vq)
> +{
> + return __virtqueue_poll_packed(vq, READ_ONCE(vq->last_used_idx));
> +}
> +
> +static void *virtqueue_get_buf_ctx_packed_in_order(struct vring_virtqueue *vq,
> + unsigned int *len,
> + void **ctx)
> +{
> + unsigned int num = vq->packed.vring.num;
> + u16 last_used, id, last_used_idx;
> + bool used_wrap_counter;
> + void *ret;
> +
> + START_USE(vq);
> +
> + if (unlikely(vq->broken)) {
> + END_USE(vq);
> + return NULL;
> + }
> +
> + last_used_idx = vq->last_used_idx;
> + used_wrap_counter = packed_used_wrap_counter(last_used_idx);
> + last_used = packed_last_used(last_used_idx);
> +
> + if (vq->batch_head == num) {
> + if (!__more_used_packed(vq)) {
> + pr_debug("No more buffers in queue\n");
> + END_USE(vq);
> + return NULL;
> + }
> + /* Only get used elements after they have been exposed by host. */
> + virtio_rmb(vq->weak_barriers);
> + vq->batch_head = le16_to_cpu(vq->packed.vring.desc[last_used].id);
> + vq->batch_len = le32_to_cpu(vq->packed.vring.desc[last_used].len);
> + }
> +
> + if (vq->batch_head == last_used) {
> + vq->batch_head = num;
> + *len = vq->batch_len;
> + } else
> + *len = vq->packed.desc_state[last_used].total_len;
> +
> + if (unlikely(last_used >= num)) {
> + BAD_RING(vq, "id %u out of range\n", id);
> + return NULL;
> + }
> + if (unlikely(!vq->packed.desc_state[last_used].data)) {
> + BAD_RING(vq, "id %u is not a head!\n", id);
> + return NULL;
> + }
> +
> + /* detach_buf_packed clears data, so grab it now. */
> + ret = vq->packed.desc_state[last_used].data;
> + detach_buf_packed_in_order(vq, last_used, ctx);
> +
> + update_last_used_idx_packed(vq, last_used, last_used,
> + used_wrap_counter);
> +
> + LAST_ADD_TIME_INVALID(vq);
> +
> + END_USE(vq);
> + return ret;
> +}
> +
> static void *virtqueue_get_buf_ctx_packed(struct vring_virtqueue *vq,
> unsigned int *len,
> void **ctx)
> {
> + unsigned int num = vq->packed.vring.num;
> u16 last_used, id, last_used_idx;
> bool used_wrap_counter;
> void *ret;
> @@ -1813,7 +2143,7 @@ static void *virtqueue_get_buf_ctx_packed(struct vring_virtqueue *vq,
> return NULL;
> }
>
> - if (!more_used_packed(vq)) {
> + if (!__more_used_packed(vq)) {
> pr_debug("No more buffers in queue\n");
> END_USE(vq);
> return NULL;
> @@ -1828,7 +2158,7 @@ static void *virtqueue_get_buf_ctx_packed(struct vring_virtqueue *vq,
> id = le16_to_cpu(vq->packed.vring.desc[last_used].id);
> *len = le32_to_cpu(vq->packed.vring.desc[last_used].len);
>
> - if (unlikely(id >= vq->packed.vring.num)) {
> + if (unlikely(id >= num)) {
> BAD_RING(vq, "id %u out of range\n", id);
> return NULL;
> }
> @@ -1948,6 +2278,7 @@ static bool virtqueue_enable_cb_delayed_packed(struct vring_virtqueue *vq)
> last_used_idx = READ_ONCE(vq->last_used_idx);
> wrap_counter = packed_used_wrap_counter(last_used_idx);
> used_idx = packed_last_used(last_used_idx);
> +
> if (is_used_desc_packed(vq, used_idx, wrap_counter)) {
> END_USE(vq);
> return false;
> @@ -1969,7 +2300,7 @@ static void *virtqueue_detach_unused_buf_packed(struct vring_virtqueue *vq)
> continue;
> /* detach_buf clears data, so grab it now. */
> buf = vq->packed.desc_state[i].data;
> - detach_buf_packed(vq, i, NULL);
> + detach_buf_packed_in_order(vq, i, NULL);
> END_USE(vq);
> return buf;
> }
> @@ -1995,6 +2326,8 @@ static struct vring_desc_extra *vring_alloc_desc_extra(unsigned int num)
> for (i = 0; i < num - 1; i++)
> desc_extra[i].next = i + 1;
>
> + desc_extra[num - 1].next = 0;
> +
> return desc_extra;
> }
>
> @@ -2126,8 +2459,12 @@ static void virtqueue_vring_attach_packed(struct vring_virtqueue *vq,
> {
> vq->packed = *vring_packed;
>
> - /* Put everything in free lists. */
> - vq->free_head = 0;
> + if (virtqueue_is_in_order(vq))
> + vq->batch_head = vq->split.vring.num;
> + else {
> + /* Put everything in free lists. */
> + vq->free_head = 0;
> + }
> }
>
> static void virtqueue_reset_packed(struct vring_virtqueue *vq)
> @@ -2174,13 +2511,14 @@ static struct virtqueue *__vring_new_virtqueue_packed(unsigned int index,
> #else
> vq->broken = false;
> #endif
> - vq->layout = PACKED;
> vq->dma_dev = dma_dev;
> vq->use_dma_api = vring_use_dma_api(vdev);
>
> vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
> !context;
> vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
> + vq->layout = virtio_has_feature(vdev, VIRTIO_F_IN_ORDER) ?
> + PACKED_IN_ORDER : PACKED;
>
> if (virtio_has_feature(vdev, VIRTIO_F_ORDER_PLATFORM))
> vq->weak_barriers = false;
> @@ -2290,9 +2628,39 @@ static const struct virtqueue_ops packed_ops = {
> .reset = virtqueue_reset_packed,
> };
>
> +static const struct virtqueue_ops split_in_order_ops = {
> + .add = virtqueue_add_split,
> + .get = virtqueue_get_buf_ctx_split_in_order,
> + .kick_prepare = virtqueue_kick_prepare_split,
> + .disable_cb = virtqueue_disable_cb_split,
> + .enable_cb_delayed = virtqueue_enable_cb_delayed_split,
> + .enable_cb_prepare = virtqueue_enable_cb_prepare_split,
> + .poll = virtqueue_poll_split,
> + .detach_unused_buf = virtqueue_detach_unused_buf_split,
> + .more_used = more_used_split_in_order,
> + .resize = virtqueue_resize_split,
> + .reset = virtqueue_reset_split,
> +};
> +
> +static const struct virtqueue_ops packed_in_order_ops = {
> + .add = virtqueue_add_packed_in_order,
> + .get = virtqueue_get_buf_ctx_packed_in_order,
> + .kick_prepare = virtqueue_kick_prepare_packed,
> + .disable_cb = virtqueue_disable_cb_packed,
> + .enable_cb_delayed = virtqueue_enable_cb_delayed_packed,
> + .enable_cb_prepare = virtqueue_enable_cb_prepare_packed,
> + .poll = virtqueue_poll_packed,
> + .detach_unused_buf = virtqueue_detach_unused_buf_packed,
> + .more_used = more_used_packed_in_order,
> + .resize = virtqueue_resize_packed,
> + .reset = virtqueue_reset_packed,
> +};
> +
> static const struct virtqueue_ops *const all_ops[VQ_TYPE_MAX] = {
> [SPLIT] = &split_ops,
> - [PACKED] = &packed_ops
> + [PACKED] = &packed_ops,
> + [SPLIT_IN_ORDER] = &split_in_order_ops,
> + [PACKED_IN_ORDER] = &packed_in_order_ops,
> };
>
> static int virtqueue_disable_and_recycle(struct virtqueue *_vq,
> @@ -2336,7 +2704,6 @@ static int virtqueue_enable_after_reset(struct virtqueue *_vq)
> /*
> * Generic functions and exported symbols.
> */
> -
> #define VIRTQUEUE_CALL(vq, op, ...) \
> ({ \
> typeof(all_ops[SPLIT]->op(vq, ##__VA_ARGS__)) ret; \
> @@ -2347,6 +2714,12 @@ static int virtqueue_enable_after_reset(struct virtqueue *_vq)
> case PACKED: \
> ret = all_ops[PACKED]->op(vq, ##__VA_ARGS__); \
> break; \
> + case SPLIT_IN_ORDER: \
> + ret = all_ops[SPLIT_IN_ORDER]->op(vq, ##__VA_ARGS__); \
> + break; \
> + case PACKED_IN_ORDER: \
> + ret = all_ops[PACKED_IN_ORDER]->op(vq, ##__VA_ARGS__); \
> + break; \
> default: \
> BUG(); \
> break; \
> @@ -2363,6 +2736,12 @@ static int virtqueue_enable_after_reset(struct virtqueue *_vq)
> case PACKED: \
> all_ops[PACKED]->op(vq, ##__VA_ARGS__); \
> break; \
> + case SPLIT_IN_ORDER: \
> + all_ops[SPLIT_IN_ORDER]->op(vq, ##__VA_ARGS__); \
> + break; \
> + case PACKED_IN_ORDER: \
> + all_ops[PACKED_IN_ORDER]->op(vq, ##__VA_ARGS__); \
> + break; \
> default: \
> BUG(); \
> break; \
> @@ -3073,6 +3452,8 @@ void vring_transport_features(struct virtio_device *vdev)
> break;
> case VIRTIO_F_NOTIFICATION_DATA:
> break;
> + case VIRTIO_F_IN_ORDER:
> + break;
> default:
> /* We don't understand this bit. */
> __virtio_clear_bit(vdev, i);
> --
> 2.34.1
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH V3 00/19] virtio_ring in order support
2025-06-16 8:24 [PATCH V3 00/19] virtio_ring in order support Jason Wang
` (19 preceding siblings ...)
2025-06-17 12:29 ` [PATCH V3 00/19] virtio_ring " Eugenio Perez Martin
@ 2025-07-01 6:58 ` Michael S. Tsirkin
20 siblings, 0 replies; 35+ messages in thread
From: Michael S. Tsirkin @ 2025-07-01 6:58 UTC (permalink / raw)
To: Jason Wang; +Cc: xuanzhuo, eperezma, virtualization, linux-kernel
On Mon, Jun 16, 2025 at 04:24:58PM +0800, Jason Wang wrote:
> Hello all:
>
> This sereis tries to implement the VIRTIO_F_IN_ORDER to
> virtio_ring. This is done by introducing virtqueue ops so we can
> implement separate helpers for different virtqueue layout/features
> then the in-order were implemented on top.
>
> Tests shows 3%-5% imporvment with packed virtqueue PPS with KVM guest
> testpmd on the host.
Thanks very much. I think this is going in the right direction,
though some things to improve remain.
Sent comments, thanks!
> Changes since V2:
>
> - Fix build warning when DEBUG is enabled
>
> Changes since V1:
>
> - use const global array of function pointers to avoid indirect
> branches to eliminate retpoline when mitigation is enabled
> - fix used length calculation when processing used ids in a batch
> - fix sparse warnings
>
> Please review.
>
> Thanks
>
> Jason Wang (19):
> virtio_ring: rename virtqueue_reinit_xxx to virtqueue_reset_xxx()
> virtio_ring: switch to use vring_virtqueue in virtqueue_poll variants
> virtio_ring: unify logic of virtqueue_poll() and more_used()
> virtio_ring: switch to use vring_virtqueue for virtqueue resize
> variants
> virtio_ring: switch to use vring_virtqueue for virtqueue_kick_prepare
> variants
> virtio_ring: switch to use vring_virtqueue for virtqueue_add variants
> virtio: switch to use vring_virtqueue for virtqueue_add variants
> virtio_ring: switch to use vring_virtqueue for enable_cb_prepare
> variants
> virtio_ring: use vring_virtqueue for enable_cb_delayed variants
> virtio_ring: switch to use vring_virtqueue for disable_cb variants
> virtio_ring: switch to use vring_virtqueue for detach_unused_buf
> variants
> virtio_ring: use u16 for last_used_idx in virtqueue_poll_split()
> virtio_ring: introduce virtqueue ops
> virtio_ring: determine descriptor flags at one time
> virtio_ring: factor out core logic of buffer detaching
> virtio_ring: factor out core logic for updating last_used_idx
> virtio_ring: factor out split indirect detaching logic
> virtio_ring: factor out split detaching logic
> virtio_ring: add in order support
>
> drivers/virtio/virtio_ring.c | 896 ++++++++++++++++++++++++++---------
> 1 file changed, 684 insertions(+), 212 deletions(-)
>
> --
> 2.34.1
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH V3 14/19] virtio_ring: determine descriptor flags at one time
2025-07-01 6:42 ` Michael S. Tsirkin
@ 2025-07-01 7:25 ` Jason Wang
0 siblings, 0 replies; 35+ messages in thread
From: Jason Wang @ 2025-07-01 7:25 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: xuanzhuo, eperezma, virtualization, linux-kernel
On Tue, Jul 1, 2025 at 2:42 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Jun 16, 2025 at 04:25:12PM +0800, Jason Wang wrote:
> > Let's determine the last descriptor by counting the number of sg. This
> > would be consistent with packed virtqueue implementation and ease the
> > future in-order implementation.
> >
> > Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> > drivers/virtio/virtio_ring.c | 25 +++++++++++++------------
> > 1 file changed, 13 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index af32d1a1a1db..d5e4d4cd2487 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -570,7 +570,7 @@ static inline int virtqueue_add_split(struct vring_virtqueue *vq,
> > struct vring_desc_extra *extra;
> > struct scatterlist *sg;
> > struct vring_desc *desc;
> > - unsigned int i, n, avail, descs_used, prev, err_idx;
> > + unsigned int i, n, c, avail, descs_used, err_idx;
> > int head;
> > bool indirect;
> >
> > @@ -626,46 +626,47 @@ static inline int virtqueue_add_split(struct vring_virtqueue *vq,
> > return -ENOSPC;
> > }
> >
> > + c = 0;
>
> initialize at point of declaration?
Fine.
>
> > for (n = 0; n < out_sgs; n++) {
> > + sg = sgs[n];
> > for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > dma_addr_t addr;
> > u32 len;
> > + u16 flags = 0;
> >
> > if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr, &len, premapped))
> > goto unmap_release;
> >
> > - prev = i;
> > + if (++c != total_sg)
> > + flags = VRING_DESC_F_NEXT;
> > +
>
> Don't like how the logic is split.
> flags isn't used before that.
> So I prefer:
>
> flags = ++c == total_sg ? 0 : VRING_DESC_F_NEXT;
>
> and at that point, we do not really need flags anymore:
>
>
> > /* Note that we trust indirect descriptor
> > * table since it use stream DMA mapping.
> > */
> > i = virtqueue_add_desc_split(vq, desc, extra, i, addr, len,
> > - VRING_DESC_F_NEXT,
> > + flags,
>
> So just:
>
>
> i = virtqueue_add_desc_split(vq, desc, extra, i, addr, len,
> ++c == total_sg ? 0 : VRING_DESC_F_NEXT,
>
> here and we are done.
>
>
> > premapped);
> > }
> > }
> > for (; n < (out_sgs + in_sgs); n++) {
> > for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > + u16 flags = VRING_DESC_F_WRITE;
> > dma_addr_t addr;
> > u32 len;
> >
> > if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr, &len, premapped))
> > goto unmap_release;
> >
> > - prev = i;
> > + if (++c != total_sg)
> > + flags |= VRING_DESC_F_NEXT;
> > +
>
> Don't like that above it's "=" while here it's "|=".
> And flags isn't used before that.
> So I prefer:
>
> flags = ++c == total_sg ? VRING_DESC_F_WRITE : VRING_DESC_F_WRITE | VRING_DESC_F_NEXT;
>
> and again we don't really need the variable:
>
>
> > /* Note that we trust indirect descriptor
> > * table since it use stream DMA mapping.
> > */
> > i = virtqueue_add_desc_split(vq, desc, extra, i, addr, len,
> > - VRING_DESC_F_NEXT |
> > - VRING_DESC_F_WRITE,
>
>
> so just:
>
> i = virtqueue_add_desc_split(vq, desc, extra, i, addr, len,
> (++c == total_sg ? 0 : VRING_DESC_F_NEXT) |
> VRING_DESC_F_WRITE,
>
>
> is clearer, and the patch smaller.
Right, I just copied the logic from packed.
Let me do that for both split and packed in the next version.
Thanks
>
>
>
> > - premapped);
> > + flags, premapped);
> > }
> > }
> > - /* Last one doesn't continue. */
> > - desc[prev].flags &= cpu_to_virtio16(vq->vq.vdev, ~VRING_DESC_F_NEXT);
> > - if (!indirect && vring_need_unmap_buffer(vq, &extra[prev]))
> > - vq->split.desc_extra[prev & (vq->split.vring.num - 1)].flags &=
> > - ~VRING_DESC_F_NEXT;
> >
> > if (indirect) {
> > /* Now that the indirect table is filled in, map it. */
> > --
> > 2.34.1
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH V3 19/19] virtio_ring: add in order support
2025-07-01 6:56 ` Michael S. Tsirkin
@ 2025-07-02 9:29 ` Jason Wang
2025-07-02 10:56 ` Michael S. Tsirkin
0 siblings, 1 reply; 35+ messages in thread
From: Jason Wang @ 2025-07-02 9:29 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: xuanzhuo, eperezma, virtualization, linux-kernel
On Tue, Jul 1, 2025 at 2:57 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Jun 16, 2025 at 04:25:17PM +0800, Jason Wang wrote:
> > This patch implements in order support for both split virtqueue and
> > packed virtqueue.
>
> I'd like to see more motivation for this work, documented.
> It's not really performance, not as it stands, see below:
>
> >
> > Benchmark with KVM guest + testpmd on the host shows:
> >
> > For split virtqueue: no obvious differences were noticed
> >
> > For packed virtqueue:
> >
> > 1) RX gets 3.1% PPS improvements from 6.3 Mpps to 6.5 Mpps
> > 2) TX gets 4.6% PPS improvements from 8.6 Mpps to 9.0 Mpps
> >
>
> That's a very modest improvement for a lot of code.
> I also note you put in some batching just for in-order.
> Which could also explain the gains maybe?
> What if you just put in a simple implementation with no
> batching tricks? do you still see a gain?
It is needed to support batched used ring updates by the device.
"""
Some devices always use descriptors in the same order in which they
have been made available. These devices can offer the
VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge allows
devices to notify the use of a batch of buffers to the driver by only
writing out a single used ring entry with the id corresponding to the
head entry of the descriptor chain describing the last buffer in the
batch.
"""
DPDK implements this behavior, so it's a must for the virtio driver.
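To illustrate the driver side, here is a simplified sketch of what this
patch does in virtqueue_get_buf_ctx_split_in_order() (endianness
conversions, index wrapping and sanity checks omitted):

	if (vq->batch_head == num) {
		/* No batch pending: read the single used entry the
		 * device wrote for the whole batch; its id is the head
		 * of the last buffer in the batch. */
		vq->batch_head = used->ring[last_used].id;
		vq->batch_len = used->ring[last_used].len;
	}

	if (last_used == vq->batch_head) {
		/* Last buffer of the batch: use the device-written len. */
		*len = vq->batch_len;
		vq->batch_head = num;	/* batch fully consumed */
	} else {
		/* Earlier buffer in the batch: use the total length the
		 * driver recorded when adding it. */
		*len = vq->split.desc_state[last_used].total_len;
	}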
> Does any hardware implement this? Maybe that can demonstrate
> bigger gains.
Maybe, but I don't have one at hand.
For performance, I think this should be sufficient as a starting
point. In the next version I can say something like "more
optimizations could be done on top".
Note that the patch that introduced packed virtqueue didn't even
include any numbers:
commit 1ce9e6055fa0a9043405c5604cf19169ec5379ff
Author: Tiwei Bie <tiwei.bie@intel.com>
Date: Wed Nov 21 18:03:27 2018 +0800
virtio_ring: introduce packed ring support
Introduce the packed ring support. Packed ring can only be
created by vring_create_virtqueue() and each chunk of packed
ring will be allocated individually. Packed ring can not be
created on preallocated memory by vring_new_virtqueue() or
the likes currently.
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
>
>
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> > drivers/virtio/virtio_ring.c | 423 +++++++++++++++++++++++++++++++++--
> > 1 file changed, 402 insertions(+), 21 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 27a9459a0555..21d456392ba0 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -70,11 +70,14 @@
> > enum vq_layout {
> > SPLIT = 0,
> > PACKED,
> > + SPLIT_IN_ORDER,
> > + PACKED_IN_ORDER,
> > VQ_TYPE_MAX,
> > };
> >
> > struct vring_desc_state_split {
> > void *data; /* Data for callback. */
> > + u32 total_len; /* Buffer Length */
> >
> > /* Indirect desc table and extra table, if any. These two will be
> > * allocated together. So we won't stress more to the memory allocator.
> > @@ -84,6 +87,7 @@ struct vring_desc_state_split {
> >
> > struct vring_desc_state_packed {
> > void *data; /* Data for callback. */
> > + u32 total_len; /* Buffer Length */
> >
> > /* Indirect desc table and extra table, if any. These two will be
> > * allocated together. So we won't stress more to the memory allocator.
>
> We are bloating up the cache footprint for everyone,
> so there's a chance of regressions.
> Pls include benchmark for in order off, to make sure we
> are not regressing.
Ok.
> How big was the ring?
256.
> Worth trying with a biggish one, where there is more cache
> pressure.
Ok.
>
>
> Why not have a separate state for in-order?
It can work.
>
>
>
> > @@ -206,6 +210,12 @@ struct vring_virtqueue {
> >
> > /* Head of free buffer list. */
> > unsigned int free_head;
> > +
> > + /* Head of the batched used buffers, vq->num means no batching */
> > + unsigned int batch_head;
> > +
> > + unsigned int batch_len;
> > +
>
> Are these two only used for in-order? Please document that.
Yes, I will do that.
> I also want some documentation about the batching trickery
> used please.
> What is batched, when, how is batching flushed, why are we
> only batching in-order ...
I'm not sure I get this: what you want seems to be the behaviour of
the device that is already stated by the spec, or I may be missing
something here.
>
>
>
>
> > /* Number we've added since last sync. */
> > unsigned int num_added;
> >
> > @@ -256,10 +266,14 @@ static void vring_free(struct virtqueue *_vq);
> >
> > #define to_vvq(_vq) container_of_const(_vq, struct vring_virtqueue, vq)
> >
> > -
> > static inline bool virtqueue_is_packed(const struct vring_virtqueue *vq)
> > {
> > - return vq->layout == PACKED;
> > + return vq->layout == PACKED || vq->layout == PACKED_IN_ORDER;
> > +}
> > +
> > +static inline bool virtqueue_is_in_order(const struct vring_virtqueue *vq)
> > +{
> > + return vq->layout == SPLIT_IN_ORDER || vq->layout == PACKED_IN_ORDER;
> > }
> >
> > static bool virtqueue_use_indirect(const struct vring_virtqueue *vq,
> > @@ -570,7 +584,7 @@ static inline int virtqueue_add_split(struct vring_virtqueue *vq,
> > struct vring_desc_extra *extra;
> > struct scatterlist *sg;
> > struct vring_desc *desc;
> > - unsigned int i, n, c, avail, descs_used, err_idx;
> > + unsigned int i, n, c, avail, descs_used, err_idx, total_len = 0;
>
>
> I would add a comment here:
>
> /* Total length for in-order */
> unsigned int total_len = 0;
Ok.
Thanks
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH V3 19/19] virtio_ring: add in order support
2025-07-02 9:29 ` Jason Wang
@ 2025-07-02 10:56 ` Michael S. Tsirkin
2025-07-03 3:13 ` Jason Wang
0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2025-07-02 10:56 UTC (permalink / raw)
To: Jason Wang; +Cc: xuanzhuo, eperezma, virtualization, linux-kernel
On Wed, Jul 02, 2025 at 05:29:18PM +0800, Jason Wang wrote:
> On Tue, Jul 1, 2025 at 2:57 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Jun 16, 2025 at 04:25:17PM +0800, Jason Wang wrote:
> > > This patch implements in order support for both split virtqueue and
> > > packed virtqueue.
> >
> > I'd like to see more motivation for this work, documented.
> > It's not really performance, not as it stands, see below:
> >
> > >
> > > Benchmark with KVM guest + testpmd on the host shows:
> > >
> > > For split virtqueue: no obvious differences were noticed
> > >
> > > For packed virtqueue:
> > >
> > > 1) RX gets 3.1% PPS improvements from 6.3 Mpps to 6.5 Mpps
> > > 2) TX gets 4.6% PPS improvements from 8.6 Mpps to 9.0 Mpps
> > >
> >
> > That's a very modest improvement for a lot of code.
> > I also note you put in some batching just for in-order.
> > Which could also explain the gains maybe?
> > What if you just put in a simple implementation with no
> > batching tricks? do you still see a gain?
>
> It is used to implement the batch used updating.
>
> """
> Some devices always use descriptors in the same order in which they
> have been made available. These devices can offer the
> VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge allows
> devices to notify the use of a batch of buffers to the driver by only
> writing out a single used ring entry with the id corresponding to the
> head entry of the descriptor chain describing the last buffer in the
> batch.
> """
>
> DPDK implements this behavior, so it's a must for the virtio driver.
>
> > Does any hardware implement this? Maybe that can demonstrate
> > bigger gains.
>
> Maybe but I don't have one in my hand.
>
> For performance, I think it should be sufficient as a starter. I can
> say in the next version something like "more optimizations could be
> done on top"
What are some optimizations you have in mind?
> Note that the patch that introduces packed virtqueue, there's not even
> any numbers:
>
> commit 1ce9e6055fa0a9043405c5604cf19169ec5379ff
> Author: Tiwei Bie <tiwei.bie@intel.com>
> Date: Wed Nov 21 18:03:27 2018 +0800
>
> virtio_ring: introduce packed ring support
>
> Introduce the packed ring support. Packed ring can only be
> created by vring_create_virtqueue() and each chunk of packed
> ring will be allocated individually. Packed ring can not be
> created on preallocated memory by vring_new_virtqueue() or
> the likes currently.
>
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
I think the assumption there was that Intel has hardware that
requires packed. That's why Dave merged this.
> >
> >
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > ---
> > > drivers/virtio/virtio_ring.c | 423 +++++++++++++++++++++++++++++++++--
> > > 1 file changed, 402 insertions(+), 21 deletions(-)
> > >
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index 27a9459a0555..21d456392ba0 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -70,11 +70,14 @@
> > > enum vq_layout {
> > > SPLIT = 0,
> > > PACKED,
> > > + SPLIT_IN_ORDER,
> > > + PACKED_IN_ORDER,
> > > VQ_TYPE_MAX,
> > > };
> > >
> > > struct vring_desc_state_split {
> > > void *data; /* Data for callback. */
> > > + u32 total_len; /* Buffer Length */
> > >
> > > /* Indirect desc table and extra table, if any. These two will be
> > > * allocated together. So we won't stress more to the memory allocator.
> > > @@ -84,6 +87,7 @@ struct vring_desc_state_split {
> > >
> > > struct vring_desc_state_packed {
> > > void *data; /* Data for callback. */
> > > + u32 total_len; /* Buffer Length */
> > >
> > > /* Indirect desc table and extra table, if any. These two will be
> > > * allocated together. So we won't stress more to the memory allocator.
> >
> > We are bloating up the cache footprint for everyone,
> > so there's a chance of regressions.
> > Pls include benchmark for in order off, to make sure we
> > are not regressing.
>
> Ok.
>
> > How big was the ring?
>
> 256.
That is very modest; you want to fill at least one cache way,
preferably more.
> > Worth trying with a biggish one, where there is more cache
> > pressure.
>
> Ok.
>
> >
> >
> > Why not have a separate state for in-order?
>
> It can work.
>
> >
> >
> >
> > > @@ -206,6 +210,12 @@ struct vring_virtqueue {
> > >
> > > /* Head of free buffer list. */
> > > unsigned int free_head;
> > > +
> > > + /* Head of the batched used buffers, vq->num means no batching */
> > > + unsigned int batch_head;
> > > +
> > > + unsigned int batch_len;
> > > +
> >
> > Are these two only used for in-order? Please document that.
>
> Yes, I will do that.
>
> > I also want some documentation about the batching trickery
> > used please.
> > What is batched, when, how is batching flushed, why are we
> > only batching in-order ...
>
> I'm not sure I get things like this, what you want seems to be the
> behaviour of the device which has been stated by the spec or I may
> miss something here.
"a single used ring entry with the id corresponding to the
head entry of the descriptor chain describing the last buffer in the
batch"
?
so together they form this used ring entry describing the last buffer?
"head" is the id and "len" the length?
maybe
/*
* With IN_ORDER, devices write a single used ring entry with
* the id corresponding to the head entry of the descriptor chain
* describing the last buffer in the batch
*/
struct used_entry {
u32 id;
u32 len;
} batch_last;
?
> >
> >
> >
> >
> > > /* Number we've added since last sync. */
> > > unsigned int num_added;
> > >
> > > @@ -256,10 +266,14 @@ static void vring_free(struct virtqueue *_vq);
> > >
> > > #define to_vvq(_vq) container_of_const(_vq, struct vring_virtqueue, vq)
> > >
> > > -
> > > static inline bool virtqueue_is_packed(const struct vring_virtqueue *vq)
> > > {
> > > - return vq->layout == PACKED;
> > > + return vq->layout == PACKED || vq->layout == PACKED_IN_ORDER;
> > > +}
> > > +
> > > +static inline bool virtqueue_is_in_order(const struct vring_virtqueue *vq)
> > > +{
> > > + return vq->layout == SPLIT_IN_ORDER || vq->layout == PACKED_IN_ORDER;
> > > }
> > >
> > > static bool virtqueue_use_indirect(const struct vring_virtqueue *vq,
> > > @@ -570,7 +584,7 @@ static inline int virtqueue_add_split(struct vring_virtqueue *vq,
> > > struct vring_desc_extra *extra;
> > > struct scatterlist *sg;
> > > struct vring_desc *desc;
> > > - unsigned int i, n, c, avail, descs_used, err_idx;
> > > + unsigned int i, n, c, avail, descs_used, err_idx, total_len = 0;
> >
> >
> > I would add a comment here:
> >
> > /* Total length for in-order */
> > unsigned int total_len = 0;
>
> Ok.
>
> Thanks
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH V3 19/19] virtio_ring: add in order support
2025-07-02 10:56 ` Michael S. Tsirkin
@ 2025-07-03 3:13 ` Jason Wang
2025-07-03 13:49 ` Jonah Palmer
0 siblings, 1 reply; 35+ messages in thread
From: Jason Wang @ 2025-07-03 3:13 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: xuanzhuo, eperezma, virtualization, linux-kernel, Jonah Palmer
On Wed, Jul 2, 2025 at 6:57 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jul 02, 2025 at 05:29:18PM +0800, Jason Wang wrote:
> > On Tue, Jul 1, 2025 at 2:57 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Jun 16, 2025 at 04:25:17PM +0800, Jason Wang wrote:
> > > > This patch implements in order support for both split virtqueue and
> > > > packed virtqueue.
> > >
> > > I'd like to see more motivation for this work, documented.
> > > It's not really performance, not as it stands, see below:
> > >
> > > >
> > > > Benchmark with KVM guest + testpmd on the host shows:
> > > >
> > > > For split virtqueue: no obvious differences were noticed
> > > >
> > > > For packed virtqueue:
> > > >
> > > > 1) RX gets 3.1% PPS improvements from 6.3 Mpps to 6.5 Mpps
> > > > 2) TX gets 4.6% PPS improvements from 8.6 Mpps to 9.0 Mpps
> > > >
> > >
> > > That's a very modest improvement for a lot of code.
> > > I also note you put in some batching just for in-order.
> > > Which could also explain the gains maybe?
> > > What if you just put in a simple implementation with no
> > > batching tricks? do you still see a gain?
> >
> > It is used to implement the batch used updating.
> >
> > """
> > Some devices always use descriptors in the same order in which they
> > have been made available. These devices can offer the
> > VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge allows
> > devices to notify the use of a batch of buffers to the driver by only
> > writing out a single used ring entry with the id corresponding to the
> > head entry of the descriptor chain describing the last buffer in the
> > batch.
> > """
> >
> > DPDK implements this behavior, so it's a must for the virtio driver.
> >
> > > Does any hardware implement this? Maybe that can demonstrate
> > > bigger gains.
> >
> > Maybe but I don't have one in my hand.
> >
> > For performance, I think it should be sufficient as a starter. I can
> > say in the next version something like "more optimizations could be
> > done on top"
>
> What are some optimizations you have in mind?
One thing I have in mind; the spec currently says:
"""
If negotiated, this knowledge allows devices to notify the use of a
batch of buffers to the driver by only writing out a single used
descriptor with the Buffer ID corresponding to the last descriptor in
the batch.
"""
If the device writes the last descriptor ID instead of the buffer ID
and skips the number of descriptors in the used ring, then for the
split virtqueue the avail ring is not needed anymore: the device knows
the availability of buffers via avail_idx. In this way we completely
eliminate accesses to the available ring. This reduces memory
accesses, which are expensive for both:
1) kernel vhost-net where small user space memory access is expensive
2) hardware PCI transactions
Does this make sense?
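Just for illustration (this is hypothetical device-side pseudo code,
not something in this series or in the current spec; read_avail_idx()
and process_chain() are made-up helpers): with strict in-order
consumption the device could walk the descriptor table directly and
only ever read avail->idx, never avail->ring[]:

	u16 avail_idx = read_avail_idx(vring);	/* only avail->idx is read */

	while (dev->num_consumed != avail_idx) {
		u16 head = dev->next_desc;

		/* In order, the next chain starts right after the
		 * previously consumed one, so avail->ring[] is never
		 * needed to find it. */
		dev->next_desc = process_chain(dev, vring, head);
		dev->num_consumed++;
	}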
>
>
>
> > Note that the patch that introduces packed virtqueue, there's not even
> > any numbers:
> >
> > commit 1ce9e6055fa0a9043405c5604cf19169ec5379ff
> > Author: Tiwei Bie <tiwei.bie@intel.com>
> > Date: Wed Nov 21 18:03:27 2018 +0800
> >
> > virtio_ring: introduce packed ring support
> >
> > Introduce the packed ring support. Packed ring can only be
> > created by vring_create_virtqueue() and each chunk of packed
> > ring will be allocated individually. Packed ring can not be
> > created on preallocated memory by vring_new_virtqueue() or
> > the likes currently.
> >
> > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > Signed-off-by: David S. Miller <davem@davemloft.net>
>
>
> I think the assumption there was that intel has hardware that
> requires packed. That's why Dave merged this.
I think it should, according to this QEMU patch:
commit c03213fdc9d7b680cc575cd1e725750702a10b09
Author: Jonah Palmer <jonah.palmer@oracle.com>
Date: Wed Jul 10 08:55:18 2024 -0400
vhost,vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits
Add support for the VIRTIO_F_IN_ORDER feature across a variety of vhost
devices.
The inclusion of VIRTIO_F_IN_ORDER in the feature bits arrays for these
devices ensures that the backend is capable of offering and providing
support for this feature, and that it can be disabled if the backend
does not support it.
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
Message-Id: <20240710125522.4168043-6-jonah.palmer@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Adding Jonah for more thought here.
>
> > >
> > >
> > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > ---
> > > > drivers/virtio/virtio_ring.c | 423 +++++++++++++++++++++++++++++++++--
> > > > 1 file changed, 402 insertions(+), 21 deletions(-)
> > > >
> > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > index 27a9459a0555..21d456392ba0 100644
> > > > --- a/drivers/virtio/virtio_ring.c
> > > > +++ b/drivers/virtio/virtio_ring.c
> > > > @@ -70,11 +70,14 @@
> > > > enum vq_layout {
> > > > SPLIT = 0,
> > > > PACKED,
> > > > + SPLIT_IN_ORDER,
> > > > + PACKED_IN_ORDER,
> > > > VQ_TYPE_MAX,
> > > > };
> > > >
> > > > struct vring_desc_state_split {
> > > > void *data; /* Data for callback. */
> > > > + u32 total_len; /* Buffer Length */
> > > >
> > > > /* Indirect desc table and extra table, if any. These two will be
> > > > * allocated together. So we won't stress more to the memory allocator.
> > > > @@ -84,6 +87,7 @@ struct vring_desc_state_split {
> > > >
> > > > struct vring_desc_state_packed {
> > > > void *data; /* Data for callback. */
> > > > + u32 total_len; /* Buffer Length */
> > > >
> > > > /* Indirect desc table and extra table, if any. These two will be
> > > > * allocated together. So we won't stress more to the memory allocator.
> > >
> > > We are bloating up the cache footprint for everyone,
> > > so there's a chance of regressions.
> > > Pls include benchmark for in order off, to make sure we
> > > are not regressing.
> >
> > Ok.
> >
> > > How big was the ring?
> >
> > 256.
>
> that is very modest, you want to fill at least one cache way,
> preferably more.
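As a rough back-of-the-envelope estimate (assuming 64-bit pointers,
struct vring_desc_state_split grows to 8 + 4 (+4 padding) + 8 = 24
bytes with the new total_len field): a 256-entry ring needs only about
6KB of per-descriptor state, which fits easily in L1, while a
4096-entry ring needs roughly 96KB for that state alone, on top of the
descriptor table and the rings themselves.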
I can test larger queue sizes.
>
> > > Worth trying with a biggish one, where there is more cache
> > > pressure.
> >
> > Ok.
> >
> > >
> > >
> > > Why not have a separate state for in-order?
> >
> > It can work.
> >
> > >
> > >
> > >
> > > > @@ -206,6 +210,12 @@ struct vring_virtqueue {
> > > >
> > > > /* Head of free buffer list. */
> > > > unsigned int free_head;
> > > > +
> > > > + /* Head of the batched used buffers, vq->num means no batching */
> > > > + unsigned int batch_head;
> > > > +
> > > > + unsigned int batch_len;
> > > > +
> > >
> > > Are these two only used for in-order? Please document that.
> >
> > Yes, I will do that.
> >
> > > I also want some documentation about the batching trickery
> > > used please.
> > > What is batched, when, how is batching flushed, why are we
> > > only batching in-order ...
> >
> > I'm not sure I get things like this, what you want seems to be the
> > behaviour of the device which has been stated by the spec or I may
> > miss something here.
>
> "a single used ring entry with the id corresponding to the
> head entry of the descriptor chain describing the last buffer in the
> batch"
> ?
Exactly.
>
> so together they form this used ring entry describing the last buffer?
> "head" is the id and "len" the length?
Yes.
>
> maybe
>
> /*
> * With IN_ORDER, devices write a single used ring entry with
> * the id corresponding to the head entry of the descriptor chain
> * describing the last buffer in the batch
> */
> struct used_entry {
> u32 id;
> u32 len;
> } batch_last;
>
> ?
This should be fine.
>
>
>
>
> > >
> > >
> > >
> > >
> > > > /* Number we've added since last sync. */
> > > > unsigned int num_added;
> > > >
> > > > @@ -256,10 +266,14 @@ static void vring_free(struct virtqueue *_vq);
> > > >
> > > > #define to_vvq(_vq) container_of_const(_vq, struct vring_virtqueue, vq)
> > > >
> > > > -
> > > > static inline bool virtqueue_is_packed(const struct vring_virtqueue *vq)
> > > > {
> > > > - return vq->layout == PACKED;
> > > > + return vq->layout == PACKED || vq->layout == PACKED_IN_ORDER;
> > > > +}
> > > > +
> > > > +static inline bool virtqueue_is_in_order(const struct vring_virtqueue *vq)
> > > > +{
> > > > + return vq->layout == SPLIT_IN_ORDER || vq->layout == PACKED_IN_ORDER;
> > > > }
> > > >
> > > > static bool virtqueue_use_indirect(const struct vring_virtqueue *vq,
> > > > @@ -570,7 +584,7 @@ static inline int virtqueue_add_split(struct vring_virtqueue *vq,
> > > > struct vring_desc_extra *extra;
> > > > struct scatterlist *sg;
> > > > struct vring_desc *desc;
> > > > - unsigned int i, n, c, avail, descs_used, err_idx;
> > > > + unsigned int i, n, c, avail, descs_used, err_idx, total_len = 0;
> > >
> > >
> > > I would add a comment here:
> > >
> > > /* Total length for in-order */
> > > unsigned int total_len = 0;
> >
> > Ok.
> >
> > Thanks
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH V3 13/19] virtio_ring: introduce virtqueue ops
2025-07-01 6:28 ` Michael S. Tsirkin
@ 2025-07-03 3:20 ` Jason Wang
0 siblings, 0 replies; 35+ messages in thread
From: Jason Wang @ 2025-07-03 3:20 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: xuanzhuo, eperezma, virtualization, linux-kernel
On Tue, Jul 1, 2025 at 2:29 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Jun 16, 2025 at 04:25:11PM +0800, Jason Wang wrote:
> > This patch introduces virtqueue ops which is a set of the callbacks
> > that will be called for different queue layout or features. This would
> > help to avoid branches for split/packed and will ease the future
> > implementation like in order.
> >
> > Note that in order to eliminate the indirect calls this patch uses
> > global array of const ops to allow compiler to avoid indirect
> > branches.
> >
> > Tested with CONFIG_MITIGATION_RETPOLINE, no performance differences
> > were noticed.
> >
> > Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> > drivers/virtio/virtio_ring.c | 172 ++++++++++++++++++++++++++---------
> > 1 file changed, 129 insertions(+), 43 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index b14bbb4d6713..af32d1a1a1db 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -67,6 +67,12 @@
> > #define LAST_ADD_TIME_INVALID(vq)
> > #endif
> >
> > +enum vq_layout {
> > + SPLIT = 0,
> > + PACKED,
> > + VQ_TYPE_MAX,
> > +};
> > +
> > struct vring_desc_state_split {
> > void *data; /* Data for callback. */
> >
> > @@ -159,12 +165,28 @@ struct vring_virtqueue_packed {
> > size_t event_size_in_bytes;
> > };
> >
> > +struct vring_virtqueue;
> > +
> > +struct virtqueue_ops {
> > + int (*add)(struct vring_virtqueue *_vq, struct scatterlist *sgs[],
> > + unsigned int total_sg, unsigned int out_sgs,
> > + unsigned int in_sgs, void *data,
> > + void *ctx, bool premapped, gfp_t gfp);
> > + void *(*get)(struct vring_virtqueue *vq, unsigned int *len, void **ctx);
> > + bool (*kick_prepare)(struct vring_virtqueue *vq);
> > + void (*disable_cb)(struct vring_virtqueue *vq);
> > + bool (*enable_cb_delayed)(struct vring_virtqueue *vq);
> > + unsigned int (*enable_cb_prepare)(struct vring_virtqueue *vq);
> > + bool (*poll)(const struct vring_virtqueue *vq, u16 last_used_idx);
> > + void *(*detach_unused_buf)(struct vring_virtqueue *vq);
> > + bool (*more_used)(const struct vring_virtqueue *vq);
> > + int (*resize)(struct vring_virtqueue *vq, u32 num);
> > + void (*reset)(struct vring_virtqueue *vq);
> > +};
> > +
> > struct vring_virtqueue {
> > struct virtqueue vq;
> >
> > - /* Is this a packed ring? */
> > - bool packed_ring;
> > -
> > /* Is DMA API used? */
> > bool use_dma_api;
> >
> > @@ -180,6 +202,8 @@ struct vring_virtqueue {
> > /* Host publishes avail event idx */
> > bool event;
> >
> > + enum vq_layout layout;
> > +
> > /* Head of free buffer list. */
> > unsigned int free_head;
> > /* Number we've added since last sync. */
> > @@ -232,6 +256,12 @@ static void vring_free(struct virtqueue *_vq);
> >
> > #define to_vvq(_vq) container_of_const(_vq, struct vring_virtqueue, vq)
> >
> > +
>
> Why two empty lines?
Let me fix that.
>
> > +static inline bool virtqueue_is_packed(const struct vring_virtqueue *vq)
> > +{
> > + return vq->layout == PACKED;
> > +}
> > +
>
> So why still have this?
> Let's replace all uses with a per layout
> callback. None of them seem to be on the data path.
> Well, with one exception - there is
I actually started with that, but the problem is that a lot of code
would be duplicated; avoiding that needs more refactoring, which would
end up as a pretty giant series:
+static const struct virtqueue_ops split_in_order_ops = {
+ .add = virtqueue_add_split,
+ .get = virtqueue_get_buf_ctx_split_in_order,
+ .kick_prepare = virtqueue_kick_prepare_split,
+ .disable_cb = virtqueue_disable_cb_split,
+ .enable_cb_delayed = virtqueue_enable_cb_delayed_split,
+ .enable_cb_prepare = virtqueue_enable_cb_prepare_split,
+ .poll = virtqueue_poll_split,
+ .detach_unused_buf = virtqueue_detach_unused_buf_split,
+ .more_used = more_used_split_in_order,
+ .resize = virtqueue_resize_split,
+ .reset = virtqueue_reset_split,
+};
So I started with the functions that have a sufficient difference:
virtqueue_get_buf_ctx_split_in_order
more_used_split_in_order
And left the rest for the future, in case we see a performance improvement there.
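To give a rough idea of why more_used differs for in-order (a sketch
only, not necessarily the exact code in patch 19): once a single used
entry describing a whole batch has been read, the driver may still hold
buffers it has not returned to the caller yet, so the "more used" check
has to look at the pending batch as well as at the device's used index.
Here batch_head == ring size means "no batch pending", as in the comment
quoted above, and more_used_split() is the existing split-ring check:
/*
 * Sketch only: with IN_ORDER, a used entry we already read may
 * describe a batch of buffers that has not been fully drained yet.
 */
static bool more_used_split_in_order(const struct vring_virtqueue *vq)
{
	/* Still draining a previously read batch? */
	if (vq->batch_head != vq->split.vring.num)
		return true;
	/* Otherwise fall back to the usual used->idx comparison. */
	return more_used_split(vq);
}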
> > @@ -2921,7 +3006,7 @@ u32 vring_notification_data(struct virtqueue *_vq)
> > struct vring_virtqueue *vq = to_vvq(_vq);
> > u16 next;
> >
> > - if (vq->packed_ring)
> > + if (virtqueue_is_packed(vq))
> > next = (vq->packed.next_avail_idx &
> > ~(-(1 << VRING_PACKED_EVENT_F_WRAP_CTR))) |
> > vq->packed.avail_wrap_counter <<
>
> I think it's the only one? Are we trying to save a function call there?
> I am fine to keep it as in this patch, or if we do change it,
> maybe with a separate patch, so if there is a regression we can bisect
> more easily.
>
Yes.
>
>
>
> For example, for the chunk below, we could have:
> .init_last_used
>
> having said that, this patchset is already big, so let us do it with a separate patch -
> but if so, the way to split would be in patch 1 to just leave vq->packed_ring
> in place, gradually replace call sites, and in patch N drop
> vq->packed_ring.
Ok, let me do that.
>
>
> > static bool virtqueue_use_indirect(const struct vring_virtqueue *vq,
> > unsigned int total_sg)
> > {
> > @@ -422,7 +452,7 @@ static void virtqueue_init(struct vring_virtqueue *vq, u32 num)
> > {
> > vq->vq.num_free = num;
> >
> > - if (vq->packed_ring)
> > + if (virtqueue_is_packed(vq))
> > vq->last_used_idx = 0 | (1 << VRING_PACKED_EVENT_F_WRAP_CTR);
> > else
> > vq->last_used_idx = 0;
> > @@ -1116,6 +1146,8 @@ static int vring_alloc_queue_split(struct vring_virtqueue_split *vring_split,
> > return 0;
> > }
> >
> > +static const struct virtqueue_ops split_ops;
> > +
> > static struct virtqueue *__vring_new_virtqueue_split(unsigned int index,
> > struct vring_virtqueue_split *vring_split,
> > struct virtio_device *vdev,
> > @@ -1133,7 +1165,7 @@ static struct virtqueue *__vring_new_virtqueue_split(unsigned int index,
> > if (!vq)
> > return NULL;
> >
> > - vq->packed_ring = false;
> > + vq->layout = SPLIT;
> > vq->vq.callback = callback;
> > vq->vq.vdev = vdev;
> > vq->vq.name = name;
> > @@ -2076,6 +2108,8 @@ static void virtqueue_reset_packed(struct vring_virtqueue *vq)
> > virtqueue_vring_init_packed(&vq->packed, !!vq->vq.callback);
> > }
> >
> > +static const struct virtqueue_ops packed_ops;
> > +
> > static struct virtqueue *__vring_new_virtqueue_packed(unsigned int index,
> > struct vring_virtqueue_packed *vring_packed,
> > struct virtio_device *vdev,
> > @@ -2106,7 +2140,7 @@ static struct virtqueue *__vring_new_virtqueue_packed(unsigned int index,
> > #else
> > vq->broken = false;
> > #endif
> > - vq->packed_ring = true;
> > + vq->layout = PACKED;
> > vq->dma_dev = dma_dev;
> > vq->use_dma_api = vring_use_dma_api(vdev);
> >
> > @@ -2194,6 +2228,39 @@ static int virtqueue_resize_packed(struct vring_virtqueue *vq, u32 num)
> > return -ENOMEM;
> > }
> >
> > +static const struct virtqueue_ops split_ops = {
> > + .add = virtqueue_add_split,
> > + .get = virtqueue_get_buf_ctx_split,
> > + .kick_prepare = virtqueue_kick_prepare_split,
> > + .disable_cb = virtqueue_disable_cb_split,
> > + .enable_cb_delayed = virtqueue_enable_cb_delayed_split,
> > + .enable_cb_prepare = virtqueue_enable_cb_prepare_split,
> > + .poll = virtqueue_poll_split,
> > + .detach_unused_buf = virtqueue_detach_unused_buf_split,
> > + .more_used = more_used_split,
> > + .resize = virtqueue_resize_split,
> > + .reset = virtqueue_reset_split,
> > +};
> > +
> > +static const struct virtqueue_ops packed_ops = {
> > + .add = virtqueue_add_packed,
> > + .get = virtqueue_get_buf_ctx_packed,
> > + .kick_prepare = virtqueue_kick_prepare_packed,
> > + .disable_cb = virtqueue_disable_cb_packed,
> > + .enable_cb_delayed = virtqueue_enable_cb_delayed_packed,
> > + .enable_cb_prepare = virtqueue_enable_cb_prepare_packed,
> > + .poll = virtqueue_poll_packed,
> > + .detach_unused_buf = virtqueue_detach_unused_buf_packed,
> > + .more_used = more_used_packed,
> > + .resize = virtqueue_resize_packed,
> > + .reset = virtqueue_reset_packed,
> > +};
> > +
> > +static const struct virtqueue_ops *const all_ops[VQ_TYPE_MAX] = {
> > + [SPLIT] = &split_ops,
> > + [PACKED] = &packed_ops
> > +};
> > +
> > static int virtqueue_disable_and_recycle(struct virtqueue *_vq,
> > void (*recycle)(struct virtqueue *vq, void *buf))
> > {
> > @@ -2236,6 +2303,38 @@ static int virtqueue_enable_after_reset(struct virtqueue *_vq)
> > * Generic functions and exported symbols.
> > */
> >
> > +#define VIRTQUEUE_CALL(vq, op, ...) \
> > + ({ \
> > + typeof(all_ops[SPLIT]->op(vq, ##__VA_ARGS__)) ret; \
>
> we need an empty line here, after the declaration.
Ok.
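For reference, with that fixed the dispatch ends up looking roughly like
the sketch below (illustrative only; void-returning ops such as
disable_cb and reset would need a separate variant, since typeof(void)
cannot be declared). Because all_ops[] is a const array of const ops,
each branch resolves to a direct call, so no retpolined indirect branch
is emitted:
#define VIRTQUEUE_CALL(vq, op, ...)				\
({								\
	typeof(all_ops[SPLIT]->op(vq, ##__VA_ARGS__)) ret;	\
								\
	if (virtqueue_is_packed(vq))				\
		ret = all_ops[PACKED]->op(vq, ##__VA_ARGS__);	\
	else							\
		ret = all_ops[SPLIT]->op(vq, ##__VA_ARGS__);	\
	ret;							\
})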
Thanks
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH V3 19/19] virtio_ring: add in order support
2025-07-03 3:13 ` Jason Wang
@ 2025-07-03 13:49 ` Jonah Palmer
2025-07-07 3:28 ` Jason Wang
0 siblings, 1 reply; 35+ messages in thread
From: Jonah Palmer @ 2025-07-03 13:49 UTC (permalink / raw)
To: Jason Wang, Michael S. Tsirkin
Cc: xuanzhuo, eperezma, virtualization, linux-kernel
On 7/2/25 11:13 PM, Jason Wang wrote:
> On Wed, Jul 2, 2025 at 6:57 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>>
>> On Wed, Jul 02, 2025 at 05:29:18PM +0800, Jason Wang wrote:
>>> On Tue, Jul 1, 2025 at 2:57 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>
>>>> On Mon, Jun 16, 2025 at 04:25:17PM +0800, Jason Wang wrote:
>>>>> This patch implements in order support for both split virtqueue and
>>>>> packed virtqueue.
>>>>
>>>> I'd like to see more motivation for this work, documented.
>>>> It's not really performance, not as it stands, see below:
>>>>
>>>>>
>>>>> Benchmark with KVM guest + testpmd on the host shows:
>>>>>
>>>>> For split virtqueue: no obvious differences were noticed
>>>>>
>>>>> For packed virtqueue:
>>>>>
>>>>> 1) RX gets 3.1% PPS improvements from 6.3 Mpps to 6.5 Mpps
>>>>> 2) TX gets 4.6% PPS improvements from 8.6 Mpps to 9.0 Mpps
>>>>>
>>>>
>>>> That's a very modest improvement for a lot of code.
>>>> I also note you put in some batching just for in-order.
>>>> Which could also explain the gains maybe?
>>>> What if you just put in a simple implementation with no
>>>> batching tricks? do you still see a gain?
>>>
>>> It is used to implement the batch used updating.
>>>
>>> """
>>> Some devices always use descriptors in the same order in which they
>>> have been made available. These devices can offer the
>>> VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge allows
>>> devices to notify the use of a batch of buffers to the driver by only
>>> writing out a single used ring entry with the id corresponding to the
>>> head entry of the descriptor chain describing the last buffer in the
>>> batch.
>>> """
>>>
>>> DPDK implements this behavior, so it's a must for the virtio driver.
>>>
>>>> Does any hardware implement this? Maybe that can demonstrate
>>>> bigger gains.
>>>
>>> Maybe but I don't have one in my hand.
>>>
>>> For performance, I think it should be sufficient as a starter. I can
>>> say in the next version something like "more optimizations could be
>>> done on top"
>>
>> What are some optimizations you have in mind?
>
> One thing in my mind, spec currently said:
>
> """
> If negotiated, this knowledge allows devices to notify the use of a
> batch of buffers to the driver by only writing out a single used
> descriptor with the Buffer ID corresponding to the last descriptor in
> the batch.
> """
>
> If the device writes the last descriptor ID instead of the buffer ID
> and skip the number of descriptors in the used ring. For split
> virtqueue, the avail ring is not needed anymore. Device knows the
> availability of buffers via avail_idx. In this way, we completely
> eliminate the access of the available ring. This reduces the memory
> access which is expensive for both:
>
> 1) kernel vhost-net where small user space memory access is expensive
> 2) hardware PCI transactions
>
> Does this make sense?
>
>>
>>
>>
>>> Note that the patch that introduces packed virtqueue, there's not even
>>> any numbers:
>>>
>>> commit 1ce9e6055fa0a9043405c5604cf19169ec5379ff
>>> Author: Tiwei Bie <tiwei.bie@intel.com>
>>> Date: Wed Nov 21 18:03:27 2018 +0800
>>>
>>> virtio_ring: introduce packed ring support
>>>
>>> Introduce the packed ring support. Packed ring can only be
>>> created by vring_create_virtqueue() and each chunk of packed
>>> ring will be allocated individually. Packed ring can not be
>>> created on preallocated memory by vring_new_virtqueue() or
>>> the likes currently.
>>>
>>> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
>>> Signed-off-by: David S. Miller <davem@davemloft.net>
>>
>>
>> I think the assumption there was that intel has hardware that
>> requires packed. That's why Dave merged this.
>
> I think it should according to Qemu patch:
>
> commit c03213fdc9d7b680cc575cd1e725750702a10b09
> Author: Jonah Palmer <jonah.palmer@oracle.com>
> Date: Wed Jul 10 08:55:18 2024 -0400
>
> vhost,vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits
>
> Add support for the VIRTIO_F_IN_ORDER feature across a variety of vhost
> devices.
>
> The inclusion of VIRTIO_F_IN_ORDER in the feature bits arrays for these
> devices ensures that the backend is capable of offering and providing
> support for this feature, and that it can be disabled if the backend
> does not support it.
>
> Acked-by: Eugenio Pérez <eperezma@redhat.com>
> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
> Message-Id: <20240710125522.4168043-6-jonah.palmer@oracle.com>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>
> Adding Jonah for more thought here.
>
Hi. By "it should", are you referring to Intel having hardware requiring
this feature? Sorry, just having a bit of trouble following.
In any case, this looks like a good first implementation that can be
used as a foundation for future work to further improve its
performance.
This was the thought process I had when I implemented this feature in
QEMU: get a solid framework down that supports the feature (which
showed some modest performance improvements for some devices) and then
add future patches (perhaps device-specific and/or general
implementations) that take advantage of the fact that data follows this
FIFO model.
As Jason mentioned, one such future implementation could remove the need
for the use of the avail ring in the split VQ case since the device
wouldn't need to read it to learn which descriptor comes next.
Another example could be with vhost-net / vhost-vdpa. Currently each
queue tracks 3 separate indices and keeps a per-descriptor bookkeeping
table to handle buffers completing out of order. If the backend knows
data is FIFO, we might be able to drop these trackers and just use a
head and tail counter with a single contiguous iovec ring. This could
result in a smaller cache footprint and fewer DMAs.
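Very roughly, the per-queue state could collapse to something like the
sketch below (purely hypothetical; the names and layout are made up to
illustrate the idea, this is not an existing vhost structure):
/*
 * Hypothetical: if buffers are known to complete in FIFO order, a
 * single contiguous ring of translated iovecs indexed by free-running
 * head/tail counters could replace the per-descriptor bookkeeping.
 */
struct inorder_iov_ring {
	struct iovec *iov;	/* ring of translated buffers */
	unsigned int num;	/* ring size, power of two */
	unsigned int head;	/* next slot filled from the avail side */
	unsigned int tail;	/* next slot completed to the used side */
};
static inline bool inorder_ring_full(const struct inorder_iov_ring *r)
{
	return r->head - r->tail == r->num;
}
static inline struct iovec *inorder_ring_push(struct inorder_iov_ring *r)
{
	return &r->iov[r->head++ & (r->num - 1)];
}
static inline struct iovec *inorder_ring_pop(struct inorder_iov_ring *r)
{
	return &r->iov[r->tail++ & (r->num - 1)];
}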
Jonah
>>
>>>>
>>>>
>>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>>> ---
>>>>> drivers/virtio/virtio_ring.c | 423 +++++++++++++++++++++++++++++++++--
>>>>> 1 file changed, 402 insertions(+), 21 deletions(-)
>>>>>
>>>>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
>>>>> index 27a9459a0555..21d456392ba0 100644
>>>>> --- a/drivers/virtio/virtio_ring.c
>>>>> +++ b/drivers/virtio/virtio_ring.c
>>>>> @@ -70,11 +70,14 @@
>>>>> enum vq_layout {
>>>>> SPLIT = 0,
>>>>> PACKED,
>>>>> + SPLIT_IN_ORDER,
>>>>> + PACKED_IN_ORDER,
>>>>> VQ_TYPE_MAX,
>>>>> };
>>>>>
>>>>> struct vring_desc_state_split {
>>>>> void *data; /* Data for callback. */
>>>>> + u32 total_len; /* Buffer Length */
>>>>>
>>>>> /* Indirect desc table and extra table, if any. These two will be
>>>>> * allocated together. So we won't stress more to the memory allocator.
>>>>> @@ -84,6 +87,7 @@ struct vring_desc_state_split {
>>>>>
>>>>> struct vring_desc_state_packed {
>>>>> void *data; /* Data for callback. */
>>>>> + u32 total_len; /* Buffer Length */
>>>>>
>>>>> /* Indirect desc table and extra table, if any. These two will be
>>>>> * allocated together. So we won't stress more to the memory allocator.
>>>>
>>>> We are bloating up the cache footprint for everyone,
>>>> so there's a chance of regressions.
>>>> Pls include benchmark for in order off, to make sure we
>>>> are not regressing.
>>>
>>> Ok.
>>>
>>>> How big was the ring?
>>>
>>> 256.
>>
>> that is very modest, you want to fill at least one cache way,
>> preferably more.
>
> I can test larger queue sizes.
>
>>
>>>> Worth trying with a biggish one, where there is more cache
>>>> pressure.
>>>
>>> Ok.
>>>
>>>>
>>>>
>>>> Why not have a separate state for in-order?
>>>
>>> It can work.
>>>
>>>>
>>>>
>>>>
>>>>> @@ -206,6 +210,12 @@ struct vring_virtqueue {
>>>>>
>>>>> /* Head of free buffer list. */
>>>>> unsigned int free_head;
>>>>> +
>>>>> + /* Head of the batched used buffers, vq->num means no batching */
>>>>> + unsigned int batch_head;
>>>>> +
>>>>> + unsigned int batch_len;
>>>>> +
>>>>
>>>> Are these two only used for in-order? Please document that.
>>>
>>> Yes, I will do that.
>>>
>>>> I also want some documentation about the batching trickery
>>>> used please.
>>>> What is batched, when, how is batching flushed, why are we
>>>> only batching in-order ...
>>>
>>> I'm not sure I get things like this, what you want seems to be the
>>> behaviour of the device which has been stated by the spec or I may
>>> miss something here.
>>
>> "a single used ring entry with the id corresponding to the
>> head entry of the descriptor chain describing the last buffer in the
>> batch"
>> ?
>
> Exactly.
>
>>
>> so together they form this used ring entry describing the last buffer?
>> "head" is the id and "len" the length?
>
> Yes.
>
>>
>> maybe
>>
>> /*
>> * With IN_ORDER, devices write a single used ring entry with
>> * the id corresponding to the head entry of the descriptor chain
>> * describing the last buffer in the batch
>> */
>> struct used_entry {
>> u32 id;
>> u32 len;
>> } batch_last;
>>
>> ?
>
> This should be fine.
>
>>
>>
>>
>>
>>>>
>>>>
>>>>
>>>>
>>>>> /* Number we've added since last sync. */
>>>>> unsigned int num_added;
>>>>>
>>>>> @@ -256,10 +266,14 @@ static void vring_free(struct virtqueue *_vq);
>>>>>
>>>>> #define to_vvq(_vq) container_of_const(_vq, struct vring_virtqueue, vq)
>>>>>
>>>>> -
>>>>> static inline bool virtqueue_is_packed(const struct vring_virtqueue *vq)
>>>>> {
>>>>> - return vq->layout == PACKED;
>>>>> + return vq->layout == PACKED || vq->layout == PACKED_IN_ORDER;
>>>>> +}
>>>>> +
>>>>> +static inline bool virtqueue_is_in_order(const struct vring_virtqueue *vq)
>>>>> +{
>>>>> + return vq->layout == SPLIT_IN_ORDER || vq->layout == PACKED_IN_ORDER;
>>>>> }
>>>>>
>>>>> static bool virtqueue_use_indirect(const struct vring_virtqueue *vq,
>>>>> @@ -570,7 +584,7 @@ static inline int virtqueue_add_split(struct vring_virtqueue *vq,
>>>>> struct vring_desc_extra *extra;
>>>>> struct scatterlist *sg;
>>>>> struct vring_desc *desc;
>>>>> - unsigned int i, n, c, avail, descs_used, err_idx;
>>>>> + unsigned int i, n, c, avail, descs_used, err_idx, total_len = 0;
>>>>
>>>>
>>>> I would add a comment here:
>>>>
>>>> /* Total length for in-order */
>>>> unsigned int total_len = 0;
>>>
>>> Ok.
>>>
>>> Thanks
>>
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH V3 19/19] virtio_ring: add in order support
2025-07-03 13:49 ` Jonah Palmer
@ 2025-07-07 3:28 ` Jason Wang
0 siblings, 0 replies; 35+ messages in thread
From: Jason Wang @ 2025-07-07 3:28 UTC (permalink / raw)
To: Jonah Palmer
Cc: Michael S. Tsirkin, xuanzhuo, eperezma, virtualization,
linux-kernel
On Thu, Jul 3, 2025 at 9:50 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>
>
>
> On 7/2/25 11:13 PM, Jason Wang wrote:
> > On Wed, Jul 2, 2025 at 6:57 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >>
> >> On Wed, Jul 02, 2025 at 05:29:18PM +0800, Jason Wang wrote:
> >>> On Tue, Jul 1, 2025 at 2:57 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >>>>
> >>>> On Mon, Jun 16, 2025 at 04:25:17PM +0800, Jason Wang wrote:
> >>>>> This patch implements in order support for both split virtqueue and
> >>>>> packed virtqueue.
> >>>>
> >>>> I'd like to see more motivation for this work, documented.
> >>>> It's not really performance, not as it stands, see below:
> >>>>
> >>>>>
> >>>>> Benchmark with KVM guest + testpmd on the host shows:
> >>>>>
> >>>>> For split virtqueue: no obvious differences were noticed
> >>>>>
> >>>>> For packed virtqueue:
> >>>>>
> >>>>> 1) RX gets 3.1% PPS improvements from 6.3 Mpps to 6.5 Mpps
> >>>>> 2) TX gets 4.6% PPS improvements from 8.6 Mpps to 9.0 Mpps
> >>>>>
> >>>>
> >>>> That's a very modest improvement for a lot of code.
> >>>> I also note you put in some batching just for in-order.
> >>>> Which could also explain the gains maybe?
> >>>> What if you just put in a simple implementation with no
> >>>> batching tricks? do you still see a gain?
> >>>
> >>> It is used to implement the batch used updating.
> >>>
> >>> """
> >>> Some devices always use descriptors in the same order in which they
> >>> have been made available. These devices can offer the
> >>> VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge allows
> >>> devices to notify the use of a batch of buffers to the driver by only
> >>> writing out a single used ring entry with the id corresponding to the
> >>> head entry of the descriptor chain describing the last buffer in the
> >>> batch.
> >>> """
> >>>
> >>> DPDK implements this behavior, so it's a must for the virtio driver.
> >>>
> >>>> Does any hardware implement this? Maybe that can demonstrate
> >>>> bigger gains.
> >>>
> >>> Maybe but I don't have one in my hand.
> >>>
> >>> For performance, I think it should be sufficient as a starter. I can
> >>> say in the next version something like "more optimizations could be
> >>> done on top"
> >>
> >> What are some optimizations you have in mind?
> >
> > One thing in my mind, spec currently said:
> >
> > """
> > If negotiated, this knowledge allows devices to notify the use of a
> > batch of buffers to the driver by only writing out a single used
> > descriptor with the Buffer ID corresponding to the last descriptor in
> > the batch.
> > """
> >
> > If the device writes the last descriptor ID instead of the buffer ID
> > and skip the number of descriptors in the used ring. For split
> > virtqueue, the avail ring is not needed anymore. Device knows the
> > availability of buffers via avail_idx. In this way, we completely
> > eliminate the access of the available ring. This reduces the memory
> > access which is expensive for both:
> >
> > 1) kernel vhost-net where small user space memory access is expensive
> > 2) hardware PCI transactions
> >
> > Does this make sense?
> >
> >>
> >>
> >>
> >>> Note that the patch that introduces packed virtqueue, there's not even
> >>> any numbers:
> >>>
> >>> commit 1ce9e6055fa0a9043405c5604cf19169ec5379ff
> >>> Author: Tiwei Bie <tiwei.bie@intel.com>
> >>> Date: Wed Nov 21 18:03:27 2018 +0800
> >>>
> >>> virtio_ring: introduce packed ring support
> >>>
> >>> Introduce the packed ring support. Packed ring can only be
> >>> created by vring_create_virtqueue() and each chunk of packed
> >>> ring will be allocated individually. Packed ring can not be
> >>> created on preallocated memory by vring_new_virtqueue() or
> >>> the likes currently.
> >>>
> >>> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> >>> Signed-off-by: David S. Miller <davem@davemloft.net>
> >>
> >>
> >> I think the assumption there was that intel has hardware that
> >> requires packed. That's why Dave merged this.
> >
> > I think it should according to Qemu patch:
> >
> > commit c03213fdc9d7b680cc575cd1e725750702a10b09
> > Author: Jonah Palmer <jonah.palmer@oracle.com>
> > Date: Wed Jul 10 08:55:18 2024 -0400
> >
> > vhost,vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits
> >
> > Add support for the VIRTIO_F_IN_ORDER feature across a variety of vhost
> > devices.
> >
> > The inclusion of VIRTIO_F_IN_ORDER in the feature bits arrays for these
> > devices ensures that the backend is capable of offering and providing
> > support for this feature, and that it can be disabled if the backend
> > does not support it.
> >
> > Acked-by: Eugenio Pérez <eperezma@redhat.com>
> > Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
> > Message-Id: <20240710125522.4168043-6-jonah.palmer@oracle.com>
> > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >
> > Adding Jonah for more thought here.
> >
>
> Hi. By "it should", are you referring to Intel having hardware requiring
> this feature? Sorry, just having a bit of trouble following.
I meant that I guessed you may have hardware that supports IN_ORDER.
>
> In any case, this looks like a good first implementation that can be
> used as a foundation for future implementations to further improve its
> performance.
Right.
>
> This was the thought process I had when I implemented this feature in
> Qemu. That is, get a solid framework down that supports this feature
> (which had some small, modest performance improvements for some devices)
> and then add on future patches (perhaps device-specific and/or general
> implementations) that take advantage of the fact that data follows this
> FIFO model.
>
> As Jason mentioned, one such future implementation could remove the need
> for the use of the avail ring in the split VQ case since the device
> wouldn't need to read it to learn which descriptor comes next.
Right.
>
> Another example could be with vhost-net / vhost-vdpa. Currently each
> queue tracks 3 separate indices and keeps a per-descriptor bookkeeping
> table to handle buffers completing out of order. If the backend knows
> data is FIFO, we might be able to drop these trackers and just use a
> head and tail counter with a single contiguous iovec ring. This could
> result in a smaller cache footprint and fewer DMAs.
Exactly.
So I did a quick hack on vhost-net to implement this optimization
(bypassing the reading of the available ring). It gives us a 15.8%
improvement with this series. DPDK gives me a 10% PPS improvement as well.
https://github.com/jasowang/net/commit/719ce1d24e3369cde9c5fcc8c6d7518ca9022e5f
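The core of the idea in rough pseudo-C (hypothetical struct and helper
names, not the actual code in the branch above): since an IN_ORDER
driver uses descriptor table slots sequentially, the device side only
needs avail->idx to learn how many buffers were made available and can
predict each head by itself instead of reading avail->ring[]:
/* Hypothetical sketch of device-side in-order consumption. */
static void consume_in_order(struct inorder_vq *vq)
{
	u16 avail_idx = read_avail_idx(vq);	/* one small read of avail->idx */

	while (vq->last_avail_idx != avail_idx) {
		u16 head = vq->next_desc;	/* predicted, no avail->ring[] read */
		u16 chain = chain_length(vq, head); /* slots are used in ring order */

		handle_buffer(vq, head);
		vq->next_desc = (head + chain) % vq->num;
		vq->last_avail_idx++;
	}
}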
I think this should be sufficient to move forward even without real
hardware support. Maybe the vhost-net part can go first.
Thanks
>
> Jonah
>
> >>
> >>>>
> >>>>
> >>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >>>>> ---
> >>>>> drivers/virtio/virtio_ring.c | 423 +++++++++++++++++++++++++++++++++--
> >>>>> 1 file changed, 402 insertions(+), 21 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> >>>>> index 27a9459a0555..21d456392ba0 100644
> >>>>> --- a/drivers/virtio/virtio_ring.c
> >>>>> +++ b/drivers/virtio/virtio_ring.c
> >>>>> @@ -70,11 +70,14 @@
> >>>>> enum vq_layout {
> >>>>> SPLIT = 0,
> >>>>> PACKED,
> >>>>> + SPLIT_IN_ORDER,
> >>>>> + PACKED_IN_ORDER,
> >>>>> VQ_TYPE_MAX,
> >>>>> };
> >>>>>
> >>>>> struct vring_desc_state_split {
> >>>>> void *data; /* Data for callback. */
> >>>>> + u32 total_len; /* Buffer Length */
> >>>>>
> >>>>> /* Indirect desc table and extra table, if any. These two will be
> >>>>> * allocated together. So we won't stress more to the memory allocator.
> >>>>> @@ -84,6 +87,7 @@ struct vring_desc_state_split {
> >>>>>
> >>>>> struct vring_desc_state_packed {
> >>>>> void *data; /* Data for callback. */
> >>>>> + u32 total_len; /* Buffer Length */
> >>>>>
> >>>>> /* Indirect desc table and extra table, if any. These two will be
> >>>>> * allocated together. So we won't stress more to the memory allocator.
> >>>>
> >>>> We are bloating up the cache footprint for everyone,
> >>>> so there's a chance of regressions.
> >>>> Pls include benchmark for in order off, to make sure we
> >>>> are not regressing.
> >>>
> >>> Ok.
> >>>
> >>>> How big was the ring?
> >>>
> >>> 256.
> >>
> >> that is very modest, you want to fill at least one cache way,
> >> preferably more.
> >
> > I can test larger queue sizes.
> >
> >>
> >>>> Worth trying with a biggish one, where there is more cache
> >>>> pressure.
> >>>
> >>> Ok.
> >>>
> >>>>
> >>>>
> >>>> Why not have a separate state for in-order?
> >>>
> >>> It can work.
> >>>
> >>>>
> >>>>
> >>>>
> >>>>> @@ -206,6 +210,12 @@ struct vring_virtqueue {
> >>>>>
> >>>>> /* Head of free buffer list. */
> >>>>> unsigned int free_head;
> >>>>> +
> >>>>> + /* Head of the batched used buffers, vq->num means no batching */
> >>>>> + unsigned int batch_head;
> >>>>> +
> >>>>> + unsigned int batch_len;
> >>>>> +
> >>>>
> >>>> Are these two only used for in-order? Please document that.
> >>>
> >>> Yes, I will do that.
> >>>
> >>>> I also want some documentation about the batching trickery
> >>>> used please.
> >>>> What is batched, when, how is batching flushed, why are we
> >>>> only batching in-order ...
> >>>
> >>> I'm not sure I get things like this, what you want seems to be the
> >>> behaviour of the device which has been stated by the spec or I may
> >>> miss something here.
> >>
> >> "a single used ring entry with the id corresponding to the
> >> head entry of the descriptor chain describing the last buffer in the
> >> batch"
> >> ?
> >
> > Exactly.
> >
> >>
> >> so together they form this used ring entry describing the last buffer?
> >> "head" is the id and "len" the length?
> >
> > Yes.
> >
> >>
> >> maybe
> >>
> >> /*
> >> * With IN_ORDER, devices write a single used ring entry with
> >> * the id corresponding to the head entry of the descriptor chain
> >> * describing the last buffer in the batch
> >> */
> >> struct used_entry {
> >> u32 id;
> >> u32 len;
> >> } batch_last;
> >>
> >> ?
> >
> > This should be fine.
> >
> >>
> >>
> >>
> >>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>> /* Number we've added since last sync. */
> >>>>> unsigned int num_added;
> >>>>>
> >>>>> @@ -256,10 +266,14 @@ static void vring_free(struct virtqueue *_vq);
> >>>>>
> >>>>> #define to_vvq(_vq) container_of_const(_vq, struct vring_virtqueue, vq)
> >>>>>
> >>>>> -
> >>>>> static inline bool virtqueue_is_packed(const struct vring_virtqueue *vq)
> >>>>> {
> >>>>> - return vq->layout == PACKED;
> >>>>> + return vq->layout == PACKED || vq->layout == PACKED_IN_ORDER;
> >>>>> +}
> >>>>> +
> >>>>> +static inline bool virtqueue_is_in_order(const struct vring_virtqueue *vq)
> >>>>> +{
> >>>>> + return vq->layout == SPLIT_IN_ORDER || vq->layout == PACKED_IN_ORDER;
> >>>>> }
> >>>>>
> >>>>> static bool virtqueue_use_indirect(const struct vring_virtqueue *vq,
> >>>>> @@ -570,7 +584,7 @@ static inline int virtqueue_add_split(struct vring_virtqueue *vq,
> >>>>> struct vring_desc_extra *extra;
> >>>>> struct scatterlist *sg;
> >>>>> struct vring_desc *desc;
> >>>>> - unsigned int i, n, c, avail, descs_used, err_idx;
> >>>>> + unsigned int i, n, c, avail, descs_used, err_idx, total_len = 0;
> >>>>
> >>>>
> >>>> I would add a comment here:
> >>>>
> >>>> /* Total length for in-order */
> >>>> unsigned int total_len = 0;
> >>>
> >>> Ok.
> >>>
> >>> Thanks
> >>
> >
>
^ permalink raw reply [flat|nested] 35+ messages in thread
end of thread
Thread overview: 35+ messages
2025-06-16 8:24 [PATCH V3 00/19] virtio_ring in order support Jason Wang
2025-06-16 8:24 ` [PATCH V3 01/19] virtio_ring: rename virtqueue_reinit_xxx to virtqueue_reset_xxx() Jason Wang
2025-06-24 10:35 ` Lei Yang
2025-06-16 8:25 ` [PATCH V3 02/19] virtio_ring: switch to use vring_virtqueue in virtqueue_poll variants Jason Wang
2025-06-16 8:25 ` [PATCH V3 03/19] virtio_ring: unify logic of virtqueue_poll() and more_used() Jason Wang
2025-06-16 8:25 ` [PATCH V3 04/19] virtio_ring: switch to use vring_virtqueue for virtqueue resize variants Jason Wang
2025-06-16 8:25 ` [PATCH V3 05/19] virtio_ring: switch to use vring_virtqueue for virtqueue_kick_prepare variants Jason Wang
2025-06-16 8:25 ` [PATCH V3 06/19] virtio_ring: switch to use vring_virtqueue for virtqueue_add variants Jason Wang
2025-06-16 8:25 ` [PATCH V3 07/19] virtio: " Jason Wang
2025-06-16 8:25 ` [PATCH V3 08/19] virtio_ring: switch to use vring_virtqueue for enable_cb_prepare variants Jason Wang
2025-06-16 8:25 ` [PATCH V3 09/19] virtio_ring: use vring_virtqueue for enable_cb_delayed variants Jason Wang
2025-06-16 8:25 ` [PATCH V3 10/19] virtio_ring: switch to use vring_virtqueue for disable_cb variants Jason Wang
2025-06-16 8:25 ` [PATCH V3 11/19] virtio_ring: switch to use vring_virtqueue for detach_unused_buf variants Jason Wang
2025-06-16 8:25 ` [PATCH V3 12/19] virtio_ring: use u16 for last_used_idx in virtqueue_poll_split() Jason Wang
2025-06-16 8:25 ` [PATCH V3 13/19] virtio_ring: introduce virtqueue ops Jason Wang
2025-07-01 6:28 ` Michael S. Tsirkin
2025-07-03 3:20 ` Jason Wang
2025-06-16 8:25 ` [PATCH V3 14/19] virtio_ring: determine descriptor flags at one time Jason Wang
2025-07-01 6:42 ` Michael S. Tsirkin
2025-07-01 7:25 ` Jason Wang
2025-06-16 8:25 ` [PATCH V3 15/19] virtio_ring: factor out core logic of buffer detaching Jason Wang
2025-06-16 8:25 ` [PATCH V3 16/19] virtio_ring: factor out core logic for updating last_used_idx Jason Wang
2025-06-16 8:25 ` [PATCH V3 17/19] virtio_ring: factor out split indirect detaching logic Jason Wang
2025-07-01 6:44 ` Michael S. Tsirkin
2025-06-16 8:25 ` [PATCH V3 18/19] virtio_ring: factor out split " Jason Wang
2025-07-01 6:45 ` Michael S. Tsirkin
2025-06-16 8:25 ` [PATCH V3 19/19] virtio_ring: add in order support Jason Wang
2025-07-01 6:56 ` Michael S. Tsirkin
2025-07-02 9:29 ` Jason Wang
2025-07-02 10:56 ` Michael S. Tsirkin
2025-07-03 3:13 ` Jason Wang
2025-07-03 13:49 ` Jonah Palmer
2025-07-07 3:28 ` Jason Wang
2025-06-17 12:29 ` [PATCH V3 00/19] virtio_ring " Eugenio Perez Martin
2025-07-01 6:58 ` Michael S. Tsirkin