* [Qemu-devel] [RFC v2 0/8] packed ring virtio-net userspace backend support
From: wexu @ 2018-06-05 19:07 UTC
  To: qemu-devel; +Cc: jasowang, tiwei.bie, mst, wexu, jfreimann

From: Wei Xu <wexu@redhat.com>

Todo:
- address slow Rx performance
- test event-index interrupt suppression

v1->v2
- synced to Tiwei's v5
- reuse the memory cache function with 1.0
- dropped the detach patch and notification helper (04 & 05 in v1)
- guest virtio-net driver unload/reload support
- event suppression support (not tested)
- addressed feedback from v1

About guest virtio-net load/unload:
Since last_avail, avail_wrap_count, used_idx and used_wrap_count are
all 16-bit or bool variables, I merged them into
'vhost_vring_state.num' instead of introducing a new case in the
ioctl handling. The test has been done with a tweak on the kernel
side like:

--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1439,10 +1439,16 @@ long vhost_vring_ioctl(struct vhost_dev *d, int ioctl, void __user *argp)
            r = -EFAULT;
            break;
        }
-               if (s.num > 0xffff) {
-                       r = -EINVAL;
-                       break;
-               }
+               if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED)) {
+                       vq->avail_wrap_counter = (bool)(uint16_t)(s.num >> 16);
+                       vq->last_used_idx = (uint16_t)(s.num >> 32);
+                       vq->used_wrap_counter = (bool)(uint16_t)(s.num >> 48);
+                } else {
+                    if (s.num > 0xffff) {
+                            r = -EINVAL;
+                            break;
+                    }
+                }
        vq->last_avail_idx = s.num;
        /* Forget the cached index value. */
        vq->avail_idx = vq->last_avail_idx;
@@ -1450,8 +1456,15 @@ long vhost_vring_ioctl(struct vhost_dev *d, int ioctl, void __user *argp)
    case VHOST_GET_VRING_BASE:
        s.index = idx;
        s.num = vq->last_avail_idx;
+               if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED)) {
+                       s.num |= vq->avail_wrap_counter << 16;
+                       s.num |= vq->last_used_idx << 32;
+                       s.num |= vq->used_wrap_counter << 48;
+                }
        if (copy_to_user(argp, &s, sizeof s))
            r = -EFAULT;
+               if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+                       s.num |= vq->avail_wrap_counter << 31;
        break;
    case VHOST_SET_VRING_ADDR:
        if (copy_from_user(&a, argp, sizeof a)) {
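
For clarity, a minimal sketch of the 64-bit encoding used above (the
helper names here are made up for illustration; note that Jason's
reply below points out the kernel's s.num is only 32 bits wide, so
this layout only fits a genuinely 64-bit field):

#include <stdbool.h>
#include <stdint.h>

/* bits  0..15  last_avail_idx
 * bit   16     avail_wrap_counter
 * bits 32..47  last_used_idx
 * bit   48     used_wrap_counter */
static uint64_t pack_vring_state(uint16_t last_avail_idx, bool avail_wrap,
                                 uint16_t last_used_idx, bool used_wrap)
{
    return (uint64_t)last_avail_idx |
           ((uint64_t)avail_wrap << 16) |
           ((uint64_t)last_used_idx << 32) |
           ((uint64_t)used_wrap << 48);
}

static void unpack_vring_state(uint64_t num, uint16_t *last_avail_idx,
                               bool *avail_wrap, uint16_t *last_used_idx,
                               bool *used_wrap)
{
    *last_avail_idx = (uint16_t)num;
    *avail_wrap     = (num >> 16) & 1;
    *last_used_idx  = (uint16_t)(num >> 32);
    *used_wrap      = (num >> 48) & 1;
}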

Wei Xu (8):
  virtio: feature bit, data structure, init for packed ring
  virtio: memory cache for packed ring
  virtio: empty check and desc read for packed ring
  virtio: get avail bytes check for packed ring
  virtio: queue pop support for packed ring
  virtio: flush/push support for packed ring
  virtio: event suppression support for packed ring
  virtio: support guest driver reload for vhost-net

 hw/net/vhost_net.c                             |   2 +
 hw/virtio/virtio.c                             | 677 +++++++++++++++++++++++--
 include/hw/virtio/virtio.h                     |   4 +-
 include/standard-headers/linux/virtio_config.h |  15 +
 4 files changed, 649 insertions(+), 49 deletions(-)

-- 
1.8.3.1


* [Qemu-devel] [RFC v2 1/8] virtio: feature bit, data structure, init for 1.1
From: wexu @ 2018-06-05 19:07 UTC
  To: qemu-devel; +Cc: jasowang, tiwei.bie, mst, wexu, jfreimann

From: Wei Xu <wexu@redhat.com>

New feature bit and members for packed ring.

Signed-off-by: Wei Xu <wexu@redhat.com>
---
 hw/net/vhost_net.c                             |  2 ++
 hw/virtio/virtio.c                             | 27 ++++++++++++++++++++++++--
 include/hw/virtio/virtio.h                     |  4 +++-
 include/standard-headers/linux/virtio_config.h |  2 ++
 4 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index e037db6..f593086 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -53,6 +53,7 @@ static const int kernel_feature_bits[] = {
     VIRTIO_F_VERSION_1,
     VIRTIO_NET_F_MTU,
     VIRTIO_F_IOMMU_PLATFORM,
+    VIRTIO_F_RING_PACKED,
     VHOST_INVALID_FEATURE_BIT
 };
 
@@ -78,6 +79,7 @@ static const int user_feature_bits[] = {
     VIRTIO_NET_F_MRG_RXBUF,
     VIRTIO_NET_F_MTU,
     VIRTIO_F_IOMMU_PLATFORM,
+    VIRTIO_F_RING_PACKED,
 
     /* This bit implies RARP isn't sent by QEMU out of band */
     VIRTIO_NET_F_GUEST_ANNOUNCE,
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 006d3d1..e192a9a 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -39,6 +39,13 @@ typedef struct VRingDesc
     uint16_t next;
 } VRingDesc;
 
+typedef struct VRingDescPacked {
+    uint64_t addr;
+    uint32_t len;
+    uint16_t id;
+    uint16_t flags;
+} VRingDescPacked;
+
 typedef struct VRingAvail
 {
     uint16_t flags;
@@ -62,8 +69,14 @@ typedef struct VRingUsed
 typedef struct VRingMemoryRegionCaches {
     struct rcu_head rcu;
     MemoryRegionCache desc;
-    MemoryRegionCache avail;
-    MemoryRegionCache used;
+    union {
+        MemoryRegionCache avail;
+        MemoryRegionCache driver;
+    };
+    union {
+        MemoryRegionCache used;
+        MemoryRegionCache device;
+    };
 } VRingMemoryRegionCaches;
 
 typedef struct VRing
@@ -77,6 +90,11 @@ typedef struct VRing
     VRingMemoryRegionCaches *caches;
 } VRing;
 
+typedef struct VRingPackedDescEvent {
+    uint16_t off_wrap;
+    uint16_t flags;
+} VRingPackedDescEvent ;
+
 struct VirtQueue
 {
     VRing vring;
@@ -89,6 +107,9 @@ struct VirtQueue
 
     uint16_t used_idx;
 
+    bool avail_wrap_counter;
+    bool used_wrap_counter;
+
     /* Last used index value we have signalled on */
     uint16_t signalled_used;
 
@@ -1213,6 +1234,8 @@ void virtio_reset(void *opaque)
         vdev->vq[i].last_avail_idx = 0;
         vdev->vq[i].shadow_avail_idx = 0;
         vdev->vq[i].used_idx = 0;
+        vdev->vq[i].avail_wrap_counter = true;
+        vdev->vq[i].used_wrap_counter = true;
         virtio_queue_set_vector(vdev, i, VIRTIO_NO_VECTOR);
         vdev->vq[i].signalled_used = 0;
         vdev->vq[i].signalled_used_valid = false;
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 098bdaa..4a7fb21 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -262,7 +262,9 @@ typedef struct VirtIORNGConf VirtIORNGConf;
     DEFINE_PROP_BIT64("any_layout", _state, _field, \
                       VIRTIO_F_ANY_LAYOUT, true), \
     DEFINE_PROP_BIT64("iommu_platform", _state, _field, \
-                      VIRTIO_F_IOMMU_PLATFORM, false)
+                      VIRTIO_F_IOMMU_PLATFORM, false), \
+    DEFINE_PROP_BIT64("ring_packed", _state, _field, \
+                      VIRTIO_F_RING_PACKED, false)
 
 hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);
 hwaddr virtio_queue_get_avail_addr(VirtIODevice *vdev, int n);
diff --git a/include/standard-headers/linux/virtio_config.h b/include/standard-headers/linux/virtio_config.h
index b777069..6ee5529 100644
--- a/include/standard-headers/linux/virtio_config.h
+++ b/include/standard-headers/linux/virtio_config.h
@@ -71,4 +71,6 @@
  * this is for compatibility with legacy systems.
  */
 #define VIRTIO_F_IOMMU_PLATFORM		33
+
+#define VIRTIO_F_RING_PACKED		34
 #endif /* _LINUX_VIRTIO_CONFIG_H */
-- 
1.8.3.1
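
For orientation, a sketch of the overall ring layout these new
structures describe, per the packed ring proposal (the wrapper struct
below is illustrative, not part of the patch):

/* The packed ring replaces the split ring's desc/avail/used areas with
 * a single descriptor ring plus two event suppression areas; the unions
 * above let the existing avail/used caches alias the new driver/device
 * areas. */
struct vring_packed_layout {
    VRingDescPacked      *desc;    /* vring.num descriptors, read/write */
    VRingPackedDescEvent *driver;  /* written by driver, read by device */
    VRingPackedDescEvent *device;  /* written by device, read by driver */
};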


* [Qemu-devel] [RFC v2 2/8] virtio: memory cache for packed ring
From: wexu @ 2018-06-05 19:07 UTC
  To: qemu-devel; +Cc: jasowang, tiwei.bie, mst, wexu, jfreimann

From: Wei Xu <wexu@redhat.com>

Mostly reuse the memory cache from 1.0, except for the offset calculation.

Signed-off-by: Wei Xu <wexu@redhat.com>
---
 hw/virtio/virtio.c | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index e192a9a..f6c0689 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -150,11 +150,8 @@ static void virtio_init_region_cache(VirtIODevice *vdev, int n)
     VRingMemoryRegionCaches *old = vq->vring.caches;
     VRingMemoryRegionCaches *new;
     hwaddr addr, size;
-    int event_size;
     int64_t len;
 
-    event_size = virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
-
     addr = vq->vring.desc;
     if (!addr) {
         return;
@@ -168,7 +165,7 @@ static void virtio_init_region_cache(VirtIODevice *vdev, int n)
         goto err_desc;
     }
 
-    size = virtio_queue_get_used_size(vdev, n) + event_size;
+    size = virtio_queue_get_used_size(vdev, n);
     len = address_space_cache_init(&new->used, vdev->dma_as,
                                    vq->vring.used, size, true);
     if (len < size) {
@@ -176,7 +173,7 @@ static void virtio_init_region_cache(VirtIODevice *vdev, int n)
         goto err_used;
     }
 
-    size = virtio_queue_get_avail_size(vdev, n) + event_size;
+    size = virtio_queue_get_avail_size(vdev, n);
     len = address_space_cache_init(&new->avail, vdev->dma_as,
                                    vq->vring.avail, size, false);
     if (len < size) {
@@ -2320,14 +2317,28 @@ hwaddr virtio_queue_get_desc_size(VirtIODevice *vdev, int n)
 
 hwaddr virtio_queue_get_avail_size(VirtIODevice *vdev, int n)
 {
-    return offsetof(VRingAvail, ring) +
-        sizeof(uint16_t) * vdev->vq[n].vring.num;
+    int s;
+
+    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
+        return sizeof(struct VRingPackedDescEvent);
+    } else {
+        s = virtio_vdev_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
+        return offsetof(VRingAvail, ring) +
+            sizeof(uint16_t) * vdev->vq[n].vring.num + s;
+    }
 }
 
 hwaddr virtio_queue_get_used_size(VirtIODevice *vdev, int n)
 {
-    return offsetof(VRingUsed, ring) +
-        sizeof(VRingUsedElem) * vdev->vq[n].vring.num;
+    int s;
+
+    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
+        return sizeof(struct VRingPackedDescEvent);
+    } else {
+        s = virtio_vdev_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
+        return offsetof(VRingUsed, ring) +
+            sizeof(VRingUsedElem) * vdev->vq[n].vring.num + s;
+    }
 }
 
 uint16_t virtio_queue_get_last_avail_idx(VirtIODevice *vdev, int n)
-- 
1.8.3.1
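
As a worked example of the two size helpers (a sketch, assuming a
256-entry queue with VIRTIO_RING_F_EVENT_IDX negotiated):

/* Split ring, num = 256, EVENT_IDX on:
 *   avail: offsetof(VRingAvail, ring) + 2 * 256 + 2 = 4 + 512 + 2  =  518 bytes
 *   used:  offsetof(VRingUsed, ring)  + 8 * 256 + 2 = 4 + 2048 + 2 = 2054 bytes
 * Packed ring, any num:
 *   both areas shrink to sizeof(struct VRingPackedDescEvent) = 4 bytes,
 *   since only the event suppression structures live there. */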


* [Qemu-devel] [RFC v2 3/8] virtio: empty check and desc read for packed ring
From: wexu @ 2018-06-05 19:07 UTC
  To: qemu-devel; +Cc: jasowang, tiwei.bie, mst, wexu, jfreimann

From: Wei Xu <wexu@redhat.com>

Helpers for the ring empty check and descriptor read.

Signed-off-by: Wei Xu <wexu@redhat.com>
---
 hw/virtio/virtio.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 59 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index f6c0689..bd669a2 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -24,6 +24,9 @@
 #include "hw/virtio/virtio-access.h"
 #include "sysemu/dma.h"
 
+#define AVAIL_DESC_PACKED(b) ((b) << 7)
+#define USED_DESC_PACKED(b)  ((b) << 15)
+
 /*
  * The alignment to use between consumer and producer parts of vring.
  * x86 pagesize again. This is the default, used by transports like PCI
@@ -357,10 +360,27 @@ int virtio_queue_ready(VirtQueue *vq)
     return vq->vring.avail != 0;
 }
 
+static void vring_packed_desc_read(VirtIODevice *vdev, VRingDescPacked *desc,
+                            MemoryRegionCache *cache, int i)
+{
+    address_space_read_cached(cache, i * sizeof(VRingDescPacked),
+                              desc, sizeof(VRingDescPacked));
+    virtio_tswap64s(vdev, &desc->addr);
+    virtio_tswap32s(vdev, &desc->len);
+    virtio_tswap16s(vdev, &desc->id);
+    virtio_tswap16s(vdev, &desc->flags);
+}
+
+static inline bool is_desc_avail(struct VRingDescPacked *desc)
+{
+    return !!(desc->flags & AVAIL_DESC_PACKED(1)) !=
+            !!(desc->flags & USED_DESC_PACKED(1));
+}
+
 /* Fetch avail_idx from VQ memory only when we really need to know if
  * guest has added some buffers.
  * Called within rcu_read_lock().  */
-static int virtio_queue_empty_rcu(VirtQueue *vq)
+static int virtio_queue_split_empty_rcu(VirtQueue *vq)
 {
     if (unlikely(!vq->vring.avail)) {
         return 1;
@@ -373,7 +393,7 @@ static int virtio_queue_empty_rcu(VirtQueue *vq)
     return vring_avail_idx(vq) == vq->last_avail_idx;
 }
 
-int virtio_queue_empty(VirtQueue *vq)
+static int virtio_queue_split_empty(VirtQueue *vq)
 {
     bool empty;
 
@@ -391,6 +411,42 @@ int virtio_queue_empty(VirtQueue *vq)
     return empty;
 }
 
+static int virtio_queue_packed_empty_rcu(VirtQueue *vq)
+{
+    struct VRingDescPacked desc;
+    VRingMemoryRegionCaches *cache;
+
+    if (unlikely(!vq->vring.desc)) {
+        return 1;
+    }
+
+    cache = vring_get_region_caches(vq);
+    vring_packed_desc_read(vq->vdev, &desc, &cache->desc, vq->last_avail_idx);
+
+    /* Make sure we see the updated flag */
+    smp_mb();
+    return !is_desc_avail(&desc);
+}
+
+static int virtio_queue_packed_empty(VirtQueue *vq)
+{
+    bool empty;
+
+    rcu_read_lock();
+    empty = virtio_queue_packed_empty_rcu(vq);
+    rcu_read_unlock();
+    return empty;
+}
+
+int virtio_queue_empty(VirtQueue *vq)
+{
+    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
+        return virtio_queue_packed_empty(vq);
+    } else {
+        return virtio_queue_split_empty(vq);
+    }
+}
+
 static void virtqueue_unmap_sg(VirtQueue *vq, const VirtQueueElement *elem,
                                unsigned int len)
 {
@@ -862,7 +918,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
         return NULL;
     }
     rcu_read_lock();
-    if (virtio_queue_empty_rcu(vq)) {
+    if (virtio_queue_split_empty_rcu(vq)) {
         goto done;
     }
     /* Needed after virtio_queue_empty(), see comment in
-- 
1.8.3.1
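
A stricter availability check than is_desc_avail() above, sketched per
the spec semantics quoted in Jason's reply to this patch below (the
function name is hypothetical): the AVAIL bit must both differ from
USED and match the ring's current avail wrap counter.

static bool is_desc_avail_strict(const VRingDescPacked *desc,
                                 bool avail_wrap_counter)
{
    bool avail = desc->flags & AVAIL_DESC_PACKED(1);
    bool used  = desc->flags & USED_DESC_PACKED(1);

    /* AVAIL != USED is necessary but not sufficient; the driver makes a
     * descriptor available by setting AVAIL to its current wrap counter. */
    return avail != used && avail == avail_wrap_counter;
}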


* [Qemu-devel] [RFC v2 4/8] virtio: get avail bytes check for packed ring
From: wexu @ 2018-06-05 19:07 UTC
  To: qemu-devel; +Cc: jasowang, tiwei.bie, mst, wexu, jfreimann

From: Wei Xu <wexu@redhat.com>

Mostly the same as 1.0, except that all descriptors are traversed to
compute the head count; this needs a refactor.

Signed-off-by: Wei Xu <wexu@redhat.com>
---
 hw/virtio/virtio.c | 148 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 145 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index bd669a2..cdbb5af 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -650,9 +650,9 @@ static int virtqueue_read_next_desc(VirtIODevice *vdev, VRingDesc *desc,
     return VIRTQUEUE_READ_DESC_MORE;
 }
 
-void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
-                               unsigned int *out_bytes,
-                               unsigned max_in_bytes, unsigned max_out_bytes)
+static void virtqueue_split_get_avail_bytes(VirtQueue *vq,
+                            unsigned int *in_bytes, unsigned int *out_bytes,
+                            unsigned max_in_bytes, unsigned max_out_bytes)
 {
     VirtIODevice *vdev = vq->vdev;
     unsigned int max, idx;
@@ -775,6 +775,148 @@ err:
     goto done;
 }
 
+static void virtqueue_packed_get_avail_bytes(VirtQueue *vq,
+                            unsigned int *in_bytes, unsigned int *out_bytes,
+                            unsigned max_in_bytes, unsigned max_out_bytes)
+{
+    VirtIODevice *vdev = vq->vdev;
+    unsigned int max, idx;
+    unsigned int total_bufs, in_total, out_total;
+    MemoryRegionCache *desc_cache;
+    VRingMemoryRegionCaches *caches;
+    MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
+    int64_t len = 0;
+    VRingDescPacked desc;
+
+    if (unlikely(!vq->vring.desc)) {
+        if (in_bytes) {
+            *in_bytes = 0;
+        }
+        if (out_bytes) {
+            *out_bytes = 0;
+        }
+        return;
+    }
+
+    rcu_read_lock();
+    idx = vq->last_avail_idx;
+    total_bufs = in_total = out_total = 0;
+
+    max = vq->vring.num;
+    caches = vring_get_region_caches(vq);
+    if (caches->desc.len < max * sizeof(VRingDescPacked)) {
+        virtio_error(vdev, "Cannot map descriptor ring");
+        goto err;
+    }
+
+    desc_cache = &caches->desc;
+    vring_packed_desc_read(vdev, &desc, desc_cache, idx);
+    while (is_desc_avail(&desc)) {
+        unsigned int num_bufs;
+        unsigned int i;
+
+        num_bufs = total_bufs;
+
+        if (desc.flags & VRING_DESC_F_INDIRECT) {
+            if (desc.len % sizeof(VRingDescPacked)) {
+                virtio_error(vdev, "Invalid size for indirect buffer table");
+                goto err;
+            }
+
+            /* If we've got too many, that implies a descriptor loop. */
+            if (num_bufs >= max) {
+                virtio_error(vdev, "Looped descriptor");
+                goto err;
+            }
+
+            /* loop over the indirect descriptor table */
+            len = address_space_cache_init(&indirect_desc_cache,
+                                           vdev->dma_as,
+                                           desc.addr, desc.len, false);
+            desc_cache = &indirect_desc_cache;
+            if (len < desc.len) {
+                virtio_error(vdev, "Cannot map indirect buffer");
+                goto err;
+            }
+
+            max = desc.len / sizeof(VRingDescPacked);
+            num_bufs = i = 0;
+            vring_packed_desc_read(vdev, &desc, desc_cache, i);
+        }
+
+        do {
+            /* If we've got too many, that implies a descriptor loop. */
+            if (++num_bufs > max) {
+                virtio_error(vdev, "Looped descriptor");
+                goto err;
+            }
+
+            if (desc.flags & VRING_DESC_F_WRITE) {
+                in_total += desc.len;
+            } else {
+                out_total += desc.len;
+            }
+            if (in_total >= max_in_bytes && out_total >= max_out_bytes) {
+                goto done;
+            }
+
+            if (desc_cache == &indirect_desc_cache) {
+                if (++i >= vq->vring.num) {
+                    i -= vq->vring.num;
+                }
+                vring_packed_desc_read(vdev, &desc, desc_cache, i);
+            } else {
+                if (++idx >= vq->vring.num) {
+                    idx -= vq->vring.num;
+                }
+                vring_packed_desc_read(vdev, &desc, desc_cache, idx);
+            }
+            /* Make sure we see the flags */
+            smp_mb();
+        } while (desc.flags & VRING_DESC_F_NEXT);
+
+        if (desc_cache == &indirect_desc_cache) {
+            address_space_cache_destroy(&indirect_desc_cache);
+            total_bufs++;
+            /* We missed one step for the indirect desc */
+            idx++;
+        } else {
+            total_bufs = num_bufs;
+        }
+
+        desc_cache = &caches->desc;
+        vring_packed_desc_read(vdev, &desc, desc_cache, idx % vq->vring.num);
+    }
+
+done:
+    address_space_cache_destroy(&indirect_desc_cache);
+    if (in_bytes) {
+        *in_bytes = in_total;
+    }
+    if (out_bytes) {
+        *out_bytes = out_total;
+    }
+    rcu_read_unlock();
+    return;
+
+err:
+    in_total = out_total = 0;
+    goto done;
+}
+
+void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
+                               unsigned int *out_bytes,
+                               unsigned max_in_bytes, unsigned max_out_bytes)
+{
+    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
+        virtqueue_packed_get_avail_bytes(vq, in_bytes, out_bytes,
+                                         max_in_bytes, max_out_bytes);
+    } else {
+        virtqueue_split_get_avail_bytes(vq, in_bytes, out_bytes,
+                                        max_in_bytes, max_out_bytes);
+    }
+}
+
 int virtqueue_avail_bytes(VirtQueue *vq, unsigned int in_bytes,
                           unsigned int out_bytes)
 {
-- 
1.8.3.1


* [Qemu-devel] [RFC v2 5/8] virtio: queue pop for packed ring
From: wexu @ 2018-06-05 19:08 UTC
  To: qemu-devel; +Cc: jasowang, tiwei.bie, mst, wexu, jfreimann

From: Wei Xu <wexu@redhat.com>

Signed-off-by: Wei Xu <wexu@redhat.com>
---
 hw/virtio/virtio.c | 145 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 144 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index cdbb5af..0160d03 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1041,7 +1041,7 @@ static void *virtqueue_alloc_element(size_t sz, unsigned out_num, unsigned in_nu
     return elem;
 }
 
-void *virtqueue_pop(VirtQueue *vq, size_t sz)
+static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
 {
     unsigned int i, head, max;
     VRingMemoryRegionCaches *caches;
@@ -1176,6 +1176,149 @@ err_undo_map:
     goto done;
 }
 
+static void *virtqueue_packed_pop(VirtQueue *vq, size_t sz)
+{
+    unsigned int i, head, max;
+    VRingMemoryRegionCaches *caches;
+    MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
+    MemoryRegionCache *cache;
+    int64_t len;
+    VirtIODevice *vdev = vq->vdev;
+    VirtQueueElement *elem = NULL;
+    unsigned out_num, in_num, elem_entries;
+    hwaddr addr[VIRTQUEUE_MAX_SIZE];
+    struct iovec iov[VIRTQUEUE_MAX_SIZE];
+    VRingDescPacked desc;
+
+    if (unlikely(vdev->broken)) {
+        return NULL;
+    }
+
+    rcu_read_lock();
+    if (virtio_queue_packed_empty_rcu(vq)) {
+        goto done;
+    }
+
+    /* When we start there are none of either input nor output. */
+    out_num = in_num = elem_entries = 0;
+
+    max = vq->vring.num;
+
+    if (vq->inuse >= vq->vring.num) {
+        virtio_error(vdev, "Virtqueue size exceeded");
+        goto done;
+    }
+
+    if (virtio_vdev_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX)) {
+        /* FIXME: TBD */
+    }
+
+    head = vq->last_avail_idx;
+    i = head;
+
+    caches = vring_get_region_caches(vq);
+    cache = &caches->desc;
+    vring_packed_desc_read(vdev, &desc, cache, i);
+    if (desc.flags & VRING_DESC_F_INDIRECT) {
+        if (desc.len % sizeof(VRingDescPacked)) {
+            virtio_error(vdev, "Invalid size for indirect buffer table");
+            goto done;
+        }
+
+        /* loop over the indirect descriptor table */
+        len = address_space_cache_init(&indirect_desc_cache, vdev->dma_as,
+                                       desc.addr, desc.len, false);
+        cache = &indirect_desc_cache;
+        if (len < desc.len) {
+            virtio_error(vdev, "Cannot map indirect buffer");
+            goto done;
+        }
+
+        max = desc.len / sizeof(VRingDescPacked);
+        i = 0;
+        vring_packed_desc_read(vdev, &desc, cache, i);
+    }
+
+    /* Collect all the descriptors */
+    while (1) {
+        bool map_ok;
+
+        if (desc.flags & VRING_DESC_F_WRITE) {
+            map_ok = virtqueue_map_desc(vdev, &in_num, addr + out_num,
+                                        iov + out_num,
+                                        VIRTQUEUE_MAX_SIZE - out_num, true,
+                                        desc.addr, desc.len);
+        } else {
+            if (in_num) {
+                virtio_error(vdev, "Incorrect order for descriptors");
+                goto err_undo_map;
+            }
+            map_ok = virtqueue_map_desc(vdev, &out_num, addr, iov,
+                                        VIRTQUEUE_MAX_SIZE, false,
+                                        desc.addr, desc.len);
+        }
+        if (!map_ok) {
+            goto err_undo_map;
+        }
+
+        /* If we've got too many, that implies a descriptor loop. */
+        if (++elem_entries > max) {
+            virtio_error(vdev, "Looped descriptor");
+            goto err_undo_map;
+        }
+
+        if (++i >= vq->vring.num) {
+            i -= vq->vring.num;
+        }
+
+        if (desc.flags & VRING_DESC_F_NEXT) {
+            vring_packed_desc_read(vq->vdev, &desc, cache, i);
+        } else {
+            break;
+        }
+    }
+
+    /* Now copy what we have collected and mapped */
+    elem = virtqueue_alloc_element(sz, out_num, in_num);
+    for (i = 0; i < out_num; i++) {
+        elem->out_addr[i] = addr[i];
+        elem->out_sg[i] = iov[i];
+    }
+    for (i = 0; i < in_num; i++) {
+        elem->in_addr[i] = addr[head + out_num + i];
+        elem->in_sg[i] = iov[out_num + i];
+    }
+
+    vq->last_avail_idx += (cache == &indirect_desc_cache) ?
+                          1 : out_num + in_num;
+    if (vq->last_avail_idx >= vq->vring.num) {
+        vq->last_avail_idx -= vq->vring.num;
+        vq->avail_wrap_counter = !vq->avail_wrap_counter;
+    }
+    vq->inuse++;
+
+    trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
+done:
+    address_space_cache_destroy(&indirect_desc_cache);
+    rcu_read_unlock();
+
+    return elem;
+
+err_undo_map:
+    virtqueue_undo_map_desc(out_num, in_num, iov);
+    g_free(elem);
+    goto done;
+}
+
+void *virtqueue_pop(VirtQueue *vq, size_t sz)
+{
+    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
+        return virtqueue_packed_pop(vq, sz);
+    } else {
+        return virtqueue_split_pop(vq, sz);
+    }
+}
+
 /* virtqueue_drop_all:
  * @vq: The #VirtQueue
  * Drops all queued buffers and indicates them to the guest
-- 
1.8.3.1


* [Qemu-devel] [RFC v2 6/8] virtio: flush/push for packed ring
From: wexu @ 2018-06-05 19:08 UTC
  To: qemu-devel; +Cc: jasowang, tiwei.bie, mst, wexu, jfreimann

From: Wei Xu <wexu@redhat.com>

Signed-off-by: Wei Xu <wexu@redhat.com>
---
 hw/virtio/virtio.c | 109 ++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 96 insertions(+), 13 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 0160d03..6f2da83 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -371,6 +371,21 @@ static void vring_packed_desc_read(VirtIODevice *vdev, VRingDescPacked *desc,
     virtio_tswap16s(vdev, &desc->flags);
 }
 
+static void vring_packed_desc_write(VirtIODevice *vdev, VRingDescPacked *desc,
+                            MemoryRegionCache *cache, int i)
+{
+    virtio_tswap64s(vdev, &desc->addr);
+    virtio_tswap32s(vdev, &desc->len);
+    virtio_tswap16s(vdev, &desc->id);
+    virtio_tswap16s(vdev, &desc->flags);
+    address_space_write_cached(cache,
+                               sizeof(VRingDescPacked) * i, desc,
+                               sizeof(VRingDescPacked));
+    address_space_cache_invalidate(cache,
+                                   sizeof(VRingDescPacked) * i,
+                                   sizeof(VRingDescPacked));
+}
+
 static inline bool is_desc_avail(struct VRingDescPacked *desc)
 {
     return !!(desc->flags & AVAIL_DESC_PACKED(1)) !=
@@ -526,19 +541,11 @@ bool virtqueue_rewind(VirtQueue *vq, unsigned int num)
 }
 
 /* Called within rcu_read_lock().  */
-void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
+static void virtqueue_split_fill(VirtQueue *vq, const VirtQueueElement *elem,
                     unsigned int len, unsigned int idx)
 {
     VRingUsedElem uelem;
 
-    trace_virtqueue_fill(vq, elem, len, idx);
-
-    virtqueue_unmap_sg(vq, elem, len);
-
-    if (unlikely(vq->vdev->broken)) {
-        return;
-    }
-
     if (unlikely(!vq->vring.used)) {
         return;
     }
@@ -550,16 +557,64 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
     vring_used_write(vq, &uelem, idx);
 }
 
-/* Called within rcu_read_lock().  */
-void virtqueue_flush(VirtQueue *vq, unsigned int count)
+static void virtqueue_packed_fill(VirtQueue *vq, const VirtQueueElement *elem,
+                        unsigned int len, unsigned int idx)
 {
-    uint16_t old, new;
+    uint16_t w, head;
+    VRingMemoryRegionCaches *caches;
+    VRingDescPacked desc = {
+        .addr = 0,
+        .flags = 0,
+    };
+
+    if (unlikely(!vq->vring.desc)) {
+        return;
+    }
+
+    caches = vring_get_region_caches(vq);
+    head = vq->used_idx + idx;
+    head = head >= vq->vring.num ? (head - vq->vring.num) : head;
+    vring_packed_desc_read(vq->vdev, &desc, &caches->desc, head);
+
+    w = (desc.flags & AVAIL_DESC_PACKED(1)) >> 7;
+    desc.flags &= ~(AVAIL_DESC_PACKED(1) | USED_DESC_PACKED(1));
+    desc.flags |= AVAIL_DESC_PACKED(w) | USED_DESC_PACKED(w);
+    if (!(desc.flags & VRING_DESC_F_INDIRECT)) {
+        if (!(desc.flags & VRING_DESC_F_WRITE)) {
+            desc.len = 0;
+        } else {
+            desc.len = len;
+        }
+    }
+    vring_packed_desc_write(vq->vdev, &desc, &caches->desc, head);
+
+    /* Make sure flags has been updated */
+    smp_mb();
+}
+
+void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
+                    unsigned int len, unsigned int idx)
+{
+    trace_virtqueue_fill(vq, elem, len, idx);
+
+    virtqueue_unmap_sg(vq, elem, len);
 
     if (unlikely(vq->vdev->broken)) {
-        vq->inuse -= count;
         return;
     }
 
+    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
+        virtqueue_packed_fill(vq, elem, len, idx);
+    } else {
+        virtqueue_split_fill(vq, elem, len, idx);
+    }
+}
+
+/* Called within rcu_read_lock().  */
+static void virtqueue_split_flush(VirtQueue *vq, unsigned int count)
+{
+    uint16_t old, new;
+
     if (unlikely(!vq->vring.used)) {
         return;
     }
@@ -575,6 +630,34 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count)
         vq->signalled_used_valid = false;
 }
 
+static void virtqueue_packed_flush(VirtQueue *vq, unsigned int count)
+{
+    if (unlikely(!vq->vring.desc)) {
+        return;
+    }
+
+    vq->inuse -= count;
+    vq->used_idx += count;
+    if (vq->used_idx >= vq->vring.num) {
+        vq->used_idx -= vq->vring.num;
+        vq->used_wrap_counter = !vq->used_wrap_counter;
+    }
+}
+
+void virtqueue_flush(VirtQueue *vq, unsigned int count)
+{
+    if (unlikely(vq->vdev->broken)) {
+        vq->inuse -= count;
+        return;
+    }
+
+    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
+        virtqueue_packed_flush(vq, count);
+    } else {
+        virtqueue_split_flush(vq, count);
+    }
+}
+
 void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
                     unsigned int len)
 {
-- 
1.8.3.1
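
For contrast with the split ring's vring_used_write(), a sketch of the
used-descriptor write-back at the spec level (the helper name is
illustrative; note that virtqueue_packed_fill() above reuses the
descriptor's current AVAIL bit rather than the queue's
used_wrap_counter, and does not write the buffer id):

static void mark_desc_used(VRingDescPacked *desc, uint16_t id,
                           uint32_t len, bool used_wrap_counter)
{
    desc->id  = id;
    desc->len = len;
    /* AVAIL == USED == wrap counter signals "used" to the driver */
    desc->flags &= ~(AVAIL_DESC_PACKED(1) | USED_DESC_PACKED(1));
    desc->flags |= AVAIL_DESC_PACKED(used_wrap_counter) |
                   USED_DESC_PACKED(used_wrap_counter);
}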


* [Qemu-devel] [RFC v2 7/8] virtio: event suppression for packed ring
From: wexu @ 2018-06-05 19:08 UTC
  To: qemu-devel; +Cc: jasowang, tiwei.bie, mst, wexu, jfreimann

From: Wei Xu <wexu@redhat.com>

Signed-off-by: Wei Xu <wexu@redhat.com>
---
 hw/virtio/virtio.c                             | 115 +++++++++++++++++++++++--
 include/standard-headers/linux/virtio_config.h |  13 +++
 2 files changed, 119 insertions(+), 9 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 6f2da83..4543974 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -226,6 +226,24 @@ static void vring_desc_read(VirtIODevice *vdev, VRingDesc *desc,
     virtio_tswap16s(vdev, &desc->next);
 }
 
+static void vring_packed_event_read(VirtIODevice *vdev,
+                            MemoryRegionCache *cache, VRingPackedDescEvent *e)
+{
+    address_space_read_cached(cache, 0, e, sizeof(*e));
+    virtio_tswap16s(vdev, &e->off_wrap);
+    virtio_tswap16s(vdev, &e->flags);
+}
+
+static void vring_packed_event_write(VirtIODevice *vdev,
+                            MemoryRegionCache *cache, VRingPackedDescEvent *e)
+{
+    virtio_tswap16s(vdev, &e->off_wrap);
+    virtio_tswap16s(vdev, &e->flags);
+    address_space_write_cached(cache, 0, e, sizeof(*e));
+    address_space_cache_invalidate(cache, 0, sizeof(VRingUsedElem));
+}
+
+
 static VRingMemoryRegionCaches *vring_get_region_caches(struct VirtQueue *vq)
 {
     VRingMemoryRegionCaches *caches = atomic_rcu_read(&vq->vring.caches);
@@ -332,14 +350,8 @@ static inline void vring_set_avail_event(VirtQueue *vq, uint16_t val)
     address_space_cache_invalidate(&caches->used, pa, sizeof(val));
 }
 
-void virtio_queue_set_notification(VirtQueue *vq, int enable)
+static void virtio_queue_set_notification_split(VirtQueue *vq, int enable)
 {
-    vq->notification = enable;
-
-    if (!vq->vring.desc) {
-        return;
-    }
-
     rcu_read_lock();
     if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
         vring_set_avail_event(vq, vring_avail_idx(vq));
@@ -355,6 +367,38 @@ void virtio_queue_set_notification(VirtQueue *vq, int enable)
     rcu_read_unlock();
 }
 
+static void virtio_queue_set_notification_packed(VirtQueue *vq, int enable)
+{
+    VRingPackedDescEvent e;
+    VRingMemoryRegionCaches *caches;
+
+    rcu_read_lock();
+    caches  = vring_get_region_caches(vq);
+    vring_packed_event_read(vq->vdev, &caches->device, &e);
+    if (enable) {
+        e.flags = RING_EVENT_FLAGS_ENABLE;
+    } else {
+        e.flags = RING_EVENT_FLAGS_DISABLE;
+    }
+    vring_packed_event_write(vq->vdev, &caches->device, &e);
+    rcu_read_unlock();
+}
+
+void virtio_queue_set_notification(VirtQueue *vq, int enable)
+{
+    vq->notification = enable;
+
+    if (!vq->vring.desc) {
+        return;
+    }
+
+    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
+        virtio_queue_set_notification_packed(vq, enable);
+    } else {
+        virtio_queue_set_notification_split(vq, enable);
+    }
+}
+
 int virtio_queue_ready(VirtQueue *vq)
 {
     return vq->vring.avail != 0;
@@ -2059,8 +2103,7 @@ static void virtio_set_isr(VirtIODevice *vdev, int value)
     }
 }
 
-/* Called within rcu_read_lock().  */
-static bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
+static bool virtio_split_should_notify(VirtIODevice *vdev, VirtQueue *vq)
 {
     uint16_t old, new;
     bool v;
@@ -2083,6 +2126,60 @@ static bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
     return !v || vring_need_event(vring_get_used_event(vq), new, old);
 }
 
+static bool vring_packed_need_event(VirtQueue *vq, uint16_t off_wrap,
+                                    uint16_t new, uint16_t old)
+{
+    bool wrap = vq->used_wrap_counter;
+    int off = off_wrap & ~(1 << 15);
+
+    if (new < old) {
+        new += vq->vring.num;
+        wrap ^= 1;
+    }
+
+    if (wrap != off_wrap >> 15) {
+        off += vq->vring.num;
+    }
+
+    return vring_need_event(off, new, old);
+}
+
+static bool virtio_packed_should_notify(VirtIODevice *vdev, VirtQueue *vq)
+{
+    VRingPackedDescEvent e;
+    uint16_t old, new;
+    bool v;
+    VRingMemoryRegionCaches *caches;
+
+    caches  = vring_get_region_caches(vq);
+    vring_packed_event_read(vdev, &caches->driver, &e);
+
+    /* Make sure we see the updated flags */
+    smp_mb();
+    if (e.flags == RING_EVENT_FLAGS_DISABLE) {
+        return false;
+    } else if (e.flags == RING_EVENT_FLAGS_ENABLE) {
+        return true;
+    }
+
+    v = vq->signalled_used_valid;
+    vq->signalled_used_valid = true;
+    old = vq->signalled_used;
+    new = vq->signalled_used = vq->used_idx;
+
+    return !v || vring_packed_need_event(vq, e.off_wrap, new, old);
+}
+
+/* Called within rcu_read_lock().  */
+static bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
+{
+    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
+        return virtio_packed_should_notify(vdev, vq);
+    } else {
+        return virtio_split_should_notify(vdev, vq);
+    }
+}
+
 void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
 {
     bool should_notify;
diff --git a/include/standard-headers/linux/virtio_config.h b/include/standard-headers/linux/virtio_config.h
index 6ee5529..53e5c83 100644
--- a/include/standard-headers/linux/virtio_config.h
+++ b/include/standard-headers/linux/virtio_config.h
@@ -73,4 +73,17 @@
 #define VIRTIO_F_IOMMU_PLATFORM		33
 
 #define VIRTIO_F_RING_PACKED		34
+
+/* Enable events */
+#define RING_EVENT_FLAGS_ENABLE 0x0
+/* Disable events */
+#define RING_EVENT_FLAGS_DISABLE 0x1
+/*
+ * Enable events for a specific descriptor
+ * (as specified by Descriptor Ring Change Event Offset/Wrap Counter).
+ * Only valid if VIRTIO_F_RING_EVENT_IDX has been negotiated.
+ */
+#define RING_EVENT_FLAGS_DESC 0x2
+/* The value 0x3 is reserved */
+
 #endif /* _LINUX_VIRTIO_CONFIG_H */
-- 
1.8.3.1
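
A sketch of how the off_wrap field consumed by
vring_packed_need_event() is encoded (the helper name is illustrative):
bit 15 carries the wrap counter the driver last observed, and bits
0..14 carry the descriptor ring offset used with RING_EVENT_FLAGS_DESC.

static void decode_off_wrap(uint16_t off_wrap, uint16_t *off, bool *wrap)
{
    *off  = off_wrap & 0x7fff;  /* same mask as ~(1 << 15) above */
    *wrap = off_wrap >> 15;
}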


* [Qemu-devel] [RFC v2 8/8] virtio: guest driver reload for vhost-net
From: wexu @ 2018-06-05 19:08 UTC
  To: qemu-devel; +Cc: jasowang, tiwei.bie, mst, wexu, jfreimann

From: Wei Xu <wexu@redhat.com>

last_avail, avail_wrap_count, used_idx and used_wrap_count are
needed to support the vhost-net backend. All of these are either
16-bit or bool variables, and since state.num is 64 bits wide it is
possible to pack them into 'num' without introducing a new case in
the ioctl handling.

The unload/reload test has been done successfully with a patch on the vhost kernel side.

Signed-off-by: Wei Xu <wexu@redhat.com>
---
 hw/virtio/virtio.c | 42 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 34 insertions(+), 8 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 4543974..153f6d7 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2862,33 +2862,59 @@ hwaddr virtio_queue_get_used_size(VirtIODevice *vdev, int n)
     }
 }
 
-uint16_t virtio_queue_get_last_avail_idx(VirtIODevice *vdev, int n)
+uint64_t virtio_queue_get_last_avail_idx(VirtIODevice *vdev, int n)
 {
-    return vdev->vq[n].last_avail_idx;
+    uint64_t num;
+
+    num = vdev->vq[n].last_avail_idx;
+    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
+        num |= ((uint64_t)vdev->vq[n].avail_wrap_counter) << 16;
+        num |= ((uint64_t)vdev->vq[n].used_idx) << 32;
+        num |= ((uint64_t)vdev->vq[n].used_wrap_counter) << 48;
+    }
+
+    return num;
 }
 
-void virtio_queue_set_last_avail_idx(VirtIODevice *vdev, int n, uint16_t idx)
+void virtio_queue_set_last_avail_idx(VirtIODevice *vdev, int n, uint64_t num)
 {
-    vdev->vq[n].last_avail_idx = idx;
-    vdev->vq[n].shadow_avail_idx = idx;
+    vdev->vq[n].shadow_avail_idx = vdev->vq[n].last_avail_idx = (uint16_t)(num);
+
+    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
+        vdev->vq[n].avail_wrap_counter = (uint16_t)(num >> 16);
+        vdev->vq[n].used_idx = (uint16_t)(num >> 32);
+        vdev->vq[n].used_wrap_counter = (uint16_t)(num >> 48);
+    }
 }
 
 void virtio_queue_restore_last_avail_idx(VirtIODevice *vdev, int n)
 {
     rcu_read_lock();
-    if (vdev->vq[n].vring.desc) {
+    if (!vdev->vq[n].vring.desc) {
+        goto out;
+    }
+
+    if (!virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
         vdev->vq[n].last_avail_idx = vring_used_idx(&vdev->vq[n]);
-        vdev->vq[n].shadow_avail_idx = vdev->vq[n].last_avail_idx;
     }
+    vdev->vq[n].shadow_avail_idx = vdev->vq[n].last_avail_idx;
+
+out:
     rcu_read_unlock();
 }
 
 void virtio_queue_update_used_idx(VirtIODevice *vdev, int n)
 {
     rcu_read_lock();
-    if (vdev->vq[n].vring.desc) {
+    if (!vdev->vq[n].vring.desc) {
+        goto out;
+    }
+
+    if (!virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
         vdev->vq[n].used_idx = vring_used_idx(&vdev->vq[n]);
     }
+
+out:
     rcu_read_unlock();
 }
 
-- 
1.8.3.1


* Re: [Qemu-devel] [RFC v2 0/8] packed ring virtio-net userspace backend support
From: Jason Wang @ 2018-06-06  2:34 UTC
  To: wexu, qemu-devel; +Cc: tiwei.bie, mst, jfreimann



On 2018-06-06 03:07, wexu@redhat.com wrote:
> From: Wei Xu <wexu@redhat.com>
>
> Todo:
> - address Rx slow performance
> - event index interrupt suppression test

Tiwei's code supports event index, you can try with that.

Actually, unless you disable event_idx explicitly, it works by default.

>
> v1->v2
> - sync to tiwei's v5
> - reuse memory cache function with 1.0
> - dropped detach patch and notification helper(04 & 05 in v1)
> - guest virtio-net driver unload/reload support
> - event suppression support(not tested)
> - addressed feedback from v1
>
> About guest virtio-net load/unload:
> Since last_avail, avail_wrap_count, used_idx and used_wrap_count are
> all 16 or bool type variables, so I turned to merge them to
> 'vhost_vring_state.num' in stead of introducing a new case
> in handling ioctl, test has been done with a tweak in kernel side like:

s.num is an unsigned int, which is 32 bits, but you want to hold two
16-bit variables and a boolean.

So I'm afraid it won't work...

>
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -1439,10 +1439,16 @@ long vhost_vring_ioctl(struct vhost_dev *d, int ioctl, void __user *argp)
>              r = -EFAULT;
>              break;
>          }
> -               if (s.num > 0xffff) {
> -                       r = -EINVAL;
> -                       break;
> -               }
> +               if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED)) {
> +                       vq->avail_wrap_counter = (bool)(uint16_t)(s.num >> 16);
> +                       vq->last_used_idx = (uint16_t)(s.num >> 32);
> +                       vq->used_wrap_counter = (bool)(uint16_t)(s.num >> 48);
> +                } else {
> +                    if (s.num > 0xffff) {
> +                            r = -EINVAL;
> +                            break;
> +                    }
> +                }
>          vq->last_avail_idx = s.num;
>          /* Forget the cached index value. */
>          vq->avail_idx = vq->last_avail_idx;
> @@ -1450,8 +1456,15 @@ long vhost_vring_ioctl(struct vhost_dev *d, int ioctl, void __user *argp)
>      case VHOST_GET_VRING_BASE:
>          s.index = idx;
>          s.num = vq->last_avail_idx;
> +               if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED)) {
> +                       s.num |= vq->avail_wrap_counter << 16;
> +                       s.num |= vq->last_used_idx << 32;
> +                       s.num |= vq->used_wrap_counter << 48;
> +                }
>          if (copy_to_user(argp, &s, sizeof s))
>              r = -EFAULT;
> +               if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> +                       s.num |= vq->avail_wrap_counter << 31;
>          break;
>      case VHOST_SET_VRING_ADDR:
>          if (copy_from_user(&a, argp, sizeof a)) {
>
> Wei Xu (8):
>    virtio: feature bit, data structure, init for packed ring
>    virtio: memory cache for packed ring
>    virtio: empty check and desc read for packed ring
>    virtio: get avail bytes check for packed ring
>    virtio: queue pop support for packed ring
>    virtio: flush/push support for packed ring
>    virtio: event suppression support for packed ring
>    virtio: support guest driver reload for vhost-net
>
>   hw/net/vhost_net.c                             |   2 +
>   hw/virtio/virtio.c                             | 677 +++++++++++++++++++++++--
>   include/hw/virtio/virtio.h                     |   4 +-
>   include/standard-headers/linux/virtio_config.h |  15 +
>   4 files changed, 649 insertions(+), 49 deletions(-)
>


* Re: [Qemu-devel] [RFC v2 1/8] virtio: feature bit, data structure, init for 1.1
From: Jason Wang @ 2018-06-06  2:49 UTC
  To: wexu, qemu-devel; +Cc: tiwei.bie, mst, jfreimann



On 2018-06-06 03:07, wexu@redhat.com wrote:
> From: Wei Xu <wexu@redhat.com>
>
> New feature bit and members for packed ring.
>
> Signed-off-by: Wei Xu <wexu@redhat.com>
> ---
>   hw/net/vhost_net.c                             |  2 ++
>   hw/virtio/virtio.c                             | 27 ++++++++++++++++++++++++--
>   include/hw/virtio/virtio.h                     |  4 +++-
>   include/standard-headers/linux/virtio_config.h |  2 ++
>   4 files changed, 32 insertions(+), 3 deletions(-)
>
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index e037db6..f593086 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -53,6 +53,7 @@ static const int kernel_feature_bits[] = {
>       VIRTIO_F_VERSION_1,
>       VIRTIO_NET_F_MTU,
>       VIRTIO_F_IOMMU_PLATFORM,
> +    VIRTIO_F_RING_PACKED,
>       VHOST_INVALID_FEATURE_BIT
>   };
>   
> @@ -78,6 +79,7 @@ static const int user_feature_bits[] = {
>       VIRTIO_NET_F_MRG_RXBUF,
>       VIRTIO_NET_F_MTU,
>       VIRTIO_F_IOMMU_PLATFORM,
> +    VIRTIO_F_RING_PACKED,

This could be another patch that enables packed ring for vhost, so it
should be done in or after the vhost support patch, not here.

>   
>       /* This bit implies RARP isn't sent by QEMU out of band */
>       VIRTIO_NET_F_GUEST_ANNOUNCE,
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 006d3d1..e192a9a 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -39,6 +39,13 @@ typedef struct VRingDesc
>       uint16_t next;
>   } VRingDesc;
>   
> +typedef struct VRingDescPacked {
> +    uint64_t addr;
> +    uint32_t len;
> +    uint16_t id;
> +    uint16_t flags;
> +} VRingDescPacked;
> +
>   typedef struct VRingAvail
>   {
>       uint16_t flags;
> @@ -62,8 +69,14 @@ typedef struct VRingUsed
>   typedef struct VRingMemoryRegionCaches {
>       struct rcu_head rcu;
>       MemoryRegionCache desc;
> -    MemoryRegionCache avail;
> -    MemoryRegionCache used;
> +    union {
> +        MemoryRegionCache avail;
> +        MemoryRegionCache driver;
> +    };
> +    union {
> +        MemoryRegionCache used;
> +        MemoryRegionCache device;
> +    };
>   } VRingMemoryRegionCaches;
>   
>   typedef struct VRing
> @@ -77,6 +90,11 @@ typedef struct VRing
>       VRingMemoryRegionCaches *caches;
>   } VRing;
>   
> +typedef struct VRingPackedDescEvent {
> +    uint16_t off_wrap;
> +    uint16_t flags;
> +} VRingPackedDescEvent ;
> +
>   struct VirtQueue
>   {
>       VRing vring;
> @@ -89,6 +107,9 @@ struct VirtQueue
>   
>       uint16_t used_idx;
>   
> +    bool avail_wrap_counter;
> +    bool used_wrap_counter;
> +
>       /* Last used index value we have signalled on */
>       uint16_t signalled_used;
>   
> @@ -1213,6 +1234,8 @@ void virtio_reset(void *opaque)
>           vdev->vq[i].last_avail_idx = 0;
>           vdev->vq[i].shadow_avail_idx = 0;
>           vdev->vq[i].used_idx = 0;
> +        vdev->vq[i].avail_wrap_counter = true;
> +        vdev->vq[i].used_wrap_counter = true;
>           virtio_queue_set_vector(vdev, i, VIRTIO_NO_VECTOR);
>           vdev->vq[i].signalled_used = 0;
>           vdev->vq[i].signalled_used_valid = false;
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index 098bdaa..4a7fb21 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -262,7 +262,9 @@ typedef struct VirtIORNGConf VirtIORNGConf;
>       DEFINE_PROP_BIT64("any_layout", _state, _field, \
>                         VIRTIO_F_ANY_LAYOUT, true), \
>       DEFINE_PROP_BIT64("iommu_platform", _state, _field, \
> -                      VIRTIO_F_IOMMU_PLATFORM, false)
> +                      VIRTIO_F_IOMMU_PLATFORM, false), \
> +    DEFINE_PROP_BIT64("ring_packed", _state, _field, \
> +                      VIRTIO_F_RING_PACKED, false)

This is wrong. Even if the default is false, setting it to true will
break everything. Please move this to the end of the series.

>   
>   hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);
>   hwaddr virtio_queue_get_avail_addr(VirtIODevice *vdev, int n);
> diff --git a/include/standard-headers/linux/virtio_config.h b/include/standard-headers/linux/virtio_config.h
> index b777069..6ee5529 100644
> --- a/include/standard-headers/linux/virtio_config.h
> +++ b/include/standard-headers/linux/virtio_config.h
> @@ -71,4 +71,6 @@
>    * this is for compatibility with legacy systems.
>    */
>   #define VIRTIO_F_IOMMU_PLATFORM		33
> +
> +#define VIRTIO_F_RING_PACKED		34
>   #endif /* _LINUX_VIRTIO_CONFIG_H */

For the formal version, it's better to just do a header sync in an
independent patch.

Thanks


* Re: [Qemu-devel] [RFC v2 2/8] virtio: memory cache for packed ring
From: Jason Wang @ 2018-06-06  2:53 UTC
  To: wexu, qemu-devel; +Cc: tiwei.bie, mst, jfreimann



On 2018-06-06 03:07, wexu@redhat.com wrote:
> From: Wei Xu <wexu@redhat.com>
>
> Mostly reuse memory cache with 1.0 except for the offset calculation.
>
> Signed-off-by: Wei Xu <wexu@redhat.com>
> ---
>   hw/virtio/virtio.c | 29 ++++++++++++++++++++---------
>   1 file changed, 20 insertions(+), 9 deletions(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index e192a9a..f6c0689 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -150,11 +150,8 @@ static void virtio_init_region_cache(VirtIODevice *vdev, int n)
>       VRingMemoryRegionCaches *old = vq->vring.caches;
>       VRingMemoryRegionCaches *new;
>       hwaddr addr, size;
> -    int event_size;
>       int64_t len;
>   
> -    event_size = virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
> -
>       addr = vq->vring.desc;
>       if (!addr) {
>           return;
> @@ -168,7 +165,7 @@ static void virtio_init_region_cache(VirtIODevice *vdev, int n)
>           goto err_desc;
>       }
>   
> -    size = virtio_queue_get_used_size(vdev, n) + event_size;
> +    size = virtio_queue_get_used_size(vdev, n);
>       len = address_space_cache_init(&new->used, vdev->dma_as,
>                                      vq->vring.used, size, true);
>       if (len < size) {
> @@ -176,7 +173,7 @@ static void virtio_init_region_cache(VirtIODevice *vdev, int n)
>           goto err_used;
>       }
>   
> -    size = virtio_queue_get_avail_size(vdev, n) + event_size;
> +    size = virtio_queue_get_avail_size(vdev, n);
>       len = address_space_cache_init(&new->avail, vdev->dma_as,
>                                      vq->vring.avail, size, false);
>       if (len < size) {
> @@ -2320,14 +2317,28 @@ hwaddr virtio_queue_get_desc_size(VirtIODevice *vdev, int n)
>   
>   hwaddr virtio_queue_get_avail_size(VirtIODevice *vdev, int n)

I would rather rename this to virtio_queue_get_driver_size().

>   {
> -    return offsetof(VRingAvail, ring) +
> -        sizeof(uint16_t) * vdev->vq[n].vring.num;
> +    int s;
> +
> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> +        return sizeof(struct VRingPackedDescEvent);
> +    } else {
> +        s = virtio_vdev_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
> +        return offsetof(VRingAvail, ring) +
> +            sizeof(uint16_t) * vdev->vq[n].vring.num + s;
> +    }
>   }
>   
>   hwaddr virtio_queue_get_used_size(VirtIODevice *vdev, int n)

virtio_queue_get_device_size().

Thanks

>   {
> -    return offsetof(VRingUsed, ring) +
> -        sizeof(VRingUsedElem) * vdev->vq[n].vring.num;
> +    int s;
> +
> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> +        return sizeof(struct VRingPackedDescEvent);
> +    } else {
> +        s = virtio_vdev_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
> +        return offsetof(VRingUsed, ring) +
> +            sizeof(VRingUsedElem) * vdev->vq[n].vring.num + s;
> +    }
>   }
>   
>   uint16_t virtio_queue_get_last_avail_idx(VirtIODevice *vdev, int n)


* Re: [Qemu-devel] [RFC v2 3/8] virtio: empty check and desc read for packed ring
From: Jason Wang @ 2018-06-06  3:09 UTC
  To: wexu, qemu-devel; +Cc: tiwei.bie, mst, jfreimann



On 2018-06-06 03:07, wexu@redhat.com wrote:
> From: Wei Xu <wexu@redhat.com>
>
> Add helpers for ring empty check and descriptor read.
>
> Signed-off-by: Wei Xu <wexu@redhat.com>
> ---
>   hw/virtio/virtio.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++---
>   1 file changed, 59 insertions(+), 3 deletions(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index f6c0689..bd669a2 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -24,6 +24,9 @@
>   #include "hw/virtio/virtio-access.h"
>   #include "sysemu/dma.h"
>   
> +#define AVAIL_DESC_PACKED(b) ((b) << 7)
> +#define USED_DESC_PACKED(b)  ((b) << 15)
> +
>   /*
>    * The alignment to use between consumer and producer parts of vring.
>    * x86 pagesize again. This is the default, used by transports like PCI
> @@ -357,10 +360,27 @@ int virtio_queue_ready(VirtQueue *vq)
>       return vq->vring.avail != 0;
>   }
>   
> +static void vring_packed_desc_read(VirtIODevice *vdev, VRingDescPacked *desc,
> +                            MemoryRegionCache *cache, int i)
> +{
> +    address_space_read_cached(cache, i * sizeof(VRingDescPacked),
> +                              desc, sizeof(VRingDescPacked));
> +    virtio_tswap64s(vdev, &desc->addr);
> +    virtio_tswap32s(vdev, &desc->len);
> +    virtio_tswap16s(vdev, &desc->id);
> +    virtio_tswap16s(vdev, &desc->flags);
> +}
> +
> +static inline bool is_desc_avail(struct VRingDescPacked *desc)
> +{
> +    return !!(desc->flags & AVAIL_DESC_PACKED(1)) !=
> +            !!(desc->flags & USED_DESC_PACKED(1));
> +}

See spec and the discussion in the past for vhost patches:

"""
Thus VIRTQ_DESC_F_AVAIL and VIRTQ_DESC_F_USED bits are different for an
available descriptor and equal for a used descriptor.
Note that this observation is mostly useful for sanity-checking as these
are necessary but not sufficient conditions - for example, all
descriptors are zero-initialized. To detect used and available
descriptors it is possible for drivers and devices to keep track of the
last observed value of VIRTQ_DESC_F_USED/VIRTQ_DESC_F_AVAIL. Other
techniques to detect VIRTQ_DESC_F_AVAIL/VIRTQ_DESC_F_USED bit changes
might also be possible.
"""

This may not work, especially considering that we may do migration or savevm/loadvm.
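
For illustration, a minimal sketch of the wrap-counter-based check the
spec suggests; the caller passes the wrap counter it tracks (e.g.
vq->avail_wrap_counter), so zero-initialized descriptors are not
mistaken for available ones. Hypothetical code, not part of the patch:

static bool is_desc_avail(uint16_t flags, bool wrap_counter)
{
    bool avail = !!(flags & AVAIL_DESC_PACKED(1));
    bool used = !!(flags & USED_DESC_PACKED(1));

    /* Available iff AVAIL matches the expected wrap counter and
     * USED does not. */
    return (avail != used) && (avail == wrap_counter);
}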

> +
>   /* Fetch avail_idx from VQ memory only when we really need to know if
>    * guest has added some buffers.
>    * Called within rcu_read_lock().  */
> -static int virtio_queue_empty_rcu(VirtQueue *vq)
> +static int virtio_queue_split_empty_rcu(VirtQueue *vq)
>   {
>       if (unlikely(!vq->vring.avail)) {
>           return 1;
> @@ -373,7 +393,7 @@ static int virtio_queue_empty_rcu(VirtQueue *vq)
>       return vring_avail_idx(vq) == vq->last_avail_idx;
>   }
>   
> -int virtio_queue_empty(VirtQueue *vq)
> +static int virtio_queue_split_empty(VirtQueue *vq)
>   {
>       bool empty;
>   
> @@ -391,6 +411,42 @@ int virtio_queue_empty(VirtQueue *vq)
>       return empty;
>   }
>   
> +static int virtio_queue_packed_empty_rcu(VirtQueue *vq)
> +{
> +    struct VRingDescPacked desc;
> +    VRingMemoryRegionCaches *cache;
> +
> +    if (unlikely(!vq->vring.desc)) {
> +        return 1;
> +    }
> +
> +    cache = vring_get_region_caches(vq);
> +    vring_packed_desc_read(vq->vdev, &desc, &cache->desc, vq->last_avail_idx);
> +

I believe we only care about the flags here.
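
If so, a flags-only load would be enough. A sketch, assuming the
existing virtio_lduw_phys_cached() accessor and a hypothetical helper
name:

static uint16_t vring_packed_desc_read_flags(VirtIODevice *vdev,
                                             MemoryRegionCache *cache, int i)
{
    /* Fetch just the 16-bit flags word of descriptor i. */
    return virtio_lduw_phys_cached(vdev, cache,
                                   i * sizeof(VRingDescPacked) +
                                   offsetof(VRingDescPacked, flags));
}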

> +    /* Make sure we see the updated flag */
> +    smp_mb();

Why is this needed?

Note there's a smp_rmb() after virtio_queue_empty_rcu() in virtqueue_pop().

> +    return !is_desc_avail(&desc);
> +}
> +
> +static int virtio_queue_packed_empty(VirtQueue *vq)
> +{
> +    bool empty;
> +
> +    rcu_read_lock();
> +    empty = virtio_queue_packed_empty_rcu(vq);
> +    rcu_read_unlock();
> +    return empty;
> +}
> +
> +int virtio_queue_empty(VirtQueue *vq)
> +{
> +    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
> +        return virtio_queue_packed_empty(vq);
> +    } else {
> +        return virtio_queue_split_empty(vq);
> +    }
> +}
> +
>   static void virtqueue_unmap_sg(VirtQueue *vq, const VirtQueueElement *elem,
>                                  unsigned int len)
>   {
> @@ -862,7 +918,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
>           return NULL;
>       }
>       rcu_read_lock();
> -    if (virtio_queue_empty_rcu(vq)) {
> +    if (virtio_queue_split_empty_rcu(vq)) {

This is meaningless unless you rename virtqueue_pop to 
virtqueue_pop_split and call it only for split ring.

Thanks

>           goto done;
>       }
>       /* Needed after virtio_queue_empty(), see comment in

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC v2 4/8] virtio: get avail bytes check for packed ring
  2018-06-05 19:07 ` [Qemu-devel] [RFC v2 4/8] virtio: get avail bytes check " wexu
@ 2018-06-06  3:19   ` Jason Wang
  0 siblings, 0 replies; 27+ messages in thread
From: Jason Wang @ 2018-06-06  3:19 UTC (permalink / raw)
  To: wexu, qemu-devel; +Cc: tiwei.bie, mst, jfreimann



On 2018-06-06 03:07, wexu@redhat.com wrote:
> From: Wei Xu <wexu@redhat.com>
>
> Mostly the same as 1.0, except that we traverse all descriptors to feed
> the head count; this needs a refactor.
>
> Signed-off-by: Wei Xu <wexu@redhat.com>
> ---
>   hw/virtio/virtio.c | 148 +++++++++++++++++++++++++++++++++++++++++++++++++++--
>   1 file changed, 145 insertions(+), 3 deletions(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index bd669a2..cdbb5af 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -650,9 +650,9 @@ static int virtqueue_read_next_desc(VirtIODevice *vdev, VRingDesc *desc,
>       return VIRTQUEUE_READ_DESC_MORE;
>   }
>   
> -void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
> -                               unsigned int *out_bytes,
> -                               unsigned max_in_bytes, unsigned max_out_bytes)
> +static void virtqueue_split_get_avail_bytes(VirtQueue *vq,
> +                            unsigned int *in_bytes, unsigned int *out_bytes,
> +                            unsigned max_in_bytes, unsigned max_out_bytes)
>   {
>       VirtIODevice *vdev = vq->vdev;
>       unsigned int max, idx;
> @@ -775,6 +775,148 @@ err:
>       goto done;
>   }
>   
> +static void virtqueue_packed_get_avail_bytes(VirtQueue *vq,
> +                            unsigned int *in_bytes, unsigned int *out_bytes,
> +                            unsigned max_in_bytes, unsigned max_out_bytes)
> +{
> +    VirtIODevice *vdev = vq->vdev;
> +    unsigned int max, idx;
> +    unsigned int total_bufs, in_total, out_total;
> +    MemoryRegionCache *desc_cache;
> +    VRingMemoryRegionCaches *caches;
> +    MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
> +    int64_t len = 0;
> +    VRingDescPacked desc;
> +
> +    if (unlikely(!vq->vring.desc)) {
> +        if (in_bytes) {
> +            *in_bytes = 0;
> +        }
> +        if (out_bytes) {
> +            *out_bytes = 0;
> +        }
> +        return;
> +    }
> +
> +    rcu_read_lock();
> +    idx = vq->last_avail_idx;
> +    total_bufs = in_total = out_total = 0;
> +
> +    max = vq->vring.num;
> +    caches = vring_get_region_caches(vq);
> +    if (caches->desc.len < max * sizeof(VRingDescPacked)) {
> +        virtio_error(vdev, "Cannot map descriptor ring");
> +        goto err;
> +    }
> +
> +    desc_cache = &caches->desc;
> +    vring_packed_desc_read(vdev, &desc, desc_cache, idx);
> +    while (is_desc_avail(&desc)) {

So you did vring_packed_desc_read() like:

static void vring_packed_desc_read(VirtIODevice *vdev, VRingDescPacked *desc,
                                   MemoryRegionCache *cache, int i)
{
     address_space_read_cached(cache, i * sizeof(VRingDescPacked),
                               desc, sizeof(VRingDescPacked));
     virtio_tswap64s(vdev, &desc->addr);
     virtio_tswap32s(vdev, &desc->len);
     virtio_tswap16s(vdev, &desc->id);
     virtio_tswap16s(vdev, &desc->flags);
}

How do you guarantee desc->flags was read before all other fields?
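
One hedged way to provide that guarantee (a sketch with a hypothetical
helper name, not the patch's code): load flags first, then the payload
after a read barrier that pairs with the driver's write barrier.

static void vring_packed_desc_read_ordered(VirtIODevice *vdev,
                                           VRingDescPacked *desc,
                                           MemoryRegionCache *cache, int i)
{
    hwaddr off = i * sizeof(VRingDescPacked);

    /* flags is the last field of the descriptor; read it first. */
    address_space_read_cached(cache, off + offsetof(VRingDescPacked, flags),
                              &desc->flags, sizeof(desc->flags));
    virtio_tswap16s(vdev, &desc->flags);

    /* Only trust addr/len/id once flags says the descriptor is ours. */
    smp_rmb();

    address_space_read_cached(cache, off, desc,
                              offsetof(VRingDescPacked, flags));
    virtio_tswap64s(vdev, &desc->addr);
    virtio_tswap32s(vdev, &desc->len);
    virtio_tswap16s(vdev, &desc->id);
}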

> +        unsigned int num_bufs;
> +        unsigned int i;
> +
> +        num_bufs = total_bufs;
> +
> +        if (desc.flags & VRING_DESC_F_INDIRECT) {
> +            if (desc.len % sizeof(VRingDescPacked)) {
> +                virtio_error(vdev, "Invalid size for indirect buffer table");
> +                goto err;
> +            }
> +
> +            /* If we've got too many, that implies a descriptor loop. */
> +            if (num_bufs >= max) {
> +                virtio_error(vdev, "Looped descriptor");
> +                goto err;
> +            }
> +
> +            /* loop over the indirect descriptor table */
> +            len = address_space_cache_init(&indirect_desc_cache,
> +                                           vdev->dma_as,
> +                                           desc.addr, desc.len, false);
> +            desc_cache = &indirect_desc_cache;
> +            if (len < desc.len) {
> +                virtio_error(vdev, "Cannot map indirect buffer");
> +                goto err;
> +            }
> +
> +            max = desc.len / sizeof(VRingDescPacked);
> +            num_bufs = i = 0;
> +            vring_packed_desc_read(vdev, &desc, desc_cache, i);
> +        }
> +
> +        do {
> +            /* If we've got too many, that implies a descriptor loop. */
> +            if (++num_bufs > max) {
> +                virtio_error(vdev, "Looped descriptor");
> +                goto err;
> +            }
> +
> +            if (desc.flags & VRING_DESC_F_WRITE) {
> +                in_total += desc.len;
> +            } else {
> +                out_total += desc.len;
> +            }
> +            if (in_total >= max_in_bytes && out_total >= max_out_bytes) {
> +                goto done;
> +            }
> +
> +            if (desc_cache == &indirect_desc_cache) {
> +                if (++i >= vq->vring.num) {
> +                    i -= vq->vring.num;
> +                }
> +                vring_packed_desc_read(vdev, &desc, desc_cache, i);
> +            } else {
> +                if (++idx >= vq->vring.num) {
> +                    idx -= vq->vring.num;
> +                }
> +                vring_packed_desc_read(vdev, &desc, desc_cache, idx);
> +            }

OK, so the difference is the index; can we use a single index instead of
two (i, idx) here?

> +            /* Make sure we see the flags */
> +            smp_mb();

Why is this needed?

Thanks

> +        } while (desc.flags & VRING_DESC_F_NEXT);
> +
> +        if (desc_cache == &indirect_desc_cache) {
> +            address_space_cache_destroy(&indirect_desc_cache);
> +            total_bufs++;
> +            /* We missed one step for the indirect desc */
> +            idx++;
> +        } else {
> +            total_bufs = num_bufs;
> +        }
> +
> +        desc_cache = &caches->desc;
> +        vring_packed_desc_read(vdev, &desc, desc_cache, idx % vq->vring.num);
> +    }
> +
> +done:
> +    address_space_cache_destroy(&indirect_desc_cache);
> +    if (in_bytes) {
> +        *in_bytes = in_total;
> +    }
> +    if (out_bytes) {
> +        *out_bytes = out_total;
> +    }
> +    rcu_read_unlock();
> +    return;
> +
> +err:
> +    in_total = out_total = 0;
> +    goto done;
> +}
> +
> +void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes,
> +                               unsigned int *out_bytes,
> +                               unsigned max_in_bytes, unsigned max_out_bytes)
> +{
> +    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
> +        virtqueue_packed_get_avail_bytes(vq, in_bytes, out_bytes,
> +                                         max_in_bytes, max_out_bytes);
> +    } else {
> +        virtqueue_split_get_avail_bytes(vq, in_bytes, out_bytes,
> +                                        max_in_bytes, max_out_bytes);
> +    }
> +}
> +
>   int virtqueue_avail_bytes(VirtQueue *vq, unsigned int in_bytes,
>                             unsigned int out_bytes)
>   {

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC v2 5/8] virtio: queue pop for packed ring
  2018-06-05 19:08 ` [Qemu-devel] [RFC v2 5/8] virtio: queue pop " wexu
@ 2018-06-06  3:29   ` Jason Wang
  2018-06-06  3:38     ` Wei Xu
  0 siblings, 1 reply; 27+ messages in thread
From: Jason Wang @ 2018-06-06  3:29 UTC (permalink / raw)
  To: wexu, qemu-devel; +Cc: jfreimann, tiwei.bie, mst



On 2018-06-06 03:08, wexu@redhat.com wrote:
> From: Wei Xu <wexu@redhat.com>
>
> Signed-off-by: Wei Xu <wexu@redhat.com>
> ---
>   hw/virtio/virtio.c | 145 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 144 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index cdbb5af..0160d03 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -1041,7 +1041,7 @@ static void *virtqueue_alloc_element(size_t sz, unsigned out_num, unsigned in_nu
>       return elem;
>   }
>   
> -void *virtqueue_pop(VirtQueue *vq, size_t sz)
> +static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
>   {
>       unsigned int i, head, max;
>       VRingMemoryRegionCaches *caches;
> @@ -1176,6 +1176,149 @@ err_undo_map:
>       goto done;
>   }
>   
> +static void *virtqueue_packed_pop(VirtQueue *vq, size_t sz)
> +{
> +    unsigned int i, head, max;
> +    VRingMemoryRegionCaches *caches;
> +    MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
> +    MemoryRegionCache *cache;
> +    int64_t len;
> +    VirtIODevice *vdev = vq->vdev;
> +    VirtQueueElement *elem = NULL;
> +    unsigned out_num, in_num, elem_entries;
> +    hwaddr addr[VIRTQUEUE_MAX_SIZE];
> +    struct iovec iov[VIRTQUEUE_MAX_SIZE];
> +    VRingDescPacked desc;
> +
> +    if (unlikely(vdev->broken)) {
> +        return NULL;
> +    }
> +
> +    rcu_read_lock();
> +    if (virtio_queue_packed_empty_rcu(vq)) {
> +        goto done;
> +    }

Instead of depending on the barriers inside
virtio_queue_packed_empty_rcu(), I think it's better to keep a
smp_rmb() here with a comment.
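
That is, something along these lines in virtqueue_packed_pop() (a sketch):

    rcu_read_lock();
    if (virtio_queue_packed_empty_rcu(vq)) {
        goto done;
    }
    /* Pairs with the driver's write barrier: do not read the descriptor
     * body before its flags have been seen as available. */
    smp_rmb();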

> +
> +    /* When we start there are none of either input nor output. */
> +    out_num = in_num = elem_entries = 0;
> +
> +    max = vq->vring.num;
> +
> +    if (vq->inuse >= vq->vring.num) {
> +        virtio_error(vdev, "Virtqueue size exceeded");
> +        goto done;
> +    }
> +
> +    if (virtio_vdev_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX)) {
> +        /* FIXME: TBD */
> +    }

This part could be removed.

> +
> +    head = vq->last_avail_idx;
> +    i = head;
> +
> +    caches = vring_get_region_caches(vq);
> +    cache = &caches->desc;
> +    vring_packed_desc_read(vdev, &desc, cache, i);

I think we'd better find a way to avoid reading the descriptor twice.

Thanks
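
One possible shape (a sketch with a hypothetical helper name): let the
emptiness check hand back the descriptor it already fetched, so
virtqueue_packed_pop() can reuse it instead of reading it again.

static bool virtio_queue_packed_peek_rcu(VirtQueue *vq, VRingDescPacked *desc)
{
    VRingMemoryRegionCaches *caches;

    if (unlikely(!vq->vring.desc)) {
        return false;
    }

    /* Fetch the descriptor at last_avail_idx once; the caller keeps it. */
    caches = vring_get_region_caches(vq);
    vring_packed_desc_read(vq->vdev, desc, &caches->desc, vq->last_avail_idx);
    return is_desc_avail(desc);
}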

> +    if (desc.flags & VRING_DESC_F_INDIRECT) {
> +        if (desc.len % sizeof(VRingDescPacked)) {
> +            virtio_error(vdev, "Invalid size for indirect buffer table");
> +            goto done;
> +        }
> +
> +        /* loop over the indirect descriptor table */
> +        len = address_space_cache_init(&indirect_desc_cache, vdev->dma_as,
> +                                       desc.addr, desc.len, false);
> +        cache = &indirect_desc_cache;
> +        if (len < desc.len) {
> +            virtio_error(vdev, "Cannot map indirect buffer");
> +            goto done;
> +        }
> +
> +        max = desc.len / sizeof(VRingDescPacked);
> +        i = 0;
> +        vring_packed_desc_read(vdev, &desc, cache, i);
> +    }
> +
> +    /* Collect all the descriptors */
> +    while (1) {
> +        bool map_ok;
> +
> +        if (desc.flags & VRING_DESC_F_WRITE) {
> +            map_ok = virtqueue_map_desc(vdev, &in_num, addr + out_num,
> +                                        iov + out_num,
> +                                        VIRTQUEUE_MAX_SIZE - out_num, true,
> +                                        desc.addr, desc.len);
> +        } else {
> +            if (in_num) {
> +                virtio_error(vdev, "Incorrect order for descriptors");
> +                goto err_undo_map;
> +            }
> +            map_ok = virtqueue_map_desc(vdev, &out_num, addr, iov,
> +                                        VIRTQUEUE_MAX_SIZE, false,
> +                                        desc.addr, desc.len);
> +        }
> +        if (!map_ok) {
> +            goto err_undo_map;
> +        }
> +
> +        /* If we've got too many, that implies a descriptor loop. */
> +        if (++elem_entries > max) {
> +            virtio_error(vdev, "Looped descriptor");
> +            goto err_undo_map;
> +        }
> +
> +        if (++i >= vq->vring.num) {
> +            i -= vq->vring.num;
> +        }
> +
> +        if (desc.flags & VRING_DESC_F_NEXT) {
> +            vring_packed_desc_read(vq->vdev, &desc, cache, i);
> +        } else {
> +            break;
> +        }
> +    }
> +
> +    /* Now copy what we have collected and mapped */
> +    elem = virtqueue_alloc_element(sz, out_num, in_num);
> +    for (i = 0; i < out_num; i++) {
> +        elem->out_addr[i] = addr[i];
> +        elem->out_sg[i] = iov[i];
> +    }
> +    for (i = 0; i < in_num; i++) {
> +        elem->in_addr[i] = addr[head + out_num + i];
> +        elem->in_sg[i] = iov[out_num + i];
> +    }
> +
> +    vq->last_avail_idx += (cache == &indirect_desc_cache) ?
> +                          1 : out_num + in_num;
> +    if (vq->last_avail_idx >= vq->vring.num) {
> +        vq->last_avail_idx -= vq->vring.num;
> +        vq->avail_wrap_counter = !vq->avail_wrap_counter;
> +    }
> +    vq->inuse++;
> +
> +    trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
> +done:
> +    address_space_cache_destroy(&indirect_desc_cache);
> +    rcu_read_unlock();
> +
> +    return elem;
> +
> +err_undo_map:
> +    virtqueue_undo_map_desc(out_num, in_num, iov);
> +    g_free(elem);
> +    goto done;
> +}
> +
> +void *virtqueue_pop(VirtQueue *vq, size_t sz)
> +{
> +    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
> +        return virtqueue_packed_pop(vq, sz);
> +    } else {
> +        return virtqueue_split_pop(vq, sz);
> +    }
> +}
> +
>   /* virtqueue_drop_all:
>    * @vq: The #VirtQueue
>    * Drops all queued buffers and indicates them to the guest

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC v2 5/8] virtio: queue pop for packed ring
  2018-06-06  3:29   ` Jason Wang
@ 2018-06-06  3:38     ` Wei Xu
  2018-06-06  3:41       ` Jason Wang
  0 siblings, 1 reply; 27+ messages in thread
From: Wei Xu @ 2018-06-06  3:38 UTC (permalink / raw)
  To: Jason Wang; +Cc: qemu-devel, jfreimann, tiwei.bie, mst

On Wed, Jun 06, 2018 at 11:29:54AM +0800, Jason Wang wrote:
> 
> 
> On 2018-06-06 03:08, wexu@redhat.com wrote:
> >From: Wei Xu <wexu@redhat.com>
> >
> >Signed-off-by: Wei Xu <wexu@redhat.com>
> >---
> >  hw/virtio/virtio.c | 145 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 144 insertions(+), 1 deletion(-)
> >
> >diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >index cdbb5af..0160d03 100644
> >--- a/hw/virtio/virtio.c
> >+++ b/hw/virtio/virtio.c
> >@@ -1041,7 +1041,7 @@ static void *virtqueue_alloc_element(size_t sz, unsigned out_num, unsigned in_nu
> >      return elem;
> >  }
> >-void *virtqueue_pop(VirtQueue *vq, size_t sz)
> >+static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
> >  {
> >      unsigned int i, head, max;
> >      VRingMemoryRegionCaches *caches;
> >@@ -1176,6 +1176,149 @@ err_undo_map:
> >      goto done;
> >  }
> >+static void *virtqueue_packed_pop(VirtQueue *vq, size_t sz)
> >+{
> >+    unsigned int i, head, max;
> >+    VRingMemoryRegionCaches *caches;
> >+    MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
> >+    MemoryRegionCache *cache;
> >+    int64_t len;
> >+    VirtIODevice *vdev = vq->vdev;
> >+    VirtQueueElement *elem = NULL;
> >+    unsigned out_num, in_num, elem_entries;
> >+    hwaddr addr[VIRTQUEUE_MAX_SIZE];
> >+    struct iovec iov[VIRTQUEUE_MAX_SIZE];
> >+    VRingDescPacked desc;
> >+
> >+    if (unlikely(vdev->broken)) {
> >+        return NULL;
> >+    }
> >+
> >+    rcu_read_lock();
> >+    if (virtio_queue_packed_empty_rcu(vq)) {
> >+        goto done;
> >+    }
> 
> Instead of depending on the barriers inside virtio_queue_packed_empty_rcu(),
> I think it's better to keep a smp_rmb() here with a comment.

OK.

> 
> >+
> >+    /* When we start there are none of either input nor output. */
> >+    out_num = in_num = elem_entries = 0;
> >+
> >+    max = vq->vring.num;
> >+
> >+    if (vq->inuse >= vq->vring.num) {
> >+        virtio_error(vdev, "Virtqueue size exceeded");
> >+        goto done;
> >+    }
> >+
> >+    if (virtio_vdev_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX)) {
> >+        /* FIXME: TBD */
> >+    }
> 
> This part could be removed.

My bad, thanks.

> 
> >+
> >+    head = vq->last_avail_idx;
> >+    i = head;
> >+
> >+    caches = vring_get_region_caches(vq);
> >+    cache = &caches->desc;
> >+    vring_packed_desc_read(vdev, &desc, cache, i);
> 
> I think we'd better find a way to avoid reading the descriptor twice.

Do you mean here and the read for empty check?

Wei

> 
> Thanks
> 
> >+    if (desc.flags & VRING_DESC_F_INDIRECT) {
> >+        if (desc.len % sizeof(VRingDescPacked)) {
> >+            virtio_error(vdev, "Invalid size for indirect buffer table");
> >+            goto done;
> >+        }
> >+
> >+        /* loop over the indirect descriptor table */
> >+        len = address_space_cache_init(&indirect_desc_cache, vdev->dma_as,
> >+                                       desc.addr, desc.len, false);
> >+        cache = &indirect_desc_cache;
> >+        if (len < desc.len) {
> >+            virtio_error(vdev, "Cannot map indirect buffer");
> >+            goto done;
> >+        }
> >+
> >+        max = desc.len / sizeof(VRingDescPacked);
> >+        i = 0;
> >+        vring_packed_desc_read(vdev, &desc, cache, i);
> >+    }
> >+
> >+    /* Collect all the descriptors */
> >+    while (1) {
> >+        bool map_ok;
> >+
> >+        if (desc.flags & VRING_DESC_F_WRITE) {
> >+            map_ok = virtqueue_map_desc(vdev, &in_num, addr + out_num,
> >+                                        iov + out_num,
> >+                                        VIRTQUEUE_MAX_SIZE - out_num, true,
> >+                                        desc.addr, desc.len);
> >+        } else {
> >+            if (in_num) {
> >+                virtio_error(vdev, "Incorrect order for descriptors");
> >+                goto err_undo_map;
> >+            }
> >+            map_ok = virtqueue_map_desc(vdev, &out_num, addr, iov,
> >+                                        VIRTQUEUE_MAX_SIZE, false,
> >+                                        desc.addr, desc.len);
> >+        }
> >+        if (!map_ok) {
> >+            goto err_undo_map;
> >+        }
> >+
> >+        /* If we've got too many, that implies a descriptor loop. */
> >+        if (++elem_entries > max) {
> >+            virtio_error(vdev, "Looped descriptor");
> >+            goto err_undo_map;
> >+        }
> >+
> >+        if (++i >= vq->vring.num) {
> >+            i -= vq->vring.num;
> >+        }
> >+
> >+        if (desc.flags & VRING_DESC_F_NEXT) {
> >+            vring_packed_desc_read(vq->vdev, &desc, cache, i);
> >+        } else {
> >+            break;
> >+        }
> >+    }
> >+
> >+    /* Now copy what we have collected and mapped */
> >+    elem = virtqueue_alloc_element(sz, out_num, in_num);
> >+    for (i = 0; i < out_num; i++) {
> >+        elem->out_addr[i] = addr[i];
> >+        elem->out_sg[i] = iov[i];
> >+    }
> >+    for (i = 0; i < in_num; i++) {
> >+        elem->in_addr[i] = addr[head + out_num + i];
> >+        elem->in_sg[i] = iov[out_num + i];
> >+    }
> >+
> >+    vq->last_avail_idx += (cache == &indirect_desc_cache) ?
> >+                          1 : out_num + in_num;
> >+    if (vq->last_avail_idx >= vq->vring.num) {
> >+        vq->last_avail_idx -= vq->vring.num;
> >+        vq->avail_wrap_counter = !vq->avail_wrap_counter;
> >+    }
> >+    vq->inuse++;
> >+
> >+    trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
> >+done:
> >+    address_space_cache_destroy(&indirect_desc_cache);
> >+    rcu_read_unlock();
> >+
> >+    return elem;
> >+
> >+err_undo_map:
> >+    virtqueue_undo_map_desc(out_num, in_num, iov);
> >+    g_free(elem);
> >+    goto done;
> >+}
> >+
> >+void *virtqueue_pop(VirtQueue *vq, size_t sz)
> >+{
> >+    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
> >+        return virtqueue_packed_pop(vq, sz);
> >+    } else {
> >+        return virtqueue_split_pop(vq, sz);
> >+    }
> >+}
> >+
> >  /* virtqueue_drop_all:
> >   * @vq: The #VirtQueue
> >   * Drops all queued buffers and indicates them to the guest
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC v2 6/8] virtio: flush/push for packed ring
  2018-06-05 19:08 ` [Qemu-devel] [RFC v2 6/8] virtio: flush/push " wexu
@ 2018-06-06  3:39   ` Jason Wang
       [not found]   ` <7dc2af60-47ad-1250-fc6c-3fdf288654c3@redhat.com>
  1 sibling, 0 replies; 27+ messages in thread
From: Jason Wang @ 2018-06-06  3:39 UTC (permalink / raw)
  To: wexu, qemu-devel; +Cc: tiwei.bie, mst, jfreimann



On 2018-06-06 03:08, wexu@redhat.com wrote:
> From: Wei Xu <wexu@redhat.com>
>
> Signed-off-by: Wei Xu <wexu@redhat.com>
> Signed-off-by: Wei Xu <wexu@redhat.com>
> ---
>   hw/virtio/virtio.c | 109 ++++++++++++++++++++++++++++++++++++++++++++++-------
>   1 file changed, 96 insertions(+), 13 deletions(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 0160d03..6f2da83 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -371,6 +371,21 @@ static void vring_packed_desc_read(VirtIODevice *vdev, VRingDescPacked *desc,
>       virtio_tswap16s(vdev, &desc->flags);
>   }
>   
> +static void vring_packed_desc_write(VirtIODevice *vdev, VRingDescPacked *desc,
> +                            MemoryRegionCache *cache, int i)
> +{
> +    virtio_tswap64s(vdev, &desc->addr);
> +    virtio_tswap32s(vdev, &desc->len);
> +    virtio_tswap16s(vdev, &desc->id);
> +    virtio_tswap16s(vdev, &desc->flags);
> +    address_space_write_cached(cache,
> +                               sizeof(VRingDescPacked) * i, desc,
> +                               sizeof(VRingDescPacked));
> +    address_space_cache_invalidate(cache,
> +                                   sizeof(VRingDescPacked) * i,
> +                                   sizeof(VRingDescPacked));

Is there a guarantee that flags is written before all other fields?

> +}
> +
>   static inline bool is_desc_avail(struct VRingDescPacked *desc)
>   {
>       return !!(desc->flags & AVAIL_DESC_PACKED(1)) !=
> @@ -526,19 +541,11 @@ bool virtqueue_rewind(VirtQueue *vq, unsigned int num)
>   }
>   
>   /* Called within rcu_read_lock().  */
> -void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
> +static void virtqueue_split_fill(VirtQueue *vq, const VirtQueueElement *elem,
>                       unsigned int len, unsigned int idx)
>   {
>       VRingUsedElem uelem;
>   
> -    trace_virtqueue_fill(vq, elem, len, idx);
> -
> -    virtqueue_unmap_sg(vq, elem, len);
> -
> -    if (unlikely(vq->vdev->broken)) {
> -        return;
> -    }
> -
>       if (unlikely(!vq->vring.used)) {
>           return;
>       }
> @@ -550,16 +557,64 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
>       vring_used_write(vq, &uelem, idx);
>   }
>   
> -/* Called within rcu_read_lock().  */
> -void virtqueue_flush(VirtQueue *vq, unsigned int count)
> +static void virtqueue_packed_fill(VirtQueue *vq, const VirtQueueElement *elem,
> +                        unsigned int len, unsigned int idx)
>   {
> -    uint16_t old, new;
> +    uint16_t w, head;
> +    VRingMemoryRegionCaches *caches;
> +    VRingDescPacked desc = {
> +        .addr = 0,
> +        .flags = 0,
> +    };
> +
> +    if (unlikely(!vq->vring.desc)) {
> +        return;
> +    }
> +
> +    caches = vring_get_region_caches(vq);
> +    head = vq->used_idx + idx;
> +    head = head >= vq->vring.num ? (head - vq->vring.num) : head;
> +    vring_packed_desc_read(vq->vdev, &desc, &caches->desc, head);
> +
> +    w = (desc.flags & AVAIL_DESC_PACKED(1)) >> 7;
> +    desc.flags &= ~(AVAIL_DESC_PACKED(1) | USED_DESC_PACKED(1));
> +    desc.flags |= AVAIL_DESC_PACKED(w) | USED_DESC_PACKED(w);

Why do we need to read the descriptor here? We should set the used
counter on our own (e.g. we have used_wrap_counter).
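
For illustration, a sketch of composing the used descriptor from the
device's own used_wrap_counter instead of reading the ring first; it
assumes the buffer id was recorded in elem->index at pop time, and the
helper name is hypothetical:

static void virtqueue_packed_fill_sketch(VirtQueue *vq,
                                         const VirtQueueElement *elem,
                                         unsigned int len, unsigned int idx)
{
    uint16_t head;
    bool wrap = vq->used_wrap_counter;
    VRingMemoryRegionCaches *caches = vring_get_region_caches(vq);
    VRingDescPacked desc = {
        .id = elem->index,
        .len = len,
    };

    head = vq->used_idx + idx;
    if (head >= vq->vring.num) {
        head -= vq->vring.num;
        wrap = !wrap;
    }

    /* AVAIL and USED bits are equal for a used descriptor. */
    desc.flags = AVAIL_DESC_PACKED(wrap) | USED_DESC_PACKED(wrap);

    vring_packed_desc_write(vq->vdev, &desc, &caches->desc, head);
}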

> +    if (!(desc.flags & VRING_DESC_F_INDIRECT)) {
> +        if (!(desc.flags & VRING_DESC_F_WRITE)) {
> +            desc.len = 0;
> +        } else {
> +            desc.len = len;
> +        }
> +    }
> +    vring_packed_desc_write(vq->vdev, &desc, &caches->desc, head);
> +
> +    /* Make sure flags has been updated */
> +    smp_mb();

This looks wrong; if you think it's correct, you need a better comment
to explain, e.g. how it pairs with the other barrier.

Thanks

> +}
> +
> +void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
> +                    unsigned int len, unsigned int idx)
> +{
> +    trace_virtqueue_fill(vq, elem, len, idx);
> +
> +    virtqueue_unmap_sg(vq, elem, len);
>   
>       if (unlikely(vq->vdev->broken)) {
> -        vq->inuse -= count;
>           return;
>       }
>   
> +    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
> +        virtqueue_packed_fill(vq, elem, len, idx);
> +    } else {
> +        virtqueue_split_fill(vq, elem, len, idx);
> +    }
> +}
> +
> +/* Called within rcu_read_lock().  */
> +static void virtqueue_split_flush(VirtQueue *vq, unsigned int count)
> +{
> +    uint16_t old, new;
> +
>       if (unlikely(!vq->vring.used)) {
>           return;
>       }
> @@ -575,6 +630,34 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count)
>           vq->signalled_used_valid = false;
>   }
>   
> +static void virtqueue_packed_flush(VirtQueue *vq, unsigned int count)
> +{
> +    if (unlikely(!vq->vring.desc)) {
> +        return;
> +    }
> +
> +    vq->inuse -= count;
> +    vq->used_idx += count;
> +    if (vq->used_idx >= vq->vring.num) {
> +        vq->used_idx -= vq->vring.num;
> +        vq->used_wrap_counter = !vq->used_wrap_counter;
> +    }
> +}
> +
> +void virtqueue_flush(VirtQueue *vq, unsigned int count)
> +{
> +    if (unlikely(vq->vdev->broken)) {
> +        vq->inuse -= count;
> +        return;
> +    }
> +
> +    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
> +        virtqueue_packed_flush(vq, count);
> +    } else {
> +        virtqueue_split_flush(vq, count);
> +    }
> +}
> +
>   void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
>                       unsigned int len)
>   {

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC v2 5/8] virtio: queue pop for packed ring
  2018-06-06  3:38     ` Wei Xu
@ 2018-06-06  3:41       ` Jason Wang
  2018-06-19  7:58         ` Wei Xu
  0 siblings, 1 reply; 27+ messages in thread
From: Jason Wang @ 2018-06-06  3:41 UTC (permalink / raw)
  To: Wei Xu; +Cc: qemu-devel, jfreimann, tiwei.bie, mst



On 2018-06-06 11:38, Wei Xu wrote:
>>> +
>>> +    head = vq->last_avail_idx;
>>> +    i = head;
>>> +
>>> +    caches = vring_get_region_caches(vq);
>>> +    cache = &caches->desc;
>>> +    vring_packed_desc_read(vdev, &desc, cache, i);
> >>I think we'd better find a way to avoid reading the descriptor twice.
> Do you mean here and the read for empty check?
>
> Wei
>

Yes.

Thanks

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC v2 7/8] virtio: event suppression for packed ring
  2018-06-05 19:08 ` [Qemu-devel] [RFC v2 7/8] virtio: event suppression " wexu
@ 2018-06-06  3:46   ` Jason Wang
  0 siblings, 0 replies; 27+ messages in thread
From: Jason Wang @ 2018-06-06  3:46 UTC (permalink / raw)
  To: wexu, qemu-devel; +Cc: jfreimann, tiwei.bie, mst



On 2018-06-06 03:08, wexu@redhat.com wrote:
> From: Wei Xu <wexu@redhat.com>
>
> Signed-off-by: Wei Xu <wexu@redhat.com>
> Signed-off-by: Wei Xu <wexu@redhat.com>

Duplicated.

> ---
>   hw/virtio/virtio.c                             | 115 +++++++++++++++++++++++--
>   include/standard-headers/linux/virtio_config.h |  13 +++
>   2 files changed, 119 insertions(+), 9 deletions(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 6f2da83..4543974 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -226,6 +226,24 @@ static void vring_desc_read(VirtIODevice *vdev, VRingDesc *desc,
>       virtio_tswap16s(vdev, &desc->next);
>   }
>   
> +static void vring_packed_event_read(VirtIODevice *vdev,
> +                            MemoryRegionCache *cache, VRingPackedDescEvent *e)
> +{
> +    address_space_read_cached(cache, 0, e, sizeof(*e));
> +    virtio_tswap16s(vdev, &e->off_wrap);
> +    virtio_tswap16s(vdev, &e->flags);

You need to make sure flags is read before off_wrap.
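
One way to enforce that order is field-wise loads with a read barrier
in between (a sketch with a hypothetical helper name):

static void vring_packed_event_read_ordered(VirtIODevice *vdev,
                                            MemoryRegionCache *cache,
                                            VRingPackedDescEvent *e)
{
    e->flags = virtio_lduw_phys_cached(vdev, cache,
                                       offsetof(VRingPackedDescEvent, flags));
    /* Make sure flags is read before off_wrap. */
    smp_rmb();
    e->off_wrap = virtio_lduw_phys_cached(vdev, cache,
                                          offsetof(VRingPackedDescEvent,
                                                   off_wrap));
}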

> +}
> +
> +static void vring_packed_event_write(VirtIODevice *vdev,
> +                            MemoryRegionCache *cache, VRingPackedDescEvent *e)
> +{
> +    virtio_tswap16s(vdev, &e->off_wrap);
> +    virtio_tswap16s(vdev, &e->flags);
> +    address_space_write_cached(cache, 0, e, sizeof(*e));

You need to make sure flags is written before off_wrap.

> +    address_space_cache_invalidate(cache, 0, sizeof(VRingUsedElem));
> +}
> +
> +
>   static VRingMemoryRegionCaches *vring_get_region_caches(struct VirtQueue *vq)
>   {
>       VRingMemoryRegionCaches *caches = atomic_rcu_read(&vq->vring.caches);
> @@ -332,14 +350,8 @@ static inline void vring_set_avail_event(VirtQueue *vq, uint16_t val)
>       address_space_cache_invalidate(&caches->used, pa, sizeof(val));
>   }
>   
> -void virtio_queue_set_notification(VirtQueue *vq, int enable)
> +static void virtio_queue_set_notification_split(VirtQueue *vq, int enable)
>   {
> -    vq->notification = enable;
> -
> -    if (!vq->vring.desc) {
> -        return;
> -    }
> -
>       rcu_read_lock();
>       if (virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
>           vring_set_avail_event(vq, vring_avail_idx(vq));
> @@ -355,6 +367,38 @@ void virtio_queue_set_notification(VirtQueue *vq, int enable)
>       rcu_read_unlock();
>   }
>   
> +static void virtio_queue_set_notification_packed(VirtQueue *vq, int enable)
> +{
> +    VRingPackedDescEvent e;
> +    VRingMemoryRegionCaches *caches;
> +
> +    rcu_read_lock();
> +    caches  = vring_get_region_caches(vq);
> +    vring_packed_event_read(vq->vdev, &caches->device, &e);

Why is the read needed?
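
If the read is unnecessary (the device owns this area), the function
could simply build the event state it wants and store it (a sketch):

static void virtio_queue_packed_set_notification(VirtQueue *vq, int enable)
{
    VRingPackedDescEvent e = {
        .off_wrap = 0,
        .flags = enable ? RING_EVENT_FLAGS_ENABLE : RING_EVENT_FLAGS_DISABLE,
    };
    VRingMemoryRegionCaches *caches;

    rcu_read_lock();
    caches = vring_get_region_caches(vq);
    vring_packed_event_write(vq->vdev, &caches->device, &e);
    rcu_read_unlock();
}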

> +    if (enable) {
> +        e.flags = RING_EVENT_FLAGS_ENABLE;
> +    } else {
> +        e.flags = RING_EVENT_FLAGS_DISABLE;
> +    }
> +    vring_packed_event_write(vq->vdev, &caches->device, &e);
> +    rcu_read_unlock();
> +}
> +
> +void virtio_queue_set_notification(VirtQueue *vq, int enable)
> +{
> +    vq->notification = enable;
> +
> +    if (!vq->vring.desc) {
> +        return;
> +    }
> +
> +    if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
> +        virtio_queue_set_notification_packed(vq, enable);
> +    } else {
> +        virtio_queue_set_notification_split(vq, enable);
> +    }
> +}
> +
>   int virtio_queue_ready(VirtQueue *vq)
>   {
>       return vq->vring.avail != 0;
> @@ -2059,8 +2103,7 @@ static void virtio_set_isr(VirtIODevice *vdev, int value)
>       }
>   }
>   
> -/* Called within rcu_read_lock().  */
> -static bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
> +static bool virtio_split_should_notify(VirtIODevice *vdev, VirtQueue *vq)
>   {
>       uint16_t old, new;
>       bool v;
> @@ -2083,6 +2126,60 @@ static bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
>       return !v || vring_need_event(vring_get_used_event(vq), new, old);
>   }
>   
> +static bool vring_packed_need_event(VirtQueue *vq, uint16_t off_wrap,
> +                                    uint16_t new, uint16_t old)
> +{
> +    bool wrap = vq->used_wrap_counter;
> +    int off = off_wrap & ~(1 << 15);
> +
> +    if (new < old) {
> +        new += vq->vring.num;
> +        wrap ^= 1;
> +    }
> +
> +    if (wrap != off_wrap >> 15) {
> +        off += vq->vring.num;
> +    }
> +
> +    return vring_need_event(off, new, old);
> +}
> +
> +static bool virtio_packed_should_notify(VirtIODevice *vdev, VirtQueue *vq)
> +{
> +    VRingPackedDescEvent e;
> +    uint16_t old, new;
> +    bool v;
> +    VRingMemoryRegionCaches *caches;
> +
> +    caches  = vring_get_region_caches(vq);
> +    vring_packed_event_read(vdev, &caches->driver, &e);
> +
> +    /* Make sure we see the updated flags */
> +    smp_mb();

As I mentioned several times, why is a memory barrier needed here?

> +    if (e.flags == RING_EVENT_FLAGS_DISABLE) {
> +        return false;
> +    } else if (e.flags == RING_EVENT_FLAGS_ENABLE) {
> +        return true;
> +    }
> +
> +    v = vq->signalled_used_valid;
> +    vq->signalled_used_valid = true;
> +    old = vq->signalled_used;
> +    new = vq->signalled_used = vq->used_idx;
> +
> +    return !v || vring_packed_need_event(vq, e.off_wrap, new, old);
> +}
> +
> +/* Called within rcu_read_lock().  */
> +static bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
> +{
> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> +        return virtio_packed_should_notify(vdev, vq);
> +    } else {
> +        return virtio_split_should_notify(vdev, vq);
> +    }
> +}
> +
>   void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
>   {
>       bool should_notify;
> diff --git a/include/standard-headers/linux/virtio_config.h b/include/standard-headers/linux/virtio_config.h
> index 6ee5529..53e5c83 100644
> --- a/include/standard-headers/linux/virtio_config.h
> +++ b/include/standard-headers/linux/virtio_config.h
> @@ -73,4 +73,17 @@
>   #define VIRTIO_F_IOMMU_PLATFORM		33
>   
>   #define VIRTIO_F_RING_PACKED		34
> +
> +/* Enable events */
> +#define RING_EVENT_FLAGS_ENABLE 0x0
> +/* Disable events */
> +#define RING_EVENT_FLAGS_DISABLE 0x1
> +/*
> + * Enable events for a specific descriptor
> + * (as specified by Descriptor Ring Change Event Offset/Wrap Counter).
> + * Only valid if VIRTIO_F_RING_EVENT_IDX has been negotiated.
> + */
> +#define RING_EVENT_FLAGS_DESC 0x2
> +/* The value 0x3 is reserved */
> +
>   #endif /* _LINUX_VIRTIO_CONFIG_H */

This could be done in a header sync patch.

Thanks

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC v2 8/8] virtio: guest driver reload for vhost-net
  2018-06-05 19:08 ` [Qemu-devel] [RFC v2 8/8] virtio: guest driver reload for vhost-net wexu
@ 2018-06-06  3:48   ` Jason Wang
  2018-06-19  7:53     ` Wei Xu
  0 siblings, 1 reply; 27+ messages in thread
From: Jason Wang @ 2018-06-06  3:48 UTC (permalink / raw)
  To: wexu, qemu-devel; +Cc: tiwei.bie, mst, jfreimann



On 2018-06-06 03:08, wexu@redhat.com wrote:
> From: Wei Xu <wexu@redhat.com>
>
> last_avail, avail_wrap_count, used_idx and used_wrap_count are
> needed to support the vhost-net backend. All of these are either
> 16-bit or bool variables, and since state.num is 64 bits wide, it is
> possible to pack them into 'num' without introducing a new case
> while handling the ioctl.
>
> Unload/reload tests have been done successfully with a patch in the vhost kernel.

You need a patch to enable vhost.

And I think you can only do it for vhost-kernel now, since the vhost-user
protocol needs some extension, I believe.

>
> Signed-off-by: Wei Xu <wexu@redhat.com>
> ---
>   hw/virtio/virtio.c | 42 ++++++++++++++++++++++++++++++++++--------
>   1 file changed, 34 insertions(+), 8 deletions(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 4543974..153f6d7 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -2862,33 +2862,59 @@ hwaddr virtio_queue_get_used_size(VirtIODevice *vdev, int n)
>       }
>   }
>   
> -uint16_t virtio_queue_get_last_avail_idx(VirtIODevice *vdev, int n)
> +uint64_t virtio_queue_get_last_avail_idx(VirtIODevice *vdev, int n)
>   {
> -    return vdev->vq[n].last_avail_idx;
> +    uint64_t num;
> +
> +    num = vdev->vq[n].last_avail_idx;
> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> +        num |= ((uint64_t)vdev->vq[n].avail_wrap_counter) << 16;
> +        num |= ((uint64_t)vdev->vq[n].used_idx) << 32;
> +        num |= ((uint64_t)vdev->vq[n].used_wrap_counter) << 48;

So s.num is 32 bits; I don't think this can even work.
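
For reference, the ioctl payload in question (from the linux/vhost.h
UAPI) only has room for 32 bits in num:

struct vhost_vring_state {
    unsigned int index;
    unsigned int num;
};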

Thanks

> +    }
> +
> +    return num;
>   }
>   
> -void virtio_queue_set_last_avail_idx(VirtIODevice *vdev, int n, uint16_t idx)
> +void virtio_queue_set_last_avail_idx(VirtIODevice *vdev, int n, uint64_t num)
>   {
> -    vdev->vq[n].last_avail_idx = idx;
> -    vdev->vq[n].shadow_avail_idx = idx;
> +    vdev->vq[n].shadow_avail_idx = vdev->vq[n].last_avail_idx = (uint16_t)(num);
> +
> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> +        vdev->vq[n].avail_wrap_counter = (uint16_t)(num >> 16);
> +        vdev->vq[n].used_idx = (uint16_t)(num >> 32);
> +        vdev->vq[n].used_wrap_counter = (uint16_t)(num >> 48);
> +    }
>   }
>   
>   void virtio_queue_restore_last_avail_idx(VirtIODevice *vdev, int n)
>   {
>       rcu_read_lock();
> -    if (vdev->vq[n].vring.desc) {
> +    if (!vdev->vq[n].vring.desc) {
> +        goto out;
> +    }
> +
> +    if (!virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
>           vdev->vq[n].last_avail_idx = vring_used_idx(&vdev->vq[n]);
> -        vdev->vq[n].shadow_avail_idx = vdev->vq[n].last_avail_idx;
>       }
> +    vdev->vq[n].shadow_avail_idx = vdev->vq[n].last_avail_idx;
> +
> +out:
>       rcu_read_unlock();
>   }
>   
>   void virtio_queue_update_used_idx(VirtIODevice *vdev, int n)
>   {
>       rcu_read_lock();
> -    if (vdev->vq[n].vring.desc) {
> +    if (!vdev->vq[n].vring.desc) {
> +        goto out;
> +    }
> +
> +    if (!virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
>           vdev->vq[n].used_idx = vring_used_idx(&vdev->vq[n]);
>       }
> +
> +out:
>       rcu_read_unlock();
>   }
>   

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC v2 0/8] packed ring virtio-net userspace backend support
  2018-06-05 19:07 [Qemu-devel] [RFC v2 0/8] packed ring virtio-net userspace backend support wexu
                   ` (8 preceding siblings ...)
  2018-06-06  2:34 ` [Qemu-devel] [RFC v2 0/8] packed ring virtio-net userspace backend support Jason Wang
@ 2018-06-06  3:49 ` Jason Wang
  2018-06-19  7:41   ` Wei Xu
  9 siblings, 1 reply; 27+ messages in thread
From: Jason Wang @ 2018-06-06  3:49 UTC (permalink / raw)
  To: wexu, qemu-devel; +Cc: tiwei.bie, mst, jfreimann



On 2018-06-06 03:07, wexu@redhat.com wrote:
> From: Wei Xu<wexu@redhat.com>
>
> Todo:
> - address Rx slow performance
> - event index interrupt suppression test

And there's something more that needs testing:

- vIOMMU support
- migration

Thanks

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC v2 2/8] virtio: memory cache for packed ring
  2018-06-05 19:07 ` [Qemu-devel] [RFC v2 2/8] virtio: memory cache for packed ring wexu
  2018-06-06  2:53   ` Jason Wang
@ 2018-06-13 12:27   ` Paolo Bonzini
  1 sibling, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2018-06-13 12:27 UTC (permalink / raw)
  To: wexu, qemu-devel; +Cc: jasowang, jfreimann, tiwei.bie, mst

On 05/06/2018 21:07, wexu@redhat.com wrote:
> From: Wei Xu <wexu@redhat.com>
> 
> Mostly reuse memory cache with 1.0 except for the offset calculation.
> 
> Signed-off-by: Wei Xu <wexu@redhat.com>
> ---
>  hw/virtio/virtio.c | 29 ++++++++++++++++++++---------
>  1 file changed, 20 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index e192a9a..f6c0689 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -150,11 +150,8 @@ static void virtio_init_region_cache(VirtIODevice *vdev, int n)
>      VRingMemoryRegionCaches *old = vq->vring.caches;
>      VRingMemoryRegionCaches *new;
>      hwaddr addr, size;
> -    int event_size;
>      int64_t len;
>  
> -    event_size = virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
> -
>      addr = vq->vring.desc;
>      if (!addr) {
>          return;
> @@ -168,7 +165,7 @@ static void virtio_init_region_cache(VirtIODevice *vdev, int n)
>          goto err_desc;
>      }
>  
> -    size = virtio_queue_get_used_size(vdev, n) + event_size;
> +    size = virtio_queue_get_used_size(vdev, n);
>      len = address_space_cache_init(&new->used, vdev->dma_as,
>                                     vq->vring.used, size, true);
>      if (len < size) {
> @@ -176,7 +173,7 @@ static void virtio_init_region_cache(VirtIODevice *vdev, int n)
>          goto err_used;
>      }
>  
> -    size = virtio_queue_get_avail_size(vdev, n) + event_size;
> +    size = virtio_queue_get_avail_size(vdev, n);
>      len = address_space_cache_init(&new->avail, vdev->dma_as,
>                                     vq->vring.avail, size, false);
>      if (len < size) {
> @@ -2320,14 +2317,28 @@ hwaddr virtio_queue_get_desc_size(VirtIODevice *vdev, int n)
>  
>  hwaddr virtio_queue_get_avail_size(VirtIODevice *vdev, int n)
>  {
> -    return offsetof(VRingAvail, ring) +
> -        sizeof(uint16_t) * vdev->vq[n].vring.num;
> +    int s;
> +
> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> +        return sizeof(struct VRingPackedDescEvent);
> +    } else {
> +        s = virtio_vdev_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
> +        return offsetof(VRingAvail, ring) +
> +            sizeof(uint16_t) * vdev->vq[n].vring.num + s;
> +    }
>  }
>  
>  hwaddr virtio_queue_get_used_size(VirtIODevice *vdev, int n)
>  {
> -    return offsetof(VRingUsed, ring) +
> -        sizeof(VRingUsedElem) * vdev->vq[n].vring.num;
> +    int s;
> +
> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> +        return sizeof(struct VRingPackedDescEvent);
> +    } else {
> +        s = virtio_vdev_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
> +        return offsetof(VRingUsed, ring) +
> +            sizeof(VRingUsedElem) * vdev->vq[n].vring.num + s;
> +    }
>  }
>  
>  uint16_t virtio_queue_get_last_avail_idx(VirtIODevice *vdev, int n)
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC v2 2/8] virtio: memory cache for packed ring
  2018-06-06  2:53   ` Jason Wang
@ 2018-06-19  7:39     ` Wei Xu
  0 siblings, 0 replies; 27+ messages in thread
From: Wei Xu @ 2018-06-19  7:39 UTC (permalink / raw)
  To: Jason Wang; +Cc: qemu-devel, tiwei.bie, mst, jfreimann

On Wed, Jun 06, 2018 at 10:53:07AM +0800, Jason Wang wrote:
> 
> 
> On 2018-06-06 03:07, wexu@redhat.com wrote:
> >From: Wei Xu <wexu@redhat.com>
> >
> >Mostly reuse memory cache with 1.0 except for the offset calculation.
> >
> >Signed-off-by: Wei Xu <wexu@redhat.com>
> >---
> >  hw/virtio/virtio.c | 29 ++++++++++++++++++++---------
> >  1 file changed, 20 insertions(+), 9 deletions(-)
> >
> >diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >index e192a9a..f6c0689 100644
> >--- a/hw/virtio/virtio.c
> >+++ b/hw/virtio/virtio.c
> >@@ -150,11 +150,8 @@ static void virtio_init_region_cache(VirtIODevice *vdev, int n)
> >      VRingMemoryRegionCaches *old = vq->vring.caches;
> >      VRingMemoryRegionCaches *new;
> >      hwaddr addr, size;
> >-    int event_size;
> >      int64_t len;
> >-    event_size = virtio_vdev_has_feature(vq->vdev, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
> >-
> >      addr = vq->vring.desc;
> >      if (!addr) {
> >          return;
> >@@ -168,7 +165,7 @@ static void virtio_init_region_cache(VirtIODevice *vdev, int n)
> >          goto err_desc;
> >      }
> >-    size = virtio_queue_get_used_size(vdev, n) + event_size;
> >+    size = virtio_queue_get_used_size(vdev, n);
> >      len = address_space_cache_init(&new->used, vdev->dma_as,
> >                                     vq->vring.used, size, true);
> >      if (len < size) {
> >@@ -176,7 +173,7 @@ static void virtio_init_region_cache(VirtIODevice *vdev, int n)
> >          goto err_used;
> >      }
> >-    size = virtio_queue_get_avail_size(vdev, n) + event_size;
> >+    size = virtio_queue_get_avail_size(vdev, n);
> >      len = address_space_cache_init(&new->avail, vdev->dma_as,
> >                                     vq->vring.avail, size, false);
> >      if (len < size) {
> >@@ -2320,14 +2317,28 @@ hwaddr virtio_queue_get_desc_size(VirtIODevice *vdev, int n)
> >  hwaddr virtio_queue_get_avail_size(VirtIODevice *vdev, int n)
> 
> I would rather rename this to virtio_queue_get_driver_size().

Would this confuse 1.0 if it is shared by both? Otherwise I will take it into the next version, thanks.

Wei

> 
> >  {
> >-    return offsetof(VRingAvail, ring) +
> >-        sizeof(uint16_t) * vdev->vq[n].vring.num;
> >+    int s;
> >+
> >+    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> >+        return sizeof(struct VRingPackedDescEvent);
> >+    } else {
> >+        s = virtio_vdev_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
> >+        return offsetof(VRingAvail, ring) +
> >+            sizeof(uint16_t) * vdev->vq[n].vring.num + s;
> >+    }
> >  }
> >  hwaddr virtio_queue_get_used_size(VirtIODevice *vdev, int n)
> 
> virtio_queue_get_device_size().
> 
> Thanks
> 
> >  {
> >-    return offsetof(VRingUsed, ring) +
> >-        sizeof(VRingUsedElem) * vdev->vq[n].vring.num;
> >+    int s;
> >+
> >+    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> >+        return sizeof(struct VRingPackedDescEvent);
> >+    } else {
> >+        s = virtio_vdev_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
> >+        return offsetof(VRingUsed, ring) +
> >+            sizeof(VRingUsedElem) * vdev->vq[n].vring.num + s;
> >+    }
> >  }
> >  uint16_t virtio_queue_get_last_avail_idx(VirtIODevice *vdev, int n)
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC v2 0/8] packed ring virtio-net userspace backend support
  2018-06-06  3:49 ` Jason Wang
@ 2018-06-19  7:41   ` Wei Xu
  0 siblings, 0 replies; 27+ messages in thread
From: Wei Xu @ 2018-06-19  7:41 UTC (permalink / raw)
  To: Jason Wang; +Cc: qemu-devel, tiwei.bie, mst, jfreimann

On Wed, Jun 06, 2018 at 11:49:22AM +0800, Jason Wang wrote:
> 
> 
> On 2018-06-06 03:07, wexu@redhat.com wrote:
> >From: Wei Xu<wexu@redhat.com>
> >
> >Todo:
> >- address Rx slow performance
> >- event index interrupt suppression test
> 
> And there's something more that needs testing:
> 
> - vIOMMU support
> - migration

OK, I will test these and cover them in the next version.

> 
> Thanks

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC v2 8/8] virtio: guest driver reload for vhost-net
  2018-06-06  3:48   ` Jason Wang
@ 2018-06-19  7:53     ` Wei Xu
  0 siblings, 0 replies; 27+ messages in thread
From: Wei Xu @ 2018-06-19  7:53 UTC (permalink / raw)
  To: Jason Wang; +Cc: qemu-devel, tiwei.bie, mst, jfreimann

On Wed, Jun 06, 2018 at 11:48:19AM +0800, Jason Wang wrote:
> 
> 
> On 2018-06-06 03:08, wexu@redhat.com wrote:
> >From: Wei Xu <wexu@redhat.com>
> >
> >last_avail, avail_wrap_count, used_idx and used_wrap_count are
> >needed to support vhost-net backend, all these are either 16 or
> >bool variables, since state.num is 64bit wide, so here it is
> >possible to put them to the 'num' without introducing a new case
> >while handling ioctl.
> >
> >Unload/Reload test has been done successfully with a patch in vhost kernel.
> 
> You need a patch to enable vhost.
> 
> And I think you can only do it for vhost-kernel now, since the vhost-user
> protocol needs some extension, I believe.

OK.

> 
> >
> >Signed-off-by: Wei Xu <wexu@redhat.com>
> >---
> >  hw/virtio/virtio.c | 42 ++++++++++++++++++++++++++++++++++--------
> >  1 file changed, 34 insertions(+), 8 deletions(-)
> >
> >diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >index 4543974..153f6d7 100644
> >--- a/hw/virtio/virtio.c
> >+++ b/hw/virtio/virtio.c
> >@@ -2862,33 +2862,59 @@ hwaddr virtio_queue_get_used_size(VirtIODevice *vdev, int n)
> >      }
> >  }
> >-uint16_t virtio_queue_get_last_avail_idx(VirtIODevice *vdev, int n)
> >+uint64_t virtio_queue_get_last_avail_idx(VirtIODevice *vdev, int n)
> >  {
> >-    return vdev->vq[n].last_avail_idx;
> >+    uint64_t num;
> >+
> >+    num = vdev->vq[n].last_avail_idx;
> >+    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> >+        num |= ((uint64_t)vdev->vq[n].avail_wrap_counter) << 16;
> >+        num |= ((uint64_t)vdev->vq[n].used_idx) << 32;
> >+        num |= ((uint64_t)vdev->vq[n].used_wrap_counter) << 48;
> 
> So s.num is 32 bits; I don't think this can even work.

I mistakenly read s.num as 64 bits; I will add a new case in the next version.

Wei

> 
> Thanks
> 
> >+    }
> >+
> >+    return num;
> >  }
> >-void virtio_queue_set_last_avail_idx(VirtIODevice *vdev, int n, uint16_t idx)
> >+void virtio_queue_set_last_avail_idx(VirtIODevice *vdev, int n, uint64_t num)
> >  {
> >-    vdev->vq[n].last_avail_idx = idx;
> >-    vdev->vq[n].shadow_avail_idx = idx;
> >+    vdev->vq[n].shadow_avail_idx = vdev->vq[n].last_avail_idx = (uint16_t)(num);
> >+
> >+    if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> >+        vdev->vq[n].avail_wrap_counter = (uint16_t)(num >> 16);
> >+        vdev->vq[n].used_idx = (uint16_t)(num >> 32);
> >+        vdev->vq[n].used_wrap_counter = (uint16_t)(num >> 48);
> >+    }
> >  }
> >  void virtio_queue_restore_last_avail_idx(VirtIODevice *vdev, int n)
> >  {
> >      rcu_read_lock();
> >-    if (vdev->vq[n].vring.desc) {
> >+    if (!vdev->vq[n].vring.desc) {
> >+        goto out;
> >+    }
> >+
> >+    if (!virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> >          vdev->vq[n].last_avail_idx = vring_used_idx(&vdev->vq[n]);
> >-        vdev->vq[n].shadow_avail_idx = vdev->vq[n].last_avail_idx;
> >      }
> >+    vdev->vq[n].shadow_avail_idx = vdev->vq[n].last_avail_idx;
> >+
> >+out:
> >      rcu_read_unlock();
> >  }
> >  void virtio_queue_update_used_idx(VirtIODevice *vdev, int n)
> >  {
> >      rcu_read_lock();
> >-    if (vdev->vq[n].vring.desc) {
> >+    if (!vdev->vq[n].vring.desc) {
> >+        goto out;
> >+    }
> >+
> >+    if (!virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> >          vdev->vq[n].used_idx = vring_used_idx(&vdev->vq[n]);
> >      }
> >+
> >+out:
> >      rcu_read_unlock();
> >  }
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC v2 5/8] virtio: queue pop for packed ring
  2018-06-06  3:41       ` Jason Wang
@ 2018-06-19  7:58         ` Wei Xu
  0 siblings, 0 replies; 27+ messages in thread
From: Wei Xu @ 2018-06-19  7:58 UTC (permalink / raw)
  To: Jason Wang; +Cc: jfreimann, qemu-devel, tiwei.bie, mst

On Wed, Jun 06, 2018 at 11:41:18AM +0800, Jason Wang wrote:
> 
> 
> On 2018-06-06 11:38, Wei Xu wrote:
> >>>+
> >>>+    head = vq->last_avail_idx;
> >>>+    i = head;
> >>>+
> >>>+    caches = vring_get_region_caches(vq);
> >>>+    cache = &caches->desc;
> >>>+    vring_packed_desc_read(vdev, &desc, cache, i);
> >>I think we'd better find a way to avoid reading the descriptor twice.
> >Do you mean here and the read for empty check?
> >
> >Wei
> >
> 
> Yes.

OK, will figure it out.

> 
> Thanks
> 
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [RFC v2 6/8] virtio: flush/push for packed ring
       [not found]   ` <7dc2af60-47ad-1250-fc6c-3fdf288654c3@redhat.com>
@ 2018-11-28 17:33     ` Maxime Coquelin
  0 siblings, 0 replies; 27+ messages in thread
From: Maxime Coquelin @ 2018-11-28 17:33 UTC (permalink / raw)
  To: wexu, qemu-devel, jfreimann; +Cc: jasowang, tiwei.bie, mst

Hi Wei,

On 9/20/18 4:13 PM, Maxime Coquelin wrote:
>>       }
>> @@ -575,6 +630,34 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count)
>>           vq->signalled_used_valid = false;
>>   }
>> +static void virtqueue_packed_flush(VirtQueue *vq, unsigned int count)
>> +{
>> +    if (unlikely(!vq->vring.desc)) {
>> +        return;
>> +    }
>> +
>> +    vq->inuse -= count;
>> +    vq->used_idx += count;
> 
> I think it is wrong, because count seems to represent the number
> of descriptor chains.
> 
> But in the case of the packed ring layout, the used index must be
> incremented by the number of descriptors.
> 
> For example, I'm having trouble with the ctrl queue, where the commands
> sent by the guest use 3 descriptors, but count is 1.

It seems you didn't fix this issue in your last revision.
I guess it works for the virtio-net kernel driver in the guest (with
the fix you proposed) because it is using indirect descs.

But Virtio PMD does not use indirect descs for the control vq.
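
A sketch of the fix Maxime is asking for: advance used_idx by the
number of descriptors in each chain, not by the number of chains. The
per-element count (ndescs below) is hypothetical bookkeeping that would
have to be recorded at pop time:

static void virtqueue_packed_flush(VirtQueue *vq, unsigned int count)
{
    unsigned int i, ndescs = 0;

    if (unlikely(!vq->vring.desc)) {
        return;
    }

    /* Sum the descriptors consumed by each completed chain. */
    for (i = 0; i < count; i++) {
        ndescs += vq->used_elems[i].ndescs;   /* hypothetical field */
    }

    vq->inuse -= ndescs;
    vq->used_idx += ndescs;
    if (vq->used_idx >= vq->vring.num) {
        vq->used_idx -= vq->vring.num;
        vq->used_wrap_counter = !vq->used_wrap_counter;
    }
}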

Thanks,
Maxime

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread

Thread overview: 27+ messages
2018-06-05 19:07 [Qemu-devel] [RFC v2 0/8] packed ring virtio-net userspace backend support wexu
2018-06-05 19:07 ` [Qemu-devel] [RFC v2 1/8] virtio: feature bit, data structure, init for 1.1 wexu
2018-06-06  2:49   ` Jason Wang
2018-06-05 19:07 ` [Qemu-devel] [RFC v2 2/8] virtio: memory cache for packed ring wexu
2018-06-06  2:53   ` Jason Wang
2018-06-19  7:39     ` Wei Xu
2018-06-13 12:27   ` Paolo Bonzini
2018-06-05 19:07 ` [Qemu-devel] [RFC v2 3/8] virtio: empty check and desc read " wexu
2018-06-06  3:09   ` Jason Wang
2018-06-05 19:07 ` [Qemu-devel] [RFC v2 4/8] virtio: get avail bytes check " wexu
2018-06-06  3:19   ` Jason Wang
2018-06-05 19:08 ` [Qemu-devel] [RFC v2 5/8] virtio: queue pop " wexu
2018-06-06  3:29   ` Jason Wang
2018-06-06  3:38     ` Wei Xu
2018-06-06  3:41       ` Jason Wang
2018-06-19  7:58         ` Wei Xu
2018-06-05 19:08 ` [Qemu-devel] [RFC v2 6/8] virtio: flush/push " wexu
2018-06-06  3:39   ` Jason Wang
     [not found]   ` <7dc2af60-47ad-1250-fc6c-3fdf288654c3@redhat.com>
2018-11-28 17:33     ` Maxime Coquelin
2018-06-05 19:08 ` [Qemu-devel] [RFC v2 7/8] virtio: event suppression " wexu
2018-06-06  3:46   ` Jason Wang
2018-06-05 19:08 ` [Qemu-devel] [RFC v2 8/8] virtio: guest driver reload for vhost-net wexu
2018-06-06  3:48   ` Jason Wang
2018-06-19  7:53     ` Wei Xu
2018-06-06  2:34 ` [Qemu-devel] [RFC v2 0/8] packed ring virtio-net userspace backend support Jason Wang
2018-06-06  3:49 ` Jason Wang
2018-06-19  7:41   ` Wei Xu
