* [PATCH 2/6] tools/virtio: fix compile.
2013-01-17 10:29 [PATCH 1/6] virtio_host: host-side implementation of virtio rings Rusty Russell
@ 2013-01-17 10:29 ` Rusty Russell
2013-01-17 10:29 ` [PATCH 3/6] tools/virtio: separate headers more Rusty Russell
` (6 subsequent siblings)
7 siblings, 0 replies; 30+ messages in thread
From: Rusty Russell @ 2013-01-17 10:29 UTC (permalink / raw)
To: virtualization; +Cc: Michael S . Tsirkin
Didn't test that it actually works, mind you!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
tools/virtio/linux/virtio.h | 9 ++++++++-
tools/virtio/virtio_test.c | 6 +++---
2 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/tools/virtio/linux/virtio.h b/tools/virtio/linux/virtio.h
index 81847dd..ccfc901 100644
--- a/tools/virtio/linux/virtio.h
+++ b/tools/virtio/linux/virtio.h
@@ -86,6 +86,10 @@ typedef enum {
GFP_KERNEL,
GFP_ATOMIC,
} gfp_t;
+
+#define __GFP_HIGHMEM 0
+#define __GFP_HIGH 0
+
typedef enum {
IRQ_NONE,
IRQ_HANDLED
@@ -163,6 +167,8 @@ struct virtqueue {
void (*callback)(struct virtqueue *vq);
const char *name;
struct virtio_device *vdev;
+ unsigned int index;
+ unsigned int num_free;
void *priv;
};
@@ -206,7 +212,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
void *virtqueue_detach_unused_buf(struct virtqueue *vq);
-struct virtqueue *vring_new_virtqueue(unsigned int num,
+struct virtqueue *vring_new_virtqueue(unsigned int index,
+ unsigned int num,
unsigned int vring_align,
struct virtio_device *vdev,
bool weak_barriers,
diff --git a/tools/virtio/virtio_test.c b/tools/virtio/virtio_test.c
index fcc9aa2..a07142c 100644
--- a/tools/virtio/virtio_test.c
+++ b/tools/virtio/virtio_test.c
@@ -81,7 +81,7 @@ void vhost_vq_setup(struct vdev_info *dev, struct vq_info *info)
assert(r >= 0);
}
-static void vq_info_add(struct vdev_info *dev, int num)
+static void vq_info_add(struct vdev_info *dev, int idx, int num)
{
struct vq_info *info = &dev->vqs[dev->nvqs];
int r;
@@ -92,7 +92,7 @@ static void vq_info_add(struct vdev_info *dev, int num)
assert(r >= 0);
memset(info->ring, 0, vring_size(num, 4096));
vring_init(&info->vring, num, info->ring, 4096);
- info->vq = vring_new_virtqueue(info->vring.num, 4096, &dev->vdev,
+ info->vq = vring_new_virtqueue(idx, info->vring.num, 4096, &dev->vdev,
true, info->ring,
vq_notify, vq_callback, "test");
assert(info->vq);
@@ -277,7 +277,7 @@ int main(int argc, char **argv)
done:
vdev_info_init(&dev, features);
- vq_info_add(&dev, 256);
+ vq_info_add(&dev, 0, 256);
run_test(&dev, &dev.vqs[0], delayed, 0x100000);
return 0;
}
--
1.7.10.4
^ permalink raw reply related [flat|nested] 30+ messages in thread* [PATCH 3/6] tools/virtio: separate headers more.
2013-01-17 10:29 [PATCH 1/6] virtio_host: host-side implementation of virtio rings Rusty Russell
2013-01-17 10:29 ` [PATCH 2/6] tools/virtio: fix compile Rusty Russell
@ 2013-01-17 10:29 ` Rusty Russell
2013-01-17 10:29 ` [PATCH 4/6] tools/virtio: add vring_test Rusty Russell
` (5 subsequent siblings)
7 siblings, 0 replies; 30+ messages in thread
From: Rusty Russell @ 2013-01-17 10:29 UTC (permalink / raw)
To: virtualization; +Cc: Michael S . Tsirkin
This makes them a bit more like the kernel headers, so we can include more
real kernel headers in our tests.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
tools/virtio/asm/barrier.h | 14 +++
tools/virtio/linux/bug.h | 8 ++
tools/virtio/linux/err.h | 26 +++++
tools/virtio/linux/irqreturn.h | 1 +
tools/virtio/linux/kernel.h | 112 ++++++++++++++++++++
tools/virtio/linux/printk.h | 4 +
tools/virtio/linux/ratelimit.h | 4 +
tools/virtio/linux/scatterlist.h | 173 +++++++++++++++++++++++++++++++
tools/virtio/linux/types.h | 28 +++++
tools/virtio/linux/uaccess.h | 37 +++++++
tools/virtio/linux/virtio.h | 149 +-------------------------
tools/virtio/linux/virtio_config.h | 6 ++
tools/virtio/linux/virtio_ring.h | 1 +
tools/virtio/linux/vringh.h | 1 +
tools/virtio/uapi/linux/virtio_config.h | 1 +
tools/virtio/uapi/linux/virtio_ring.h | 4 +
tools/virtio/virtio_test.c | 4 +
17 files changed, 427 insertions(+), 146 deletions(-)
create mode 100644 tools/virtio/asm/barrier.h
create mode 100644 tools/virtio/linux/bug.h
create mode 100644 tools/virtio/linux/err.h
create mode 100644 tools/virtio/linux/irqreturn.h
create mode 100644 tools/virtio/linux/kernel.h
create mode 100644 tools/virtio/linux/printk.h
create mode 100644 tools/virtio/linux/ratelimit.h
create mode 100644 tools/virtio/linux/scatterlist.h
create mode 100644 tools/virtio/linux/types.h
create mode 100644 tools/virtio/linux/uaccess.h
create mode 100644 tools/virtio/linux/virtio_config.h
create mode 100644 tools/virtio/linux/virtio_ring.h
create mode 100644 tools/virtio/linux/vringh.h
create mode 100644 tools/virtio/uapi/linux/virtio_config.h
create mode 100644 tools/virtio/uapi/linux/virtio_ring.h
diff --git a/tools/virtio/asm/barrier.h b/tools/virtio/asm/barrier.h
new file mode 100644
index 0000000..aff61e1
--- /dev/null
+++ b/tools/virtio/asm/barrier.h
@@ -0,0 +1,14 @@
+#if defined(__i386__) || defined(__x86_64__)
+#define barrier() asm volatile("" ::: "memory")
+#define mb() __sync_synchronize()
+
+#define smp_mb() mb()
+# define smp_rmb() barrier()
+# define smp_wmb() barrier()
+/* Weak barriers should be used. If not - it's a bug */
+# define rmb() abort()
+# define wmb() abort()
+#else
+#error Please fill in barrier macros
+#endif
+
diff --git a/tools/virtio/linux/bug.h b/tools/virtio/linux/bug.h
new file mode 100644
index 0000000..4e0e359
--- /dev/null
+++ b/tools/virtio/linux/bug.h
@@ -0,0 +1,8 @@
+#ifndef BUG_H
+#define BUG_H
+
+#define BUG_ON(__BUG_ON_cond) assert(!(__BUG_ON_cond))
+
+#define BUILD_BUG_ON(x)
+
+#endif /* BUG_H */
diff --git a/tools/virtio/linux/err.h b/tools/virtio/linux/err.h
new file mode 100644
index 0000000..e32eff8
--- /dev/null
+++ b/tools/virtio/linux/err.h
@@ -0,0 +1,26 @@
+#ifndef ERR_H
+#define ERR_H
+#define MAX_ERRNO 4095
+
+#define IS_ERR_VALUE(x) unlikely((x) >= (unsigned long)-MAX_ERRNO)
+
+static inline void * __must_check ERR_PTR(long error)
+{
+ return (void *) error;
+}
+
+static inline long __must_check PTR_ERR(const void *ptr)
+{
+ return (long) ptr;
+}
+
+static inline long __must_check IS_ERR(const void *ptr)
+{
+ return IS_ERR_VALUE((unsigned long)ptr);
+}
+
+static inline long __must_check IS_ERR_OR_NULL(const void *ptr)
+{
+ return !ptr || IS_ERR_VALUE((unsigned long)ptr);
+}
+#endif /* ERR_H */
diff --git a/tools/virtio/linux/irqreturn.h b/tools/virtio/linux/irqreturn.h
new file mode 100644
index 0000000..a3c4e7b
--- /dev/null
+++ b/tools/virtio/linux/irqreturn.h
@@ -0,0 +1 @@
+#include "../../../include/linux/irqreturn.h"
diff --git a/tools/virtio/linux/kernel.h b/tools/virtio/linux/kernel.h
new file mode 100644
index 0000000..b64a4c1
--- /dev/null
+++ b/tools/virtio/linux/kernel.h
@@ -0,0 +1,112 @@
+#ifndef KERNEL_H
+#define KERNEL_H
+#include <stdbool.h>
+#include <stdlib.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <string.h>
+#include <assert.h>
+#include <stdarg.h>
+
+#include <linux/types.h>
+#include <linux/printk.h>
+#include <linux/bug.h>
+#include <errno.h>
+#include <unistd.h>
+#include <asm/barrier.h>
+
+#define CONFIG_SMP
+
+#define PAGE_SIZE getpagesize()
+#define PAGE_MASK (~(PAGE_SIZE-1))
+
+typedef unsigned long long dma_addr_t;
+
+struct page {
+ unsigned long long dummy;
+};
+
+/* Physical == Virtual */
+#define virt_to_phys(p) ((unsigned long)p)
+#define phys_to_virt(a) ((void *)(unsigned long)(a))
+/* Page address: Virtual / 4K */
+#define page_to_phys(p) ((dma_addr_t)(unsigned long)(p))
+#define virt_to_page(p) ((struct page *)((unsigned long)p & PAGE_MASK))
+
+#define offset_in_page(p) (((unsigned long)p) % PAGE_SIZE)
+
+#define __printf(a,b) __attribute__((format(printf,a,b)))
+
+typedef enum {
+ GFP_KERNEL,
+ GFP_ATOMIC,
+} gfp_t;
+
+#define __GFP_HIGHMEM 0
+#define __GFP_HIGH 0
+
+#define ARRAY_SIZE(x) (sizeof(x)/sizeof(x[0]))
+
+extern void *__kmalloc_fake, *__kfree_ignore_start, *__kfree_ignore_end;
+static inline void *kmalloc(size_t s, gfp_t gfp)
+{
+ if (__kmalloc_fake)
+ return __kmalloc_fake;
+ return malloc(s);
+}
+
+static inline void kfree(void *p)
+{
+ if (p >= __kfree_ignore_start && p < __kfree_ignore_end)
+ return;
+ free(p);
+}
+
+static inline void *krealloc(void *p, size_t s, gfp_t gfp)
+{
+ return realloc(p, s);
+}
+
+
+static inline unsigned long __get_free_page(gfp_t gfp)
+{
+ void *p;
+
+ posix_memalign(&p, PAGE_SIZE, PAGE_SIZE);
+ return (unsigned long)p;
+}
+
+static inline void free_page(unsigned long addr)
+{
+ free((void *)addr);
+}
+
+#define container_of(ptr, type, member) ({ \
+ const typeof( ((type *)0)->member ) *__mptr = (ptr); \
+ (type *)( (char *)__mptr - offsetof(type,member) );})
+
+#define uninitialized_var(x) x = x
+
+# ifndef likely
+# define likely(x) (__builtin_expect(!!(x), 1))
+# endif
+# ifndef unlikely
+# define unlikely(x) (__builtin_expect(!!(x), 0))
+# endif
+
+#define pr_err(format, ...) fprintf (stderr, format, ## __VA_ARGS__)
+#ifdef DEBUG
+#define pr_debug(format, ...) fprintf (stderr, format, ## __VA_ARGS__)
+#else
+#define pr_debug(format, ...) do {} while (0)
+#endif
+#define dev_err(dev, format, ...) fprintf (stderr, format, ## __VA_ARGS__)
+#define dev_warn(dev, format, ...) fprintf (stderr, format, ## __VA_ARGS__)
+
+#define min(x, y) ({ \
+ typeof(x) _min1 = (x); \
+ typeof(y) _min2 = (y); \
+ (void) (&_min1 == &_min2); \
+ _min1 < _min2 ? _min1 : _min2; })
+
+#endif /* KERNEL_H */
diff --git a/tools/virtio/linux/printk.h b/tools/virtio/linux/printk.h
new file mode 100644
index 0000000..9f2423b
--- /dev/null
+++ b/tools/virtio/linux/printk.h
@@ -0,0 +1,4 @@
+#include "../../../include/linux/kern_levels.h"
+
+#define printk printf
+#define vprintk vprintf
diff --git a/tools/virtio/linux/ratelimit.h b/tools/virtio/linux/ratelimit.h
new file mode 100644
index 0000000..dcce172
--- /dev/null
+++ b/tools/virtio/linux/ratelimit.h
@@ -0,0 +1,4 @@
+#define DEFINE_RATELIMIT_STATE(name, interval_init, burst_init) int name = 0
+
+#define __ratelimit(x) (*(x))
+
diff --git a/tools/virtio/linux/scatterlist.h b/tools/virtio/linux/scatterlist.h
new file mode 100644
index 0000000..b2cf7d0
--- /dev/null
+++ b/tools/virtio/linux/scatterlist.h
@@ -0,0 +1,173 @@
+#ifndef SCATTERLIST_H
+#define SCATTERLIST_H
+#include <linux/kernel.h>
+
+struct scatterlist {
+ unsigned long page_link;
+ unsigned int offset;
+ unsigned int length;
+ dma_addr_t dma_address;
+};
+
+/* Scatterlist helpers, stolen from linux/scatterlist.h */
+#define sg_is_chain(sg) ((sg)->page_link & 0x01)
+#define sg_is_last(sg) ((sg)->page_link & 0x02)
+#define sg_chain_ptr(sg) \
+ ((struct scatterlist *) ((sg)->page_link & ~0x03))
+
+/**
+ * sg_assign_page - Assign a given page to an SG entry
+ * @sg: SG entry
+ * @page: The page
+ *
+ * Description:
+ * Assign page to sg entry. Also see sg_set_page(), the most commonly used
+ * variant.
+ *
+ **/
+static inline void sg_assign_page(struct scatterlist *sg, struct page *page)
+{
+ unsigned long page_link = sg->page_link & 0x3;
+
+ /*
+ * In order for the low bit stealing approach to work, pages
+ * must be aligned at a 32-bit boundary as a minimum.
+ */
+ BUG_ON((unsigned long) page & 0x03);
+#ifdef CONFIG_DEBUG_SG
+ BUG_ON(sg->sg_magic != SG_MAGIC);
+ BUG_ON(sg_is_chain(sg));
+#endif
+ sg->page_link = page_link | (unsigned long) page;
+}
+
+/**
+ * sg_set_page - Set sg entry to point at given page
+ * @sg: SG entry
+ * @page: The page
+ * @len: Length of data
+ * @offset: Offset into page
+ *
+ * Description:
+ * Use this function to set an sg entry pointing at a page, never assign
+ * the page directly. We encode sg table information in the lower bits
+ * of the page pointer. See sg_page() for looking up the page belonging
+ * to an sg entry.
+ *
+ **/
+static inline void sg_set_page(struct scatterlist *sg, struct page *page,
+ unsigned int len, unsigned int offset)
+{
+ sg_assign_page(sg, page);
+ sg->offset = offset;
+ sg->length = len;
+}
+
+static inline struct page *sg_page(struct scatterlist *sg)
+{
+#ifdef CONFIG_DEBUG_SG
+ BUG_ON(sg->sg_magic != SG_MAGIC);
+ BUG_ON(sg_is_chain(sg));
+#endif
+ return (struct page *)((sg)->page_link & ~0x3);
+}
+
+/*
+ * Loop over each sg element, following the pointer to a new list if necessary
+ */
+#define for_each_sg(sglist, sg, nr, __i) \
+ for (__i = 0, sg = (sglist); __i < (nr); __i++, sg = sg_next(sg))
+
+/**
+ * sg_chain - Chain two sglists together
+ * @prv: First scatterlist
+ * @prv_nents: Number of entries in prv
+ * @sgl: Second scatterlist
+ *
+ * Description:
+ * Links @prv@ and @sgl@ together, to form a longer scatterlist.
+ *
+ **/
+static inline void sg_chain(struct scatterlist *prv, unsigned int prv_nents,
+ struct scatterlist *sgl)
+{
+ /*
+ * offset and length are unused for chain entry. Clear them.
+ */
+ prv[prv_nents - 1].offset = 0;
+ prv[prv_nents - 1].length = 0;
+
+ /*
+ * Set lowest bit to indicate a link pointer, and make sure to clear
+ * the termination bit if it happens to be set.
+ */
+ prv[prv_nents - 1].page_link = ((unsigned long) sgl | 0x01) & ~0x02;
+}
+
+/**
+ * sg_mark_end - Mark the end of the scatterlist
+ * @sg: SG entryScatterlist
+ *
+ * Description:
+ * Marks the passed in sg entry as the termination point for the sg
+ * table. A call to sg_next() on this entry will return NULL.
+ *
+ **/
+static inline void sg_mark_end(struct scatterlist *sg)
+{
+#ifdef CONFIG_DEBUG_SG
+ BUG_ON(sg->sg_magic != SG_MAGIC);
+#endif
+ /*
+ * Set termination bit, clear potential chain bit
+ */
+ sg->page_link |= 0x02;
+ sg->page_link &= ~0x01;
+}
+
+static inline struct scatterlist *sg_next(struct scatterlist *sg)
+{
+#ifdef CONFIG_DEBUG_SG
+ BUG_ON(sg->sg_magic != SG_MAGIC);
+#endif
+ if (sg_is_last(sg))
+ return NULL;
+
+ sg++;
+ if (unlikely(sg_is_chain(sg)))
+ sg = sg_chain_ptr(sg);
+
+ return sg;
+}
+
+static inline void sg_init_table(struct scatterlist *sgl, unsigned int nents)
+{
+ memset(sgl, 0, sizeof(*sgl) * nents);
+#ifdef CONFIG_DEBUG_SG
+ {
+ unsigned int i;
+ for (i = 0; i < nents; i++)
+ sgl[i].sg_magic = SG_MAGIC;
+ }
+#endif
+ sg_mark_end(&sgl[nents - 1]);
+}
+
+static inline dma_addr_t sg_phys(struct scatterlist *sg)
+{
+ return page_to_phys(sg_page(sg)) + sg->offset;
+}
+
+static inline void sg_set_buf(struct scatterlist *sg, const void *buf,
+ unsigned int buflen)
+{
+ sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));
+}
+
+static inline void sg_init_one(struct scatterlist *sg,
+ const void *buf, unsigned int buflen)
+{
+ sg_init_table(sg, 1);
+ sg_set_buf(sg, buf, buflen);
+}
+#endif /* SCATTERLIST_H */
diff --git a/tools/virtio/linux/types.h b/tools/virtio/linux/types.h
new file mode 100644
index 0000000..f8ebb9a
--- /dev/null
+++ b/tools/virtio/linux/types.h
@@ -0,0 +1,28 @@
+#ifndef TYPES_H
+#define TYPES_H
+#include <stdint.h>
+
+#define __force
+#define __user
+#define __must_check
+#define __cold
+
+typedef uint64_t u64;
+typedef int64_t s64;
+typedef uint32_t u32;
+typedef int32_t s32;
+typedef uint16_t u16;
+typedef int16_t s16;
+typedef uint8_t u8;
+typedef int8_t s8;
+
+typedef uint64_t __u64;
+typedef int64_t __s64;
+typedef uint32_t __u32;
+typedef int32_t __s32;
+typedef uint16_t __u16;
+typedef int16_t __s16;
+typedef uint8_t __u8;
+typedef int8_t __s8;
+
+#endif /* TYPES_H */
diff --git a/tools/virtio/linux/uaccess.h b/tools/virtio/linux/uaccess.h
new file mode 100644
index 0000000..5874786
--- /dev/null
+++ b/tools/virtio/linux/uaccess.h
@@ -0,0 +1,37 @@
+#ifndef UACCESS_H
+#define UACCESS_H
+extern void *__user_addr_min, *__user_addr_max;
+
+#define get_user(dst, uptr) get_user_func(&(dst), (uptr), sizeof(*uptr))
+#define put_user(val, uptr) put_user_func(&(val), (uptr), sizeof(*uptr))
+
+static inline int get_user_func(void *dst, void *src, size_t size)
+{
+ assert(src >= __user_addr_min && src + size <= __user_addr_max);
+ memcpy(dst, src, size);
+ return 0;
+}
+
+static inline int put_user_func(void *val, void *dst, size_t size)
+{
+ assert(dst >= __user_addr_min && dst + size <= __user_addr_max);
+ memcpy(dst, val, size);
+ return 0;
+}
+
+static inline int copy_from_user(void *to, const void __user *from,
+ unsigned long n)
+{
+ assert(from >= __user_addr_min && from + n <= __user_addr_max);
+ memcpy(to, from, n);
+ return 0;
+}
+
+static inline int copy_to_user(void __user *to, const void *from,
+ unsigned long n)
+{
+ assert(to >= __user_addr_min && to + n <= __user_addr_max);
+ memcpy(to, from, n);
+ return 0;
+}
+#endif /* UACCESS_H */
diff --git a/tools/virtio/linux/virtio.h b/tools/virtio/linux/virtio.h
index ccfc901..4df92b2 100644
--- a/tools/virtio/linux/virtio.h
+++ b/tools/virtio/linux/virtio.h
@@ -1,131 +1,7 @@
#ifndef LINUX_VIRTIO_H
#define LINUX_VIRTIO_H
-
-#include <stdbool.h>
-#include <stdlib.h>
-#include <stddef.h>
-#include <stdio.h>
-#include <string.h>
-#include <assert.h>
-
-#include <linux/types.h>
-#include <errno.h>
-
-typedef unsigned long long dma_addr_t;
-
-struct scatterlist {
- unsigned long page_link;
- unsigned int offset;
- unsigned int length;
- dma_addr_t dma_address;
-};
-
-struct page {
- unsigned long long dummy;
-};
-
-#define BUG_ON(__BUG_ON_cond) assert(!(__BUG_ON_cond))
-
-/* Physical == Virtual */
-#define virt_to_phys(p) ((unsigned long)p)
-#define phys_to_virt(a) ((void *)(unsigned long)(a))
-/* Page address: Virtual / 4K */
-#define virt_to_page(p) ((struct page*)((virt_to_phys(p) / 4096) * \
- sizeof(struct page)))
-#define offset_in_page(p) (((unsigned long)p) % 4096)
-#define sg_phys(sg) ((sg->page_link & ~0x3) / sizeof(struct page) * 4096 + \
- sg->offset)
-static inline void sg_mark_end(struct scatterlist *sg)
-{
- /*
- * Set termination bit, clear potential chain bit
- */
- sg->page_link |= 0x02;
- sg->page_link &= ~0x01;
-}
-static inline void sg_init_table(struct scatterlist *sgl, unsigned int nents)
-{
- memset(sgl, 0, sizeof(*sgl) * nents);
- sg_mark_end(&sgl[nents - 1]);
-}
-static inline void sg_assign_page(struct scatterlist *sg, struct page *page)
-{
- unsigned long page_link = sg->page_link & 0x3;
-
- /*
- * In order for the low bit stealing approach to work, pages
- * must be aligned at a 32-bit boundary as a minimum.
- */
- BUG_ON((unsigned long) page & 0x03);
- sg->page_link = page_link | (unsigned long) page;
-}
-
-static inline void sg_set_page(struct scatterlist *sg, struct page *page,
- unsigned int len, unsigned int offset)
-{
- sg_assign_page(sg, page);
- sg->offset = offset;
- sg->length = len;
-}
-
-static inline void sg_set_buf(struct scatterlist *sg, const void *buf,
- unsigned int buflen)
-{
- sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));
-}
-
-static inline void sg_init_one(struct scatterlist *sg, const void *buf, unsigned int buflen)
-{
- sg_init_table(sg, 1);
- sg_set_buf(sg, buf, buflen);
-}
-
-typedef __u16 u16;
-
-typedef enum {
- GFP_KERNEL,
- GFP_ATOMIC,
-} gfp_t;
-
-#define __GFP_HIGHMEM 0
-#define __GFP_HIGH 0
-
-typedef enum {
- IRQ_NONE,
- IRQ_HANDLED
-} irqreturn_t;
-
-static inline void *kmalloc(size_t s, gfp_t gfp)
-{
- return malloc(s);
-}
-
-static inline void kfree(void *p)
-{
- free(p);
-}
-
-#define container_of(ptr, type, member) ({ \
- const typeof( ((type *)0)->member ) *__mptr = (ptr); \
- (type *)( (char *)__mptr - offsetof(type,member) );})
-
-#define uninitialized_var(x) x = x
-
-# ifndef likely
-# define likely(x) (__builtin_expect(!!(x), 1))
-# endif
-# ifndef unlikely
-# define unlikely(x) (__builtin_expect(!!(x), 0))
-# endif
-
-#define pr_err(format, ...) fprintf (stderr, format, ## __VA_ARGS__)
-#ifdef DEBUG
-#define pr_debug(format, ...) fprintf (stderr, format, ## __VA_ARGS__)
-#else
-#define pr_debug(format, ...) do {} while (0)
-#endif
-#define dev_err(dev, format, ...) fprintf (stderr, format, ## __VA_ARGS__)
-#define dev_warn(dev, format, ...) fprintf (stderr, format, ## __VA_ARGS__)
+#include <linux/scatterlist.h>
+#include <linux/kernel.h>
/* TODO: empty stubs for now. Broken but enough for virtio_ring.c */
#define list_add_tail(a, b) do {} while (0)
@@ -135,6 +11,7 @@ static inline void kfree(void *p)
#define BITS_PER_BYTE 8
#define BITS_PER_LONG (sizeof(long) * BITS_PER_BYTE)
#define BIT_MASK(nr) (1UL << ((nr) % BITS_PER_LONG))
+
/* TODO: Not atomic as it should be:
* we don't use this for anything important. */
static inline void clear_bit(int nr, volatile unsigned long *addr)
@@ -149,10 +26,6 @@ static inline int test_bit(int nr, const volatile unsigned long *addr)
{
return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
}
-
-/* The only feature we care to support */
-#define virtio_has_feature(dev, feature) \
- test_bit((feature), (dev)->features)
/* end of stubs */
struct virtio_device {
@@ -178,22 +51,6 @@ struct virtqueue {
#define MODULE_LICENSE(__MODULE_LICENSE_value) \
const char *__MODULE_LICENSE_name = __MODULE_LICENSE_value
-#define CONFIG_SMP
-
-#if defined(__i386__) || defined(__x86_64__)
-#define barrier() asm volatile("" ::: "memory")
-#define mb() __sync_synchronize()
-
-#define smp_mb() mb()
-# define smp_rmb() barrier()
-# define smp_wmb() barrier()
-/* Weak barriers should be used. If not - it's a bug */
-# define rmb() abort()
-# define wmb() abort()
-#else
-#error Please fill in barrier macros
-#endif
-
/* Interfaces exported by virtio_ring. */
int virtqueue_add_buf(struct virtqueue *vq,
struct scatterlist sg[],
diff --git a/tools/virtio/linux/virtio_config.h b/tools/virtio/linux/virtio_config.h
new file mode 100644
index 0000000..5049967
--- /dev/null
+++ b/tools/virtio/linux/virtio_config.h
@@ -0,0 +1,6 @@
+#define VIRTIO_TRANSPORT_F_START 28
+#define VIRTIO_TRANSPORT_F_END 32
+
+#define virtio_has_feature(dev, feature) \
+ test_bit((feature), (dev)->features)
+
diff --git a/tools/virtio/linux/virtio_ring.h b/tools/virtio/linux/virtio_ring.h
new file mode 100644
index 0000000..8949c4e
--- /dev/null
+++ b/tools/virtio/linux/virtio_ring.h
@@ -0,0 +1 @@
+#include "../../../include/linux/virtio_ring.h"
diff --git a/tools/virtio/linux/vringh.h b/tools/virtio/linux/vringh.h
new file mode 100644
index 0000000..9348957
--- /dev/null
+++ b/tools/virtio/linux/vringh.h
@@ -0,0 +1 @@
+#include "../../../include/linux/vringh.h"
diff --git a/tools/virtio/uapi/linux/virtio_config.h b/tools/virtio/uapi/linux/virtio_config.h
new file mode 100644
index 0000000..4c86675
--- /dev/null
+++ b/tools/virtio/uapi/linux/virtio_config.h
@@ -0,0 +1 @@
+#include "../../../../include/uapi/linux/virtio_config.h"
diff --git a/tools/virtio/uapi/linux/virtio_ring.h b/tools/virtio/uapi/linux/virtio_ring.h
new file mode 100644
index 0000000..4d99c78
--- /dev/null
+++ b/tools/virtio/uapi/linux/virtio_ring.h
@@ -0,0 +1,4 @@
+#ifndef VIRTIO_RING_H
+#define VIRTIO_RING_H
+#include "../../../../include/uapi/linux/virtio_ring.h"
+#endif /* VIRTIO_RING_H */
diff --git a/tools/virtio/virtio_test.c b/tools/virtio/virtio_test.c
index a07142c..c613f25 100644
--- a/tools/virtio/virtio_test.c
+++ b/tools/virtio/virtio_test.c
@@ -10,11 +10,15 @@
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>
+#include <stdbool.h>
#include <linux/vhost.h>
#include <linux/virtio.h>
#include <linux/virtio_ring.h>
#include "../../drivers/vhost/test.h"
+/* Unused */
+void *__kmalloc_fake, *__kfree_ignore_start, *__kfree_ignore_end;
+
struct vq_info {
int kick;
int call;
--
1.7.10.4
^ permalink raw reply related [flat|nested] 30+ messages in thread* [PATCH 4/6] tools/virtio: add vring_test.
2013-01-17 10:29 [PATCH 1/6] virtio_host: host-side implementation of virtio rings Rusty Russell
2013-01-17 10:29 ` [PATCH 2/6] tools/virtio: fix compile Rusty Russell
2013-01-17 10:29 ` [PATCH 3/6] tools/virtio: separate headers more Rusty Russell
@ 2013-01-17 10:29 ` Rusty Russell
2013-01-22 8:25 ` Asias He
2013-01-17 10:29 ` [PATCH 5/6] vringh: separate callback for notification Rusty Russell
` (4 subsequent siblings)
7 siblings, 1 reply; 30+ messages in thread
From: Rusty Russell @ 2013-01-17 10:29 UTC (permalink / raw)
To: virtualization; +Cc: Michael S . Tsirkin
This is mainly to test the drivers/vhost/vringh.c code, but it also
uses the drivers/virtio/virtio_ring.c code for the guest side.
Usage for testing the basic implementation:
./vringh_test
# Test with indirect descriptors
./vringh_test --indirect
# Test with indirect descriptors and event indexex
./vringh_test --indirect --eventidx
You can run a parallel stress test by adding --parallel to any of the
above options.
eg ./vringh_test --parallel:
Using CPUS 0 and 3
Guest: notified 10107974, pinged 107970
Host: notified 108158, pinged 3172148
Time: R=17.659 U=6.640 S=6.640
./vringh_test --eventidx --parallel:
Using CPUS 0 and 3
Guest: notified 156357, pinged 156251
Host: notified 156251, pinged 78179
Time: R=4.518 U=3.536 S=3.536
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
tools/virtio/Makefile | 4 +-
tools/virtio/vringh_test.c | 591 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 593 insertions(+), 2 deletions(-)
create mode 100644 tools/virtio/vringh_test.c
diff --git a/tools/virtio/Makefile b/tools/virtio/Makefile
index d1d442e..b928c3e 100644
--- a/tools/virtio/Makefile
+++ b/tools/virtio/Makefile
@@ -1,5 +1,5 @@
all: test mod
-test: virtio_test
+test: virtio_test vringh_test
virtio_test: virtio_ring.o virtio_test.o
CFLAGS += -g -O2 -Wall -I. -I ../../usr/include/ -Wno-pointer-sign -fno-strict-overflow -MMD
vpath %.c ../../drivers/virtio
@@ -7,6 +7,6 @@ mod:
${MAKE} -C `pwd`/../.. M=`pwd`/vhost_test
.PHONY: all test mod clean
clean:
- ${RM} *.o vhost_test/*.o vhost_test/.*.cmd \
+ ${RM} *.o vringh_test virtio_test vhost_test/*.o vhost_test/.*.cmd \
vhost_test/Module.symvers vhost_test/modules.order *.d
-include *.d
diff --git a/tools/virtio/vringh_test.c b/tools/virtio/vringh_test.c
new file mode 100644
index 0000000..f3868f4
--- /dev/null
+++ b/tools/virtio/vringh_test.c
@@ -0,0 +1,591 @@
+/* Simple test of virtio code, entirely in userpsace. */
+#define _GNU_SOURCE
+#include <sched.h>
+#include <err.h>
+#include <linux/kernel.h>
+#include <linux/err.h>
+#include <../../drivers/vhost/vringh.c>
+#include <../../drivers/virtio/virtio_ring.c>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/mman.h>
+#include <sys/wait.h>
+#include <fcntl.h>
+
+#define USER_MEM (1024*1024)
+void *__user_addr_min, *__user_addr_max;
+void *__kmalloc_fake, *__kfree_ignore_start, *__kfree_ignore_end;
+static u64 user_addr_offset;
+
+#define RINGSIZE 256
+#define ALIGN 4096
+
+static void never_notify_host(struct virtqueue *vq)
+{
+ abort();
+}
+
+static void never_callback_guest(struct virtqueue *vq)
+{
+ abort();
+}
+
+static inline bool getrange_iov(u64 addr, struct vringh_range *r)
+{
+ r->start = (u64)(unsigned long)__user_addr_min - user_addr_offset;
+ r->end_incl = (u64)(unsigned long)__user_addr_max - 1 - user_addr_offset;
+ r->offset = user_addr_offset;
+ return true;
+}
+
+struct guest_virtio_device {
+ struct virtio_device vdev;
+ int to_host_fd;
+ unsigned long notifies;
+};
+
+static void parallel_notify_host(struct virtqueue *vq)
+{
+ struct guest_virtio_device *gvdev;
+
+ gvdev = container_of(vq->vdev, struct guest_virtio_device, vdev);
+ write(gvdev->to_host_fd, "", 1);
+ gvdev->notifies++;
+}
+
+#define NUM_XFERS (10000000)
+
+/* We aim for two "distant" cpus. */
+static void find_cpus(unsigned int *first, unsigned int *last)
+{
+ unsigned int i;
+
+ *first = -1U;
+ *last = 0;
+ for (i = 0; i < 4096; i++) {
+ cpu_set_t set;
+ CPU_ZERO(&set);
+ CPU_SET(i, &set);
+ if (sched_setaffinity(getpid(), sizeof(set), &set) == 0) {
+ if (i < *first)
+ *first = i;
+ if (i > *last)
+ *last = i;
+ }
+ }
+}
+
+static int parallel_test(unsigned long features)
+{
+ void *host_map, *guest_map;
+ int fd, mapsize, to_guest[2], to_host[2];
+ unsigned long xfers = 0, notifies = 0, receives = 0;
+ unsigned int first_cpu, last_cpu;
+ cpu_set_t cpu_set;
+
+ /* Create real file to mmap. */
+ fd = open("/tmp/vringh_test-file", O_RDWR|O_CREAT|O_TRUNC, 0600);
+ if (fd < 0)
+ err(1, "Opening /tmp/vringh_test-file");
+
+ /* Extra room at the end for some data, and indirects */
+ mapsize = vring_size(RINGSIZE, ALIGN)
+ + RINGSIZE * 2 * sizeof(int)
+ + RINGSIZE * 6 * sizeof(struct vring_desc);
+ mapsize = (mapsize + getpagesize() - 1) & ~(getpagesize() - 1);
+ ftruncate(fd, mapsize);
+
+ /* Parent and child use separate addresses, to check our mapping logic! */
+ host_map = mmap(NULL, mapsize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
+ guest_map = mmap(NULL, mapsize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
+
+ pipe(to_guest);
+ pipe(to_host);
+
+ CPU_ZERO(&cpu_set);
+ find_cpus(&first_cpu, &last_cpu);
+ printf("Using CPUS %u and %u\n", first_cpu, last_cpu);
+ fflush(stdout);
+
+ if (fork() != 0) {
+ struct vringh vrh;
+ bool notify = false;
+ int status;
+
+ /* We are the host: never access guest addresses! */
+ munmap(guest_map, mapsize);
+
+ __user_addr_min = host_map;
+ __user_addr_max = __user_addr_min + mapsize;
+ user_addr_offset = host_map - guest_map;
+ assert(user_addr_offset);
+
+ close(to_guest[0]);
+ close(to_host[1]);
+
+ vring_init(&vrh.vring, RINGSIZE, host_map, ALIGN);
+ vringh_init_user(&vrh, features, RINGSIZE, true,
+ vrh.vring.desc, vrh.vring.avail, vrh.vring.used);
+ CPU_SET(first_cpu, &cpu_set);
+ if (sched_setaffinity(getpid(), sizeof(cpu_set), &cpu_set))
+ err(1, "Could not set affinity to cpu %u", first_cpu);
+
+ while (xfers < NUM_XFERS) {
+ struct iovec host_riov[2], host_wiov[2];
+ struct vringh_iov riov, wiov;
+ char buf[5];
+ u16 head;
+ int rlen, err;
+
+ riov.iov = host_riov;
+ riov.max = ARRAY_SIZE(host_riov);
+ riov.allocated = false;
+
+ wiov.iov = host_wiov;
+ wiov.max = ARRAY_SIZE(host_wiov);
+ wiov.allocated = false;
+
+ err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange_iov,
+ &head, GFP_KERNEL);
+ if (err == 0) {
+ char buf[128];
+
+ if (notify) {
+ write(to_guest[1], "", 1);
+ notifies++;
+ notify = false;
+ }
+
+ if (vringh_notify_enable_user(&vrh))
+ continue;
+
+ /* Swallow all notifies at once. */
+ if (read(to_host[0], buf, sizeof(buf)) < 1)
+ break;
+
+ vringh_notify_disable_user(&vrh);
+ receives++;
+ continue;
+ }
+ if (err != 1)
+ errx(1, "vringh_getdesc_user: %i", err);
+
+ /* We simply copy bytes. */
+ rlen = vringh_iov_pull_user(&riov, buf, sizeof(buf));
+ if (rlen < 0)
+ errx(1, "vringh_iov_pull_user: %i", rlen);
+ err = vringh_iov_push_user(&wiov, buf, rlen);
+ if (err != rlen)
+ errx(1, "vringh_iov_push_user: %i", err);
+ xfers++;
+ assert(wiov.i == wiov.max);
+
+ err = vringh_complete_user(&vrh, head, rlen, ¬ify);
+ if (err != 0)
+ errx(1, "vringh_complete_user: %i", err);
+ }
+
+ if (notify) {
+ write(to_guest[1], "", 1);
+ notifies++;
+ notify = false;
+ }
+ wait(&status);
+ if (!WIFEXITED(status))
+ errx(1, "Child died with signal %i?", WTERMSIG(status));
+ if (WEXITSTATUS(status) != 0)
+ errx(1, "Child exited %i?", WEXITSTATUS(status));
+ printf("Host: notified %lu, pinged %lu\n", notifies, receives);
+ return 0;
+ } else {
+ struct guest_virtio_device gvdev;
+ struct virtqueue *vq;
+ unsigned int *data;
+ struct vring_desc *indirects;
+ unsigned int finished = 0;
+
+ /* We pass sg[]s pointing into here, but we need RINGSIZE+1 */
+ data = guest_map + vring_size(RINGSIZE, ALIGN);
+ indirects = (void *)data + (RINGSIZE + 1) * 2 * sizeof(int);
+
+ /* We are the guest. */
+ munmap(host_map, mapsize);
+
+ close(to_guest[1]);
+ close(to_host[0]);
+
+ gvdev.vdev.features[0] = features;
+ gvdev.to_host_fd = to_host[1];
+
+ CPU_SET(first_cpu, &cpu_set);
+ if (sched_setaffinity(getpid(), sizeof(cpu_set), &cpu_set))
+ err(1, "Could not set affinity to cpu %u", first_cpu);
+
+ vq = vring_new_virtqueue(0, RINGSIZE, ALIGN, &gvdev.vdev, true,
+ guest_map, parallel_notify_host,
+ never_callback_guest, "guest vq");
+
+ /* Don't kfree indirects. */
+ __kfree_ignore_start = indirects;
+ __kfree_ignore_end = indirects + RINGSIZE * 6;
+
+ while (xfers < NUM_XFERS) {
+ struct scatterlist sg[6];
+ unsigned int num_sg, len;
+ int *din, *dout, err;
+
+ /* Consume bufs. */
+ while ((din = virtqueue_get_buf(vq, &len)) != NULL) {
+ dout = din + 1;
+ assert(*dout == *din);
+ assert(len == 4);
+ finished++;
+ }
+
+ /* Produce a buffer. */
+ din = data + (xfers % (RINGSIZE + 1)) * 2;
+ dout = din + 1;
+
+ *din = xfers;
+ switch ((xfers / sizeof(*din)) % 3) {
+ case 0:
+ /* Nasty three-element sg list. */
+ sg_init_table(sg, num_sg = 3);
+ sg_set_buf(&sg[0], (void *)din, 1);
+ sg_set_buf(&sg[1], (void *)din + 1, 2);
+ sg_set_buf(&sg[2], (void *)din + 3, 1);
+ sg_init_table(sg + num_sg, num_sg);
+ sg_set_buf(&sg[num_sg+0], (void *)dout, 1);
+ sg_set_buf(&sg[num_sg+1], (void *)dout + 1, 2);
+ sg_set_buf(&sg[num_sg+2], (void *)dout + 3, 1);
+ break;
+ case 1:
+ sg_init_table(sg, num_sg = 2);
+ sg_set_buf(&sg[0], (void *)din, 1);
+ sg_set_buf(&sg[1], (void *)din + 1, 3);
+ sg_init_table(sg + num_sg, num_sg);
+ sg_set_buf(&sg[num_sg+0], (void *)dout, 1);
+ sg_set_buf(&sg[num_sg+1], (void *)dout + 1, 3);
+ break;
+ case 2:
+ sg_init_table(sg, num_sg = 1);
+ sg_set_buf(&sg[0], (void *)din, 4);
+ sg_init_table(sg + num_sg, num_sg);
+ sg_set_buf(&sg[num_sg+0], (void *)dout, 4);
+ break;
+ }
+
+ /* May allocate an indirect, so force it to allocate
+ * user addr */
+ __kmalloc_fake = indirects + (xfers % RINGSIZE) * 6;
+ err = virtqueue_add_buf(vq, sg, num_sg, num_sg, din,
+ GFP_KERNEL);
+ if (err == -ENOSPC) {
+ char buf[128];
+
+ if (!virtqueue_enable_cb_delayed(vq))
+ continue;
+ /* Swallow all notifies at once. */
+ if (read(to_guest[0], buf, sizeof(buf)) < 1)
+ break;
+
+ receives++;
+ virtqueue_disable_cb(vq);
+ continue;
+ }
+
+ if (err)
+ errx(1, "virtqueue_add_buf: %i", err);
+
+ xfers++;
+ virtqueue_kick(vq);
+ }
+
+ /* Any extra? */
+ while (finished != xfers) {
+ char buf[128];
+ int *din, *dout;
+ unsigned int len;
+
+ /* Consume bufs. */
+ din = virtqueue_get_buf(vq, &len);
+ if (din) {
+ dout = din + 1;
+ assert(*dout == *din);
+ assert(len == 4);
+ finished++;
+ continue;
+ }
+
+ if (!virtqueue_enable_cb_delayed(vq))
+ continue;
+ if (read(to_guest[0], buf, sizeof(buf)) < 1)
+ break;
+
+ receives++;
+ virtqueue_disable_cb(vq);
+ }
+
+ printf("Guest: notified %lu, pinged %lu\n",
+ gvdev.notifies, receives);
+ return 0;
+ }
+}
+
+int main(int argc, char *argv[])
+{
+ struct virtio_device vdev;
+ struct virtqueue *vq;
+ struct vringh vrh;
+ struct scatterlist guest_sg[RINGSIZE];
+ struct iovec host_riov[2], host_wiov[2];
+ struct vringh_iov riov, wiov;
+ char buf[28];
+ u16 head;
+ int err;
+ unsigned i;
+ bool notify = false;
+ void *ret;
+
+ vdev.features[0] = 0;
+
+ if (argv[1] && strcmp(argv[1], "--indirect") == 0) {
+ vdev.features[0] |= (1 << VIRTIO_RING_F_INDIRECT_DESC);
+ argv++;
+ }
+
+ if (argv[1] && strcmp(argv[1], "--eventidx") == 0) {
+ vdev.features[0] |= (1 << VIRTIO_RING_F_EVENT_IDX);
+ argv++;
+ }
+
+ if (argv[1] && strcmp(argv[1], "--parallel") == 0)
+ return parallel_test(vdev.features[0]);
+
+ if (posix_memalign(&__user_addr_min, PAGE_SIZE, USER_MEM) != 0)
+ abort();
+ __user_addr_max = __user_addr_min + USER_MEM;
+ memset(__user_addr_min, 0, vring_size(RINGSIZE, ALIGN));
+
+ /* Set up guest side. */
+ vq = vring_new_virtqueue(0, RINGSIZE, ALIGN, &vdev, true,
+ __user_addr_min,
+ never_notify_host, never_callback_guest,
+ "guest vq");
+
+ /* Set up host side. */
+ vring_init(&vrh.vring, RINGSIZE, __user_addr_min, ALIGN);
+ vringh_init_user(&vrh, vdev.features[0], RINGSIZE, true,
+ vrh.vring.desc, vrh.vring.avail, vrh.vring.used);
+
+ /* No descriptor to get yet... */
+ err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange_iov,
+ &head, GFP_KERNEL);
+ if (err != 0)
+ errx(1, "vringh_getdesc_user: %i", err);
+
+ /* Guest puts in a descriptor. */
+ memcpy(__user_addr_max - 1, "a", 1);
+ sg_init_table(guest_sg, 1);
+ sg_set_buf(&guest_sg[0], __user_addr_max - 1, 1);
+ sg_init_table(guest_sg+1, 1);
+ sg_set_buf(&guest_sg[1], __user_addr_max - 3, 2);
+
+ /* May allocate an indirect, so force it to allocate user addr */
+ __kmalloc_fake = __user_addr_min + vring_size(RINGSIZE, ALIGN);
+ err = virtqueue_add_buf(vq, guest_sg, 1, 1, &err, GFP_KERNEL);
+ if (err)
+ errx(1, "virtqueue_add_buf: %i", err);
+ __kmalloc_fake = NULL;
+
+ /* Host retreives it. */
+ riov.iov = host_riov;
+ riov.max = ARRAY_SIZE(host_riov);
+ riov.allocated = false;
+
+ wiov.iov = host_wiov;
+ wiov.max = ARRAY_SIZE(host_wiov);
+ wiov.allocated = false;
+
+ err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange_iov,
+ &head, GFP_KERNEL);
+ if (err != 1)
+ errx(1, "vringh_getdesc_user: %i", err);
+
+ assert(riov.max == 1);
+ assert(riov.iov[0].iov_base == __user_addr_max - 1);
+ assert(riov.iov[0].iov_len == 1);
+ assert(wiov.max == 1);
+ assert(wiov.iov[0].iov_base == __user_addr_max - 3);
+ assert(wiov.iov[0].iov_len == 2);
+
+ err = vringh_iov_pull_user(&riov, buf, 5);
+ if (err != 1)
+ errx(1, "vringh_iov_pull_kern: %i", err);
+ assert(buf[0] == 'a');
+ assert(riov.i == 1);
+ assert(vringh_iov_pull_kern(&riov, buf, 5) == 0);
+
+ memcpy(buf, "bcdef", 5);
+ err = vringh_iov_push_user(&wiov, buf, 5);
+ if (err != 2)
+ errx(1, "vringh_iov_push_user: %i", err);
+ assert(memcmp(__user_addr_max - 3, "bc", 2) == 0);
+ assert(wiov.i == 1);
+ assert(vringh_iov_push_kern(&wiov, buf, 5) == 0);
+
+ /* Host is done. */
+ err = vringh_complete_user(&vrh, head, err, ¬ify);
+ if (err != 0)
+ errx(1, "vringh_complete_user: %i", err);
+
+ /* Guest should see used token now. */
+ __kfree_ignore_start = __user_addr_min + vring_size(RINGSIZE, ALIGN);
+ __kfree_ignore_end = __kfree_ignore_start + 1;
+ ret = virtqueue_get_buf(vq, &i);
+ if (ret != &err)
+ errx(1, "virtqueue_get_buf: %p", ret);
+ assert(i == 2);
+
+ /* Guest puts in a huge descriptor. */
+ sg_init_table(guest_sg, RINGSIZE);
+ for (i = 0; i < RINGSIZE; i++) {
+ sg_set_buf(&guest_sg[i],
+ __user_addr_max - USER_MEM/4, USER_MEM/4);
+ }
+
+ /* Fill contents with recognisable garbage. */
+ for (i = 0; i < USER_MEM/4; i++)
+ ((char *)__user_addr_max - USER_MEM/4)[i] = i;
+
+ /* This will allocate an indirect, so force it to allocate user addr */
+ __kmalloc_fake = __user_addr_min + vring_size(RINGSIZE, ALIGN);
+ err = virtqueue_add_buf(vq, guest_sg, RINGSIZE, 0, &err, GFP_KERNEL);
+ if (err)
+ errx(1, "virtqueue_add_buf (large): %i", err);
+ __kmalloc_fake = NULL;
+
+ /* Host picks it up (allocates new iov). */
+ riov.iov = host_riov;
+ riov.max = ARRAY_SIZE(host_riov);
+ riov.allocated = false;
+
+ wiov.iov = host_wiov;
+ wiov.max = ARRAY_SIZE(host_wiov);
+ wiov.allocated = false;
+
+ err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange_iov,
+ &head, GFP_KERNEL);
+ if (err != 1)
+ errx(1, "vringh_getdesc_user: %i", err);
+
+ assert(riov.allocated);
+ assert(riov.iov != host_riov);
+ assert(riov.max == RINGSIZE);
+
+ assert(!wiov.allocated);
+ assert(wiov.max == 0);
+
+ /* Pull data back out (in odd chunks), should be as expected. */
+ for (i = 0; i < RINGSIZE * USER_MEM/4; i += 3) {
+ err = vringh_iov_pull_user(&riov, buf, 3);
+ if (err != 3 && i + err != RINGSIZE * USER_MEM/4)
+ errx(1, "vringh_iov_pull_user large: %i", err);
+ assert(buf[0] == (char)i);
+ assert(err < 2 || buf[1] == (char)(i + 1));
+ assert(err < 3 || buf[2] == (char)(i + 2));
+ }
+ assert(wiov.i == wiov.max);
+
+ kfree(riov.iov);
+
+ /* Test weird (but legal!) indirect. */
+ if (vdev.features[0] & (1 << VIRTIO_RING_F_INDIRECT_DESC)) {
+ struct vring_virtqueue *vvq = to_vvq(vq);
+ char *data = __user_addr_max - USER_MEM/4;
+ struct vring_desc *d = __user_addr_max - USER_MEM/2;
+ unsigned int n = vvq->free_head;
+
+ /* Force creation of direct, which we modify. */
+ vvq->indirect = false;
+
+ sg_init_table(guest_sg, 4);
+ sg_set_buf(&guest_sg[0], d, sizeof(*d)*2);
+ sg_set_buf(&guest_sg[1], d + 2, sizeof(*d)*1);
+ sg_set_buf(&guest_sg[2], data + 6, 4);
+ sg_set_buf(&guest_sg[3], d + 3, sizeof(*d)*3);
+
+ err = virtqueue_add_buf(vq, guest_sg, 4, 0, &err, GFP_KERNEL);
+ if (err)
+ errx(1, "virtqueue_add_buf (indirect): %i", err);
+
+ /* They're used in order, but double-check... */
+ assert(vvq->vring.desc[n].addr == (unsigned long)d);
+ assert(vvq->vring.desc[n+1].addr == (unsigned long)(d+2));
+ assert(vvq->vring.desc[n+2].addr == (unsigned long)data + 6);
+ assert(vvq->vring.desc[n+3].addr == (unsigned long)(d+3));
+ vvq->vring.desc[n].flags |= VRING_DESC_F_INDIRECT;
+ vvq->vring.desc[n+1].flags |= VRING_DESC_F_INDIRECT;
+ vvq->vring.desc[n+3].flags |= VRING_DESC_F_INDIRECT;
+
+ /* First indirect */
+ d[0].addr = (unsigned long)data;
+ d[0].len = 1;
+ d[0].flags = VRING_DESC_F_NEXT;
+ d[0].next = 1;
+ d[1].addr = (unsigned long)data + 1;
+ d[1].len = 2;
+ d[1].flags = 0;
+
+ /* Second indirect */
+ d[2].addr = (unsigned long)data + 3;
+ d[2].len = 3;
+ d[2].flags = 0;
+
+ /* Third indirect */
+ d[3].addr = (unsigned long)data + 10;
+ d[3].len = 5;
+ d[3].flags = VRING_DESC_F_NEXT;
+ d[3].next = 1;
+ d[4].addr = (unsigned long)data + 15;
+ d[4].len = 6;
+ d[4].flags = VRING_DESC_F_NEXT;
+ d[4].next = 2;
+ d[5].addr = (unsigned long)data + 21;
+ d[5].len = 7;
+ d[5].flags = 0;
+
+ /* Host picks it up (allocates new iov). */
+ riov.iov = host_riov;
+ riov.max = ARRAY_SIZE(host_riov);
+ riov.allocated = false;
+
+ wiov.iov = host_wiov;
+ wiov.max = ARRAY_SIZE(host_wiov);
+ wiov.allocated = false;
+
+ err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange_iov,
+ &head, GFP_KERNEL);
+ if (err != 1)
+ errx(1, "vringh_getdesc_user: %i", err);
+
+ if (head != n)
+ errx(1, "vringh_getdesc_user: head %i not %i", head, n);
+
+ assert(riov.max == 7);
+ assert(riov.allocated);
+ err = vringh_iov_pull_user(&riov, buf, 29);
+ assert(err == 28);
+
+ /* Data should be linear. */
+ for (i = 0; i < err; i++)
+ assert(buf[i] == i);
+ kfree(riov.iov);
+ }
+
+ /* Don't leak memory... */
+ vring_del_virtqueue(vq);
+ free(__user_addr_min);
+
+ return 0;
+}
--
1.7.10.4
^ permalink raw reply related [flat|nested] 30+ messages in thread* Re: [PATCH 4/6] tools/virtio: add vring_test.
2013-01-17 10:29 ` [PATCH 4/6] tools/virtio: add vring_test Rusty Russell
@ 2013-01-22 8:25 ` Asias He
2013-01-22 23:03 ` Rusty Russell
0 siblings, 1 reply; 30+ messages in thread
From: Asias He @ 2013-01-22 8:25 UTC (permalink / raw)
To: Rusty Russell; +Cc: Michael S . Tsirkin, virtualization
On 01/17/2013 06:29 PM, Rusty Russell wrote:
> This is mainly to test the drivers/vhost/vringh.c code, but it also
> uses the drivers/virtio/virtio_ring.c code for the guest side.
vringh_test.c does not compile here:
(This series on top of 9a9284153d965a57edc7162a8e57c14c97f3a935)
$ cd tools/virtio
$ make
cc -g -O2 -Wall -I. -I ../../usr/include/ -Wno-pointer-sign
-fno-strict-overflow -MMD vringh_test.c -o vringh_test
In file included from ./linux/vringh.h:1:0,
from ./../../drivers/vhost/vringh.c:6,
from vringh_test.c:7:
./linux/../../../include/linux/vringh.h:27:28: fatal error:
uapi/linux/uio.h: No such file or directory
compilation terminated.
make: *** [vringh_test] Error 1
> Usage for testing the basic implementation:
>
> ./vringh_test
> # Test with indirect descriptors
> ./vringh_test --indirect
> # Test with indirect descriptors and event indexex
> ./vringh_test --indirect --eventidx
>
> You can run a parallel stress test by adding --parallel to any of the
> above options.
>
> eg ./vringh_test --parallel:
> Using CPUS 0 and 3
> Guest: notified 10107974, pinged 107970
> Host: notified 108158, pinged 3172148
> Time: R=17.659 U=6.640 S=6.640
>
> ./vringh_test --eventidx --parallel:
> Using CPUS 0 and 3
> Guest: notified 156357, pinged 156251
> Host: notified 156251, pinged 78179
> Time: R=4.518 U=3.536 S=3.536
>
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> ---
> tools/virtio/Makefile | 4 +-
> tools/virtio/vringh_test.c | 591 ++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 593 insertions(+), 2 deletions(-)
> create mode 100644 tools/virtio/vringh_test.c
>
> diff --git a/tools/virtio/Makefile b/tools/virtio/Makefile
> index d1d442e..b928c3e 100644
> --- a/tools/virtio/Makefile
> +++ b/tools/virtio/Makefile
> @@ -1,5 +1,5 @@
> all: test mod
> -test: virtio_test
> +test: virtio_test vringh_test
> virtio_test: virtio_ring.o virtio_test.o
> CFLAGS += -g -O2 -Wall -I. -I ../../usr/include/ -Wno-pointer-sign -fno-strict-overflow -MMD
> vpath %.c ../../drivers/virtio
> @@ -7,6 +7,6 @@ mod:
> ${MAKE} -C `pwd`/../.. M=`pwd`/vhost_test
> .PHONY: all test mod clean
> clean:
> - ${RM} *.o vhost_test/*.o vhost_test/.*.cmd \
> + ${RM} *.o vringh_test virtio_test vhost_test/*.o vhost_test/.*.cmd \
> vhost_test/Module.symvers vhost_test/modules.order *.d
> -include *.d
> diff --git a/tools/virtio/vringh_test.c b/tools/virtio/vringh_test.c
> new file mode 100644
> index 0000000..f3868f4
> --- /dev/null
> +++ b/tools/virtio/vringh_test.c
> @@ -0,0 +1,591 @@
> +/* Simple test of virtio code, entirely in userpsace. */
> +#define _GNU_SOURCE
> +#include <sched.h>
> +#include <err.h>
> +#include <linux/kernel.h>
> +#include <linux/err.h>
> +#include <../../drivers/vhost/vringh.c>
> +#include <../../drivers/virtio/virtio_ring.c>
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +#include <sys/mman.h>
> +#include <sys/wait.h>
> +#include <fcntl.h>
> +
> +#define USER_MEM (1024*1024)
> +void *__user_addr_min, *__user_addr_max;
> +void *__kmalloc_fake, *__kfree_ignore_start, *__kfree_ignore_end;
> +static u64 user_addr_offset;
> +
> +#define RINGSIZE 256
> +#define ALIGN 4096
> +
> +static void never_notify_host(struct virtqueue *vq)
> +{
> + abort();
> +}
> +
> +static void never_callback_guest(struct virtqueue *vq)
> +{
> + abort();
> +}
> +
> +static inline bool getrange_iov(u64 addr, struct vringh_range *r)
> +{
> + r->start = (u64)(unsigned long)__user_addr_min - user_addr_offset;
> + r->end_incl = (u64)(unsigned long)__user_addr_max - 1 - user_addr_offset;
> + r->offset = user_addr_offset;
> + return true;
> +}
> +
> +struct guest_virtio_device {
> + struct virtio_device vdev;
> + int to_host_fd;
> + unsigned long notifies;
> +};
> +
> +static void parallel_notify_host(struct virtqueue *vq)
> +{
> + struct guest_virtio_device *gvdev;
> +
> + gvdev = container_of(vq->vdev, struct guest_virtio_device, vdev);
> + write(gvdev->to_host_fd, "", 1);
> + gvdev->notifies++;
> +}
> +
> +#define NUM_XFERS (10000000)
> +
> +/* We aim for two "distant" cpus. */
> +static void find_cpus(unsigned int *first, unsigned int *last)
> +{
> + unsigned int i;
> +
> + *first = -1U;
> + *last = 0;
> + for (i = 0; i < 4096; i++) {
> + cpu_set_t set;
> + CPU_ZERO(&set);
> + CPU_SET(i, &set);
> + if (sched_setaffinity(getpid(), sizeof(set), &set) == 0) {
> + if (i < *first)
> + *first = i;
> + if (i > *last)
> + *last = i;
> + }
> + }
> +}
> +
> +static int parallel_test(unsigned long features)
> +{
> + void *host_map, *guest_map;
> + int fd, mapsize, to_guest[2], to_host[2];
> + unsigned long xfers = 0, notifies = 0, receives = 0;
> + unsigned int first_cpu, last_cpu;
> + cpu_set_t cpu_set;
> +
> + /* Create real file to mmap. */
> + fd = open("/tmp/vringh_test-file", O_RDWR|O_CREAT|O_TRUNC, 0600);
> + if (fd < 0)
> + err(1, "Opening /tmp/vringh_test-file");
> +
> + /* Extra room at the end for some data, and indirects */
> + mapsize = vring_size(RINGSIZE, ALIGN)
> + + RINGSIZE * 2 * sizeof(int)
> + + RINGSIZE * 6 * sizeof(struct vring_desc);
> + mapsize = (mapsize + getpagesize() - 1) & ~(getpagesize() - 1);
> + ftruncate(fd, mapsize);
> +
> + /* Parent and child use separate addresses, to check our mapping logic! */
> + host_map = mmap(NULL, mapsize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
> + guest_map = mmap(NULL, mapsize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
> +
> + pipe(to_guest);
> + pipe(to_host);
> +
> + CPU_ZERO(&cpu_set);
> + find_cpus(&first_cpu, &last_cpu);
> + printf("Using CPUS %u and %u\n", first_cpu, last_cpu);
> + fflush(stdout);
> +
> + if (fork() != 0) {
> + struct vringh vrh;
> + bool notify = false;
> + int status;
> +
> + /* We are the host: never access guest addresses! */
> + munmap(guest_map, mapsize);
> +
> + __user_addr_min = host_map;
> + __user_addr_max = __user_addr_min + mapsize;
> + user_addr_offset = host_map - guest_map;
> + assert(user_addr_offset);
> +
> + close(to_guest[0]);
> + close(to_host[1]);
> +
> + vring_init(&vrh.vring, RINGSIZE, host_map, ALIGN);
> + vringh_init_user(&vrh, features, RINGSIZE, true,
> + vrh.vring.desc, vrh.vring.avail, vrh.vring.used);
> + CPU_SET(first_cpu, &cpu_set);
> + if (sched_setaffinity(getpid(), sizeof(cpu_set), &cpu_set))
> + err(1, "Could not set affinity to cpu %u", first_cpu);
> +
> + while (xfers < NUM_XFERS) {
> + struct iovec host_riov[2], host_wiov[2];
> + struct vringh_iov riov, wiov;
> + char buf[5];
> + u16 head;
> + int rlen, err;
> +
> + riov.iov = host_riov;
> + riov.max = ARRAY_SIZE(host_riov);
> + riov.allocated = false;
> +
> + wiov.iov = host_wiov;
> + wiov.max = ARRAY_SIZE(host_wiov);
> + wiov.allocated = false;
> +
> + err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange_iov,
> + &head, GFP_KERNEL);
> + if (err == 0) {
> + char buf[128];
> +
> + if (notify) {
> + write(to_guest[1], "", 1);
> + notifies++;
> + notify = false;
> + }
> +
> + if (vringh_notify_enable_user(&vrh))
> + continue;
> +
> + /* Swallow all notifies at once. */
> + if (read(to_host[0], buf, sizeof(buf)) < 1)
> + break;
> +
> + vringh_notify_disable_user(&vrh);
> + receives++;
> + continue;
> + }
> + if (err != 1)
> + errx(1, "vringh_getdesc_user: %i", err);
> +
> + /* We simply copy bytes. */
> + rlen = vringh_iov_pull_user(&riov, buf, sizeof(buf));
> + if (rlen < 0)
> + errx(1, "vringh_iov_pull_user: %i", rlen);
> + err = vringh_iov_push_user(&wiov, buf, rlen);
> + if (err != rlen)
> + errx(1, "vringh_iov_push_user: %i", err);
> + xfers++;
> + assert(wiov.i == wiov.max);
> +
> + err = vringh_complete_user(&vrh, head, rlen, ¬ify);
> + if (err != 0)
> + errx(1, "vringh_complete_user: %i", err);
> + }
> +
> + if (notify) {
> + write(to_guest[1], "", 1);
> + notifies++;
> + notify = false;
> + }
> + wait(&status);
> + if (!WIFEXITED(status))
> + errx(1, "Child died with signal %i?", WTERMSIG(status));
> + if (WEXITSTATUS(status) != 0)
> + errx(1, "Child exited %i?", WEXITSTATUS(status));
> + printf("Host: notified %lu, pinged %lu\n", notifies, receives);
> + return 0;
> + } else {
> + struct guest_virtio_device gvdev;
> + struct virtqueue *vq;
> + unsigned int *data;
> + struct vring_desc *indirects;
> + unsigned int finished = 0;
> +
> + /* We pass sg[]s pointing into here, but we need RINGSIZE+1 */
> + data = guest_map + vring_size(RINGSIZE, ALIGN);
> + indirects = (void *)data + (RINGSIZE + 1) * 2 * sizeof(int);
> +
> + /* We are the guest. */
> + munmap(host_map, mapsize);
> +
> + close(to_guest[1]);
> + close(to_host[0]);
> +
> + gvdev.vdev.features[0] = features;
> + gvdev.to_host_fd = to_host[1];
> +
> + CPU_SET(first_cpu, &cpu_set);
> + if (sched_setaffinity(getpid(), sizeof(cpu_set), &cpu_set))
> + err(1, "Could not set affinity to cpu %u", first_cpu);
> +
> + vq = vring_new_virtqueue(0, RINGSIZE, ALIGN, &gvdev.vdev, true,
> + guest_map, parallel_notify_host,
> + never_callback_guest, "guest vq");
> +
> + /* Don't kfree indirects. */
> + __kfree_ignore_start = indirects;
> + __kfree_ignore_end = indirects + RINGSIZE * 6;
> +
> + while (xfers < NUM_XFERS) {
> + struct scatterlist sg[6];
> + unsigned int num_sg, len;
> + int *din, *dout, err;
> +
> + /* Consume bufs. */
> + while ((din = virtqueue_get_buf(vq, &len)) != NULL) {
> + dout = din + 1;
> + assert(*dout == *din);
> + assert(len == 4);
> + finished++;
> + }
> +
> + /* Produce a buffer. */
> + din = data + (xfers % (RINGSIZE + 1)) * 2;
> + dout = din + 1;
> +
> + *din = xfers;
> + switch ((xfers / sizeof(*din)) % 3) {
> + case 0:
> + /* Nasty three-element sg list. */
> + sg_init_table(sg, num_sg = 3);
> + sg_set_buf(&sg[0], (void *)din, 1);
> + sg_set_buf(&sg[1], (void *)din + 1, 2);
> + sg_set_buf(&sg[2], (void *)din + 3, 1);
> + sg_init_table(sg + num_sg, num_sg);
> + sg_set_buf(&sg[num_sg+0], (void *)dout, 1);
> + sg_set_buf(&sg[num_sg+1], (void *)dout + 1, 2);
> + sg_set_buf(&sg[num_sg+2], (void *)dout + 3, 1);
> + break;
> + case 1:
> + sg_init_table(sg, num_sg = 2);
> + sg_set_buf(&sg[0], (void *)din, 1);
> + sg_set_buf(&sg[1], (void *)din + 1, 3);
> + sg_init_table(sg + num_sg, num_sg);
> + sg_set_buf(&sg[num_sg+0], (void *)dout, 1);
> + sg_set_buf(&sg[num_sg+1], (void *)dout + 1, 3);
> + break;
> + case 2:
> + sg_init_table(sg, num_sg = 1);
> + sg_set_buf(&sg[0], (void *)din, 4);
> + sg_init_table(sg + num_sg, num_sg);
> + sg_set_buf(&sg[num_sg+0], (void *)dout, 4);
> + break;
> + }
> +
> + /* May allocate an indirect, so force it to allocate
> + * user addr */
> + __kmalloc_fake = indirects + (xfers % RINGSIZE) * 6;
> + err = virtqueue_add_buf(vq, sg, num_sg, num_sg, din,
> + GFP_KERNEL);
> + if (err == -ENOSPC) {
> + char buf[128];
> +
> + if (!virtqueue_enable_cb_delayed(vq))
> + continue;
> + /* Swallow all notifies at once. */
> + if (read(to_guest[0], buf, sizeof(buf)) < 1)
> + break;
> +
> + receives++;
> + virtqueue_disable_cb(vq);
> + continue;
> + }
> +
> + if (err)
> + errx(1, "virtqueue_add_buf: %i", err);
> +
> + xfers++;
> + virtqueue_kick(vq);
> + }
> +
> + /* Any extra? */
> + while (finished != xfers) {
> + char buf[128];
> + int *din, *dout;
> + unsigned int len;
> +
> + /* Consume bufs. */
> + din = virtqueue_get_buf(vq, &len);
> + if (din) {
> + dout = din + 1;
> + assert(*dout == *din);
> + assert(len == 4);
> + finished++;
> + continue;
> + }
> +
> + if (!virtqueue_enable_cb_delayed(vq))
> + continue;
> + if (read(to_guest[0], buf, sizeof(buf)) < 1)
> + break;
> +
> + receives++;
> + virtqueue_disable_cb(vq);
> + }
> +
> + printf("Guest: notified %lu, pinged %lu\n",
> + gvdev.notifies, receives);
> + return 0;
> + }
> +}
> +
> +int main(int argc, char *argv[])
> +{
> + struct virtio_device vdev;
> + struct virtqueue *vq;
> + struct vringh vrh;
> + struct scatterlist guest_sg[RINGSIZE];
> + struct iovec host_riov[2], host_wiov[2];
> + struct vringh_iov riov, wiov;
> + char buf[28];
> + u16 head;
> + int err;
> + unsigned i;
> + bool notify = false;
> + void *ret;
> +
> + vdev.features[0] = 0;
> +
> + if (argv[1] && strcmp(argv[1], "--indirect") == 0) {
> + vdev.features[0] |= (1 << VIRTIO_RING_F_INDIRECT_DESC);
> + argv++;
> + }
> +
> + if (argv[1] && strcmp(argv[1], "--eventidx") == 0) {
> + vdev.features[0] |= (1 << VIRTIO_RING_F_EVENT_IDX);
> + argv++;
> + }
> +
> + if (argv[1] && strcmp(argv[1], "--parallel") == 0)
> + return parallel_test(vdev.features[0]);
> +
> + if (posix_memalign(&__user_addr_min, PAGE_SIZE, USER_MEM) != 0)
> + abort();
> + __user_addr_max = __user_addr_min + USER_MEM;
> + memset(__user_addr_min, 0, vring_size(RINGSIZE, ALIGN));
> +
> + /* Set up guest side. */
> + vq = vring_new_virtqueue(0, RINGSIZE, ALIGN, &vdev, true,
> + __user_addr_min,
> + never_notify_host, never_callback_guest,
> + "guest vq");
> +
> + /* Set up host side. */
> + vring_init(&vrh.vring, RINGSIZE, __user_addr_min, ALIGN);
> + vringh_init_user(&vrh, vdev.features[0], RINGSIZE, true,
> + vrh.vring.desc, vrh.vring.avail, vrh.vring.used);
> +
> + /* No descriptor to get yet... */
> + err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange_iov,
> + &head, GFP_KERNEL);
> + if (err != 0)
> + errx(1, "vringh_getdesc_user: %i", err);
> +
> + /* Guest puts in a descriptor. */
> + memcpy(__user_addr_max - 1, "a", 1);
> + sg_init_table(guest_sg, 1);
> + sg_set_buf(&guest_sg[0], __user_addr_max - 1, 1);
> + sg_init_table(guest_sg+1, 1);
> + sg_set_buf(&guest_sg[1], __user_addr_max - 3, 2);
> +
> + /* May allocate an indirect, so force it to allocate user addr */
> + __kmalloc_fake = __user_addr_min + vring_size(RINGSIZE, ALIGN);
> + err = virtqueue_add_buf(vq, guest_sg, 1, 1, &err, GFP_KERNEL);
> + if (err)
> + errx(1, "virtqueue_add_buf: %i", err);
> + __kmalloc_fake = NULL;
> +
> + /* Host retreives it. */
> + riov.iov = host_riov;
> + riov.max = ARRAY_SIZE(host_riov);
> + riov.allocated = false;
> +
> + wiov.iov = host_wiov;
> + wiov.max = ARRAY_SIZE(host_wiov);
> + wiov.allocated = false;
> +
> + err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange_iov,
> + &head, GFP_KERNEL);
> + if (err != 1)
> + errx(1, "vringh_getdesc_user: %i", err);
> +
> + assert(riov.max == 1);
> + assert(riov.iov[0].iov_base == __user_addr_max - 1);
> + assert(riov.iov[0].iov_len == 1);
> + assert(wiov.max == 1);
> + assert(wiov.iov[0].iov_base == __user_addr_max - 3);
> + assert(wiov.iov[0].iov_len == 2);
> +
> + err = vringh_iov_pull_user(&riov, buf, 5);
> + if (err != 1)
> + errx(1, "vringh_iov_pull_kern: %i", err);
> + assert(buf[0] == 'a');
> + assert(riov.i == 1);
> + assert(vringh_iov_pull_kern(&riov, buf, 5) == 0);
> +
> + memcpy(buf, "bcdef", 5);
> + err = vringh_iov_push_user(&wiov, buf, 5);
> + if (err != 2)
> + errx(1, "vringh_iov_push_user: %i", err);
> + assert(memcmp(__user_addr_max - 3, "bc", 2) == 0);
> + assert(wiov.i == 1);
> + assert(vringh_iov_push_kern(&wiov, buf, 5) == 0);
> +
> + /* Host is done. */
> + err = vringh_complete_user(&vrh, head, err, ¬ify);
> + if (err != 0)
> + errx(1, "vringh_complete_user: %i", err);
> +
> + /* Guest should see used token now. */
> + __kfree_ignore_start = __user_addr_min + vring_size(RINGSIZE, ALIGN);
> + __kfree_ignore_end = __kfree_ignore_start + 1;
> + ret = virtqueue_get_buf(vq, &i);
> + if (ret != &err)
> + errx(1, "virtqueue_get_buf: %p", ret);
> + assert(i == 2);
> +
> + /* Guest puts in a huge descriptor. */
> + sg_init_table(guest_sg, RINGSIZE);
> + for (i = 0; i < RINGSIZE; i++) {
> + sg_set_buf(&guest_sg[i],
> + __user_addr_max - USER_MEM/4, USER_MEM/4);
> + }
> +
> + /* Fill contents with recognisable garbage. */
> + for (i = 0; i < USER_MEM/4; i++)
> + ((char *)__user_addr_max - USER_MEM/4)[i] = i;
> +
> + /* This will allocate an indirect, so force it to allocate user addr */
> + __kmalloc_fake = __user_addr_min + vring_size(RINGSIZE, ALIGN);
> + err = virtqueue_add_buf(vq, guest_sg, RINGSIZE, 0, &err, GFP_KERNEL);
> + if (err)
> + errx(1, "virtqueue_add_buf (large): %i", err);
> + __kmalloc_fake = NULL;
> +
> + /* Host picks it up (allocates new iov). */
> + riov.iov = host_riov;
> + riov.max = ARRAY_SIZE(host_riov);
> + riov.allocated = false;
> +
> + wiov.iov = host_wiov;
> + wiov.max = ARRAY_SIZE(host_wiov);
> + wiov.allocated = false;
> +
> + err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange_iov,
> + &head, GFP_KERNEL);
> + if (err != 1)
> + errx(1, "vringh_getdesc_user: %i", err);
> +
> + assert(riov.allocated);
> + assert(riov.iov != host_riov);
> + assert(riov.max == RINGSIZE);
> +
> + assert(!wiov.allocated);
> + assert(wiov.max == 0);
> +
> + /* Pull data back out (in odd chunks), should be as expected. */
> + for (i = 0; i < RINGSIZE * USER_MEM/4; i += 3) {
> + err = vringh_iov_pull_user(&riov, buf, 3);
> + if (err != 3 && i + err != RINGSIZE * USER_MEM/4)
> + errx(1, "vringh_iov_pull_user large: %i", err);
> + assert(buf[0] == (char)i);
> + assert(err < 2 || buf[1] == (char)(i + 1));
> + assert(err < 3 || buf[2] == (char)(i + 2));
> + }
> + assert(wiov.i == wiov.max);
> +
> + kfree(riov.iov);
> +
> + /* Test weird (but legal!) indirect. */
> + if (vdev.features[0] & (1 << VIRTIO_RING_F_INDIRECT_DESC)) {
> + struct vring_virtqueue *vvq = to_vvq(vq);
> + char *data = __user_addr_max - USER_MEM/4;
> + struct vring_desc *d = __user_addr_max - USER_MEM/2;
> + unsigned int n = vvq->free_head;
> +
> + /* Force creation of direct, which we modify. */
> + vvq->indirect = false;
> +
> + sg_init_table(guest_sg, 4);
> + sg_set_buf(&guest_sg[0], d, sizeof(*d)*2);
> + sg_set_buf(&guest_sg[1], d + 2, sizeof(*d)*1);
> + sg_set_buf(&guest_sg[2], data + 6, 4);
> + sg_set_buf(&guest_sg[3], d + 3, sizeof(*d)*3);
> +
> + err = virtqueue_add_buf(vq, guest_sg, 4, 0, &err, GFP_KERNEL);
> + if (err)
> + errx(1, "virtqueue_add_buf (indirect): %i", err);
> +
> + /* They're used in order, but double-check... */
> + assert(vvq->vring.desc[n].addr == (unsigned long)d);
> + assert(vvq->vring.desc[n+1].addr == (unsigned long)(d+2));
> + assert(vvq->vring.desc[n+2].addr == (unsigned long)data + 6);
> + assert(vvq->vring.desc[n+3].addr == (unsigned long)(d+3));
> + vvq->vring.desc[n].flags |= VRING_DESC_F_INDIRECT;
> + vvq->vring.desc[n+1].flags |= VRING_DESC_F_INDIRECT;
> + vvq->vring.desc[n+3].flags |= VRING_DESC_F_INDIRECT;
> +
> + /* First indirect */
> + d[0].addr = (unsigned long)data;
> + d[0].len = 1;
> + d[0].flags = VRING_DESC_F_NEXT;
> + d[0].next = 1;
> + d[1].addr = (unsigned long)data + 1;
> + d[1].len = 2;
> + d[1].flags = 0;
> +
> + /* Second indirect */
> + d[2].addr = (unsigned long)data + 3;
> + d[2].len = 3;
> + d[2].flags = 0;
> +
> + /* Third indirect */
> + d[3].addr = (unsigned long)data + 10;
> + d[3].len = 5;
> + d[3].flags = VRING_DESC_F_NEXT;
> + d[3].next = 1;
> + d[4].addr = (unsigned long)data + 15;
> + d[4].len = 6;
> + d[4].flags = VRING_DESC_F_NEXT;
> + d[4].next = 2;
> + d[5].addr = (unsigned long)data + 21;
> + d[5].len = 7;
> + d[5].flags = 0;
> +
> + /* Host picks it up (allocates new iov). */
> + riov.iov = host_riov;
> + riov.max = ARRAY_SIZE(host_riov);
> + riov.allocated = false;
> +
> + wiov.iov = host_wiov;
> + wiov.max = ARRAY_SIZE(host_wiov);
> + wiov.allocated = false;
> +
> + err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange_iov,
> + &head, GFP_KERNEL);
> + if (err != 1)
> + errx(1, "vringh_getdesc_user: %i", err);
> +
> + if (head != n)
> + errx(1, "vringh_getdesc_user: head %i not %i", head, n);
> +
> + assert(riov.max == 7);
> + assert(riov.allocated);
> + err = vringh_iov_pull_user(&riov, buf, 29);
> + assert(err == 28);
> +
> + /* Data should be linear. */
> + for (i = 0; i < err; i++)
> + assert(buf[i] == i);
> + kfree(riov.iov);
> + }
> +
> + /* Don't leak memory... */
> + vring_del_virtqueue(vq);
> + free(__user_addr_min);
> +
> + return 0;
> +}
>
--
Asias
^ permalink raw reply [flat|nested] 30+ messages in thread* Re: [PATCH 4/6] tools/virtio: add vring_test.
2013-01-22 8:25 ` Asias He
@ 2013-01-22 23:03 ` Rusty Russell
2013-01-23 1:40 ` Asias He
0 siblings, 1 reply; 30+ messages in thread
From: Rusty Russell @ 2013-01-22 23:03 UTC (permalink / raw)
To: Asias He; +Cc: Michael S . Tsirkin, virtualization
Asias He <asias@redhat.com> writes:
> On 01/17/2013 06:29 PM, Rusty Russell wrote:
>> This is mainly to test the drivers/vhost/vringh.c code, but it also
>> uses the drivers/virtio/virtio_ring.c code for the guest side.
>
> vringh_test.c does not compile here:
> (This series on top of 9a9284153d965a57edc7162a8e57c14c97f3a935)
>
> $ cd tools/virtio
> $ make
> cc -g -O2 -Wall -I. -I ../../usr/include/ -Wno-pointer-sign
> -fno-strict-overflow -MMD vringh_test.c -o vringh_test
> In file included from ./linux/vringh.h:1:0,
> from ./../../drivers/vhost/vringh.c:6,
> from vringh_test.c:7:
> ./linux/../../../include/linux/vringh.h:27:28: fatal error:
> uapi/linux/uio.h: No such file or directory
Oops, I forgot to add the file... it's a one-liner:
tools/virtio/uapi/linux/uio.h:
#include <sys/uio.h>
I'll make a new branch for this, called vringh. It'll probably rebase
as I neaten things up, but I'll try not to go crazy...
Thanks,
Rusty.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 4/6] tools/virtio: add vring_test.
2013-01-22 23:03 ` Rusty Russell
@ 2013-01-23 1:40 ` Asias He
2013-01-24 2:22 ` Rusty Russell
0 siblings, 1 reply; 30+ messages in thread
From: Asias He @ 2013-01-23 1:40 UTC (permalink / raw)
To: Rusty Russell; +Cc: Michael S . Tsirkin, virtualization
On 01/23/2013 07:03 AM, Rusty Russell wrote:
> Asias He <asias@redhat.com> writes:
>
>> On 01/17/2013 06:29 PM, Rusty Russell wrote:
>>> This is mainly to test the drivers/vhost/vringh.c code, but it also
>>> uses the drivers/virtio/virtio_ring.c code for the guest side.
>>
>> vringh_test.c does not compile here:
>> (This series on top of 9a9284153d965a57edc7162a8e57c14c97f3a935)
>>
>> $ cd tools/virtio
>> $ make
>> cc -g -O2 -Wall -I. -I ../../usr/include/ -Wno-pointer-sign
>> -fno-strict-overflow -MMD vringh_test.c -o vringh_test
>> In file included from ./linux/vringh.h:1:0,
>> from ./../../drivers/vhost/vringh.c:6,
>> from vringh_test.c:7:
>> ./linux/../../../include/linux/vringh.h:27:28: fatal error:
>> uapi/linux/uio.h: No such file or directory
>
> Oops, I forgot to add the file... it's a one-liner:
>
> tools/virtio/uapi/linux/uio.h:
> #include <sys/uio.h>
>
> I'll make a new branch for this, called vringh. It'll probably rebase
> as I neaten things up, but I'll try not to go crazy...
It compiles now.
FYI, I got some warnings:
../../drivers/virtio/virtio_ring.c: In function ‘virtqueue_kick_prepare’:
../../drivers/virtio/virtio_ring.c:309:3: warning: dereferencing
type-punned pointer will break strict-aliasing rules [-Wstrict-aliasin
g]
cc virtio_test.o virtio_ring.o -o virtio_test
cc -g -O2 -Wall -I. -I ../../usr/include/ -Wno-pointer-sign
-fno-strict-overflow -MMD vringh_test.c -o vringh_test
In file included from vringh_test.c:7:0:
./../../drivers/vhost/vringh.c: In function ‘check_range’:
./../../drivers/vhost/vringh.c:119:2: warning: format ‘%llx’ expects
argument of type ‘long long unsigned int’, but argument 3 has type
‘u64’ [-Wformat]
./../../drivers/vhost/vringh.c: In function ‘__vringh_notify_enable’:
./../../drivers/vhost/vringh.c:402:3: warning: dereferencing type-punned
pointer will break strict-aliasing rules [-Wstrict-aliasing]
./../../drivers/vhost/vringh.c:405:8: warning: dereferencing type-punned
pointer will break strict-aliasing rules [-Wstrict-aliasing]
./../../drivers/vhost/vringh.c: In function ‘vringh_init_user’:
./../../drivers/vhost/vringh.c:499:3: warning: format ‘%zu’ expects
argument of type ‘size_t’, but argument 2 has type ‘unsigned int’ [
-Wformat]
./../../drivers/vhost/vringh.c: In function ‘vringh_init_kern’:
./../../drivers/vhost/vringh.c:707:3: warning: format ‘%zu’ expects
argument of type ‘size_t’, but argument 2 has type ‘unsigned int’ [
-Wformat]
In file included from vringh_test.c:8:0:
./../../drivers/virtio/virtio_ring.c: In function ‘virtqueue_kick_prepare’:
./../../drivers/virtio/virtio_ring.c:309:3: warning: dereferencing
type-punned pointer will break strict-aliasing rules [-Wstrict-alias
ing]
...
--
Asias
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 4/6] tools/virtio: add vring_test.
2013-01-23 1:40 ` Asias He
@ 2013-01-24 2:22 ` Rusty Russell
0 siblings, 0 replies; 30+ messages in thread
From: Rusty Russell @ 2013-01-24 2:22 UTC (permalink / raw)
To: Asias He; +Cc: Michael S . Tsirkin, virtualization
Asias He <asias@redhat.com> writes:
> On 01/23/2013 07:03 AM, Rusty Russell wrote:
>> Asias He <asias@redhat.com> writes:
>>
>>> On 01/17/2013 06:29 PM, Rusty Russell wrote:
>>>> This is mainly to test the drivers/vhost/vringh.c code, but it also
>>>> uses the drivers/virtio/virtio_ring.c code for the guest side.
>>>
>>> vringh_test.c does not compile here:
>>> (This series on top of 9a9284153d965a57edc7162a8e57c14c97f3a935)
>>>
>>> $ cd tools/virtio
>>> $ make
>>> cc -g -O2 -Wall -I. -I ../../usr/include/ -Wno-pointer-sign
>>> -fno-strict-overflow -MMD vringh_test.c -o vringh_test
>>> In file included from ./linux/vringh.h:1:0,
>>> from ./../../drivers/vhost/vringh.c:6,
>>> from vringh_test.c:7:
>>> ./linux/../../../include/linux/vringh.h:27:28: fatal error:
>>> uapi/linux/uio.h: No such file or directory
>>
>> Oops, I forgot to add the file... it's a one-liner:
>>
>> tools/virtio/uapi/linux/uio.h:
>> #include <sys/uio.h>
>>
>> I'll make a new branch for this, called vringh. It'll probably rebase
>> as I neaten things up, but I'll try not to go crazy...
>
> It compiles now.
>
> FYI, I got some warnings:
>
> ../../drivers/virtio/virtio_ring.c: In function ‘virtqueue_kick_prepare’:
> ../../drivers/virtio/virtio_ring.c:309:3: warning: dereferencing
> type-punned pointer will break strict-aliasing rules [-Wstrict-aliasin
> g]
Yes, we should compile the tests with -fno-strict-aliasing like the
kernel does, and I should fix the printk formatting for 64-bit
targets...
vringh: fix warnings.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index 3bbdefe..734ba8c 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -111,7 +111,8 @@ static inline bool range_check(u64 addr, u32 *len,
/* Otherwise, don't wrap. */
if (addr + *len < addr) {
- vringh_bad("Wrapping descriptor %u@0x%llx", *len, addr);
+ vringh_bad("Wrapping descriptor %u@0x%llx",
+ *len, (unsigned long long)addr);
return false;
}
@@ -570,7 +571,7 @@ int vringh_init_user(struct vringh *vrh, u32 features,
{
/* Sane power of 2 please! */
if (!num || num > 0xffff || (num & (num - 1))) {
- vringh_bad("Bad ring size %zu", num);
+ vringh_bad("Bad ring size %u", num);
return -EINVAL;
}
@@ -800,7 +801,7 @@ int vringh_init_kern(struct vringh *vrh, u32 features,
{
/* Sane power of 2 please! */
if (!num || num > 0xffff || (num & (num - 1))) {
- vringh_bad("Bad ring size %zu", num);
+ vringh_bad("Bad ring size %u", num);
return -EINVAL;
}
diff --git a/tools/virtio/Makefile b/tools/virtio/Makefile
index b928c3e..e961545 100644
--- a/tools/virtio/Makefile
+++ b/tools/virtio/Makefile
@@ -1,7 +1,7 @@
all: test mod
test: virtio_test vringh_test
virtio_test: virtio_ring.o virtio_test.o
-CFLAGS += -g -O2 -Wall -I. -I ../../usr/include/ -Wno-pointer-sign -fno-strict-overflow -MMD
+CFLAGS += -g -O2 -Wall -I. -I ../../usr/include/ -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -Wno-unused-result -MMD
vpath %.c ../../drivers/virtio
mod:
${MAKE} -C `pwd`/../.. M=`pwd`/vhost_test
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH 5/6] vringh: separate callback for notification.
2013-01-17 10:29 [PATCH 1/6] virtio_host: host-side implementation of virtio rings Rusty Russell
` (2 preceding siblings ...)
2013-01-17 10:29 ` [PATCH 4/6] tools/virtio: add vring_test Rusty Russell
@ 2013-01-17 10:29 ` Rusty Russell
2013-01-17 10:29 ` [PATCH 6/6] tools/virtio: adapt for API changes Rusty Russell
` (3 subsequent siblings)
7 siblings, 0 replies; 30+ messages in thread
From: Rusty Russell @ 2013-01-17 10:29 UTC (permalink / raw)
To: virtualization; +Cc: Michael S . Tsirkin
This makes it possible to batch notifications, and the fewer barriers
helps performance:
Before: (./vringh_test --eventidx --parallel):
Using CPUS 0 and 3
Guest: notified 156428, pinged 156251
Host: notified 156251, pinged 78215
R=4.575 U=3.520 S=3.520
Using CPUS 0 and 3
Guest: notified 156617, pinged 156250
Host: notified 156250, pinged 78310
R=4.511 U=3.580 S=3.580
Using CPUS 0 and 3
Guest: notified 156357, pinged 156251
Host: notified 156251, pinged 78179
R=4.518 U=3.536 S=3.536
After:
Using CPUS 0 and 3
Guest: notified 156394, pinged 156251
Host: notified 156251, pinged 78197
R=4.282 U=3.456 S=3.456
Using CPUS 0 and 3
Guest: notified 156578, pinged 156251
Host: notified 156251, pinged 78289
R=4.248 U=3.436 S=3.436
Using CPUS 0 and 3
Guest: notified 156594, pinged 156251
Host: notified 156251, pinged 78297
R=4.329 U=3.416 S=3.416
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
drivers/vhost/vringh.c | 89 +++++++++++++++++++++++++++++++++---------------
include/linux/vringh.h | 15 +++++---
2 files changed, 72 insertions(+), 32 deletions(-)
diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index b28670f..ab10da8 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -290,13 +290,12 @@ static inline int __vringh_complete(struct vringh *vrh, u16 idx, u32 len,
int (*putu16)(u16 *p, u16 val),
int (*putused)(struct vring_used_elem *dst,
const struct vring_used_elem
- *s),
- bool *notify)
+ *s))
{
struct vring_used_elem used;
struct vring_used *used_ring;
int err;
- u16 used_idx, old, used_event;
+ u16 used_idx;
used.id = idx;
used.len = len;
@@ -309,7 +308,7 @@ static inline int __vringh_complete(struct vringh *vrh, u16 idx, u32 len,
}
used_ring = vrh->vring.used;
- used_idx = vrh->last_used_idx;
+ used_idx = vrh->last_used_idx + vrh->completed;
err = putused(&used_ring->ring[used_idx % vrh->vring.num],
&used);
@@ -323,19 +322,24 @@ static inline int __vringh_complete(struct vringh *vrh, u16 idx, u32 len,
/* Make sure buffer is written before we update index. */
virtio_wmb(vrh->weak_barriers);
- old = vrh->last_used_idx;
- vrh->last_used_idx++;
-
- err = putu16(&vrh->vring.used->idx, vrh->last_used_idx);
+ err = putu16(&vrh->vring.used->idx, used_idx + 1);
if (err) {
vringh_bad("Failed to update used index at %p",
&vrh->vring.used->idx);
return err;
}
- /* If we already know we need to notify, skip re-checking */
- if (*notify)
- return 0;
+ vrh->completed++;
+ return 0;
+}
+
+
+static inline int __vringh_need_notify(struct vringh *vrh,
+ int (*getu16)(u16 *val, const u16 *p))
+{
+ bool notify;
+ u16 used_event;
+ int err;
/* Flush out used index update. This is paired with the
* barrier that the Guest executes when enabling
@@ -351,21 +355,28 @@ static inline int __vringh_complete(struct vringh *vrh, u16 idx, u32 len,
&vrh->vring.avail->flags);
return err;
}
- if (!(flags & VRING_AVAIL_F_NO_INTERRUPT))
- *notify = true;
- return 0;
+ return (!(flags & VRING_AVAIL_F_NO_INTERRUPT));
}
- /* Modern: we know where other side is up to. */
+ /* Modern: we know when other side wants to know. */
err = getu16(&used_event, &vring_used_event(&vrh->vring));
if (err) {
vringh_bad("Failed to get used event idx at %p",
&vring_used_event(&vrh->vring));
return err;
}
- if (vring_need_event(used_event, vrh->last_used_idx, old))
- *notify = true;
- return 0;
+
+ /* Just in case we added so many that we wrap. */
+ if (unlikely(vrh->completed > 0xffff))
+ notify = true;
+ else
+ notify = vring_need_event(used_event,
+ vrh->last_used_idx + vrh->completed,
+ vrh->last_used_idx);
+
+ vrh->last_used_idx += vrh->completed;
+ vrh->completed = 0;
+ return notify;
}
static inline bool __vringh_notify_enable(struct vringh *vrh,
@@ -587,14 +598,14 @@ void vringh_abandon_user(struct vringh *vrh, unsigned int num)
* @vrh: the vring.
* @head: the head as filled in by vringh_getdesc_user.
* @len: the length of data we have written.
- * @notify: set if we should notify the other side, otherwise left alone.
+ *
+ * You should check vringh_need_notify_user() after one or more calls
+ * to this function.
*/
-int vringh_complete_user(struct vringh *vrh, u16 head, u32 len,
- bool *notify)
+int vringh_complete_user(struct vringh *vrh, u16 head, u32 len)
{
return __vringh_complete(vrh, head, len,
- getu16_user, putu16_user, putused_user,
- notify);
+ getu16_user, putu16_user, putused_user);
}
/**
@@ -621,6 +632,17 @@ void vringh_notify_disable_user(struct vringh *vrh)
__vringh_notify_disable(vrh, putu16_user);
}
+/**
+ * vringh_need_notify_user - must we tell the other side about used buffers?
+ * @vrh: the vring we've called vringh_complete_user() on.
+ *
+ * Returns -errno or 0 if we don't need to tell the other side, 1 if we do.
+ */
+int vringh_need_notify_user(struct vringh *vrh)
+{
+ return __vringh_need_notify(vrh, getu16_user);
+}
+
/* Kernelspace access helpers. */
static inline int getu16_kern(u16 *val, const u16 *p)
{
@@ -783,14 +805,14 @@ void vringh_abandon_kern(struct vringh *vrh, unsigned int num)
* @vrh: the vring.
* @head: the head as filled in by vringh_getdesc_kern.
* @len: the length of data we have written.
- * @notify: set if we should notify the other side, otherwise left alone.
+ *
+ * You should check vringh_need_notify_kern() after one or more calls
+ * to this function.
*/
-int vringh_complete_kern(struct vringh *vrh, u16 head, u32 len,
- bool *notify)
+int vringh_complete_kern(struct vringh *vrh, u16 head, u32 len)
{
return __vringh_complete(vrh, head, len,
- getu16_kern, putu16_kern, putused_kern,
- notify);
+ getu16_kern, putu16_kern, putused_kern);
}
/**
@@ -816,3 +838,14 @@ void vringh_notify_disable_kern(struct vringh *vrh)
{
__vringh_notify_disable(vrh, putu16_kern);
}
+
+/**
+ * vringh_need_notify_kern - must we tell the other side about used buffers?
+ * @vrh: the vring we've called vringh_complete_kern() on.
+ *
+ * Returns -errno or 0 if we don't need to tell the other side, 1 if we do.
+ */
+int vringh_need_notify_kern(struct vringh *vrh)
+{
+ return __vringh_need_notify(vrh, getu16_kern);
+}
diff --git a/include/linux/vringh.h b/include/linux/vringh.h
index 508b5e5..9df86e9 100644
--- a/include/linux/vringh.h
+++ b/include/linux/vringh.h
@@ -44,6 +44,9 @@ struct vringh {
/* Last index we used. */
u16 last_used_idx;
+ /* How many descriptors we've completed since last need_notify(). */
+ u32 completed;
+
/* The vring (note: it may contain user pointers!) */
struct vring vring;
};
@@ -83,13 +86,15 @@ ssize_t vringh_iov_pull_user(struct vringh_iov *riov, void *dst, size_t len);
ssize_t vringh_iov_push_user(struct vringh_iov *wiov,
const void *src, size_t len);
-/* Mark a descriptor as used. Sets notify if you should fire eventfd. */
-int vringh_complete_user(struct vringh *vrh, u16 head, u32 len,
- bool *notify);
+/* Mark a descriptor as used. */
+int vringh_complete_user(struct vringh *vrh, u16 head, u32 len);
/* Pretend we've never seen descriptor (for easy error handling). */
void vringh_abandon_user(struct vringh *vrh, unsigned int num);
+/* Do we need to fire the eventfd to notify the other side? */
+int vringh_need_notify_user(struct vringh *vrh);
+
/* Helpers for kernelspace vrings. */
int vringh_init_kern(struct vringh *vrh, u32 features,
unsigned int num, bool weak_barriers,
@@ -107,9 +112,11 @@ ssize_t vringh_iov_pull_kern(struct vringh_iov *riov, void *dst, size_t len);
ssize_t vringh_iov_push_user(struct vringh_iov *wiov,
const void *src, size_t len);
void vringh_abandon_kern(struct vringh *vrh, unsigned int num);
-int vringh_complete_kern(struct vringh *vrh, u16 head, u32 len, bool *notify);
+int vringh_complete_kern(struct vringh *vrh, u16 head, u32 len);
bool vringh_notify_enable_kern(struct vringh *vrh);
void vringh_notify_disable_kern(struct vringh *vrh);
+int vringh_need_notify_kern(struct vringh *vrh);
+
#endif /* _LINUX_VRINGH_H */
--
1.7.10.4
^ permalink raw reply related [flat|nested] 30+ messages in thread* [PATCH 6/6] tools/virtio: adapt for API changes.
2013-01-17 10:29 [PATCH 1/6] virtio_host: host-side implementation of virtio rings Rusty Russell
` (3 preceding siblings ...)
2013-01-17 10:29 ` [PATCH 5/6] vringh: separate callback for notification Rusty Russell
@ 2013-01-17 10:29 ` Rusty Russell
2013-01-17 11:23 ` [PATCH 1/6] virtio_host: host-side implementation of virtio rings Michael S. Tsirkin
` (2 subsequent siblings)
7 siblings, 0 replies; 30+ messages in thread
From: Rusty Russell @ 2013-01-17 10:29 UTC (permalink / raw)
To: virtualization; +Cc: Michael S . Tsirkin
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
tools/virtio/vringh_test.c | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)
diff --git a/tools/virtio/vringh_test.c b/tools/virtio/vringh_test.c
index f3868f4..df09a3f 100644
--- a/tools/virtio/vringh_test.c
+++ b/tools/virtio/vringh_test.c
@@ -109,8 +109,7 @@ static int parallel_test(unsigned long features)
if (fork() != 0) {
struct vringh vrh;
- bool notify = false;
- int status;
+ int status, err;
/* We are the host: never access guest addresses! */
munmap(guest_map, mapsize);
@@ -128,7 +127,7 @@ static int parallel_test(unsigned long features)
vrh.vring.desc, vrh.vring.avail, vrh.vring.used);
CPU_SET(first_cpu, &cpu_set);
if (sched_setaffinity(getpid(), sizeof(cpu_set), &cpu_set))
- err(1, "Could not set affinity to cpu %u", first_cpu);
+ errx(1, "Could not set affinity to cpu %u", first_cpu);
while (xfers < NUM_XFERS) {
struct iovec host_riov[2], host_wiov[2];
@@ -150,10 +149,13 @@ static int parallel_test(unsigned long features)
if (err == 0) {
char buf[128];
- if (notify) {
+ err = vringh_need_notify_user(&vrh);
+ if (err < 0)
+ errx(1, "vringh_need_notify_user: %i",
+ err);
+ if (err) {
write(to_guest[1], "", 1);
notifies++;
- notify = false;
}
if (vringh_notify_enable_user(&vrh))
@@ -180,15 +182,17 @@ static int parallel_test(unsigned long features)
xfers++;
assert(wiov.i == wiov.max);
- err = vringh_complete_user(&vrh, head, rlen, ¬ify);
+ err = vringh_complete_user(&vrh, head, rlen);
if (err != 0)
errx(1, "vringh_complete_user: %i", err);
}
- if (notify) {
+ err = vringh_need_notify_user(&vrh);
+ if (err < 0)
+ errx(1, "vringh_need_notify_user: %i", err);
+ if (err) {
write(to_guest[1], "", 1);
notifies++;
- notify = false;
}
wait(&status);
if (!WIFEXITED(status))
@@ -344,7 +348,6 @@ int main(int argc, char *argv[])
u16 head;
int err;
unsigned i;
- bool notify = false;
void *ret;
vdev.features[0] = 0;
@@ -435,7 +438,7 @@ int main(int argc, char *argv[])
assert(vringh_iov_push_kern(&wiov, buf, 5) == 0);
/* Host is done. */
- err = vringh_complete_user(&vrh, head, err, ¬ify);
+ err = vringh_complete_user(&vrh, head, err);
if (err != 0)
errx(1, "vringh_complete_user: %i", err);
--
1.7.10.4
^ permalink raw reply related [flat|nested] 30+ messages in thread* Re: [PATCH 1/6] virtio_host: host-side implementation of virtio rings.
2013-01-17 10:29 [PATCH 1/6] virtio_host: host-side implementation of virtio rings Rusty Russell
` (4 preceding siblings ...)
2013-01-17 10:29 ` [PATCH 6/6] tools/virtio: adapt for API changes Rusty Russell
@ 2013-01-17 11:23 ` Michael S. Tsirkin
2013-01-17 11:49 ` Sjur Brændeland
2013-01-21 2:34 ` Rusty Russell
2013-01-22 8:12 ` Asias He
2013-02-04 20:29 ` Sjur Brændeland
7 siblings, 2 replies; 30+ messages in thread
From: Michael S. Tsirkin @ 2013-01-17 11:23 UTC (permalink / raw)
To: Rusty Russell; +Cc: virtualization
On Thu, Jan 17, 2013 at 08:59:38PM +1030, Rusty Russell wrote:
> Getting use of virtio rings correct is tricky, and a recent patch saw
> an implementation of in-kernel rings (as separate from userspace).
>
> This patch attempts to abstract the business of dealing with the
> virtio ring layout from the access (userspace or direct); to do this,
> we use function pointers, which gcc inlines correctly.
>
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> ---
> drivers/Makefile | 2 +-
> drivers/vhost/Kconfig | 8 +
> drivers/vhost/Kconfig.tcm | 1 +
> drivers/vhost/Makefile | 2 +
> drivers/vhost/vringh.c | 818 ++++++++++++++++++++++++++++++++++++++++++
> drivers/virtio/virtio_ring.c | 33 +-
> include/linux/virtio_ring.h | 57 +++
> include/linux/vringh.h | 115 ++++++
> 8 files changed, 1008 insertions(+), 28 deletions(-)
> create mode 100644 drivers/vhost/vringh.c
> create mode 100644 include/linux/vringh.h
>
> diff --git a/drivers/Makefile b/drivers/Makefile
> index 7863b9f..351a34f 100644
> --- a/drivers/Makefile
> +++ b/drivers/Makefile
> @@ -123,7 +123,7 @@ obj-$(CONFIG_PPC_PS3) += ps3/
> obj-$(CONFIG_OF) += of/
> obj-$(CONFIG_SSB) += ssb/
> obj-$(CONFIG_BCMA) += bcma/
> -obj-$(CONFIG_VHOST_NET) += vhost/
> +obj-$(CONFIG_VHOST_RING) += vhost/
> obj-$(CONFIG_VLYNQ) += vlynq/
> obj-$(CONFIG_STAGING) += staging/
> obj-y += platform/
> diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
> index 202bba6..613b074 100644
> --- a/drivers/vhost/Kconfig
> +++ b/drivers/vhost/Kconfig
> @@ -1,6 +1,7 @@
> config VHOST_NET
> tristate "Host kernel accelerator for virtio net (EXPERIMENTAL)"
> depends on NET && EVENTFD && (TUN || !TUN) && (MACVTAP || !MACVTAP) && EXPERIMENTAL
> + select VHOST_RING
> ---help---
> This kernel module can be loaded in host kernel to accelerate
> guest networking with virtio_net. Not to be confused with virtio_net
> @@ -12,3 +13,10 @@ config VHOST_NET
> if STAGING
> source "drivers/vhost/Kconfig.tcm"
> endif
> +
> +config VHOST_RING
> + tristate
> + ---help---
> + This option is selected by any driver which needs to access
> + the host side of a virtio ring.
> +
> diff --git a/drivers/vhost/Kconfig.tcm b/drivers/vhost/Kconfig.tcm
> index a9c6f76..0218f77 100644
> --- a/drivers/vhost/Kconfig.tcm
> +++ b/drivers/vhost/Kconfig.tcm
> @@ -1,6 +1,7 @@
> config TCM_VHOST
> tristate "TCM_VHOST fabric module (EXPERIMENTAL)"
> depends on TARGET_CORE && EVENTFD && EXPERIMENTAL && m
> + select VHOST_RING
> default n
> ---help---
> Say M here to enable the TCM_VHOST fabric module for use with virtio-scsi guests
> diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
> index a27b053..1d37f5e 100644
> --- a/drivers/vhost/Makefile
> +++ b/drivers/vhost/Makefile
> @@ -2,3 +2,5 @@ obj-$(CONFIG_VHOST_NET) += vhost_net.o
> vhost_net-y := vhost.o net.o
>
> obj-$(CONFIG_TCM_VHOST) += tcm_vhost.o
> +
> +obj-$(CONFIG_VHOST_RING) += vringh.o
> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
> new file mode 100644
> index 0000000..b28670f
> --- /dev/null
> +++ b/drivers/vhost/vringh.c
> @@ -0,0 +1,818 @@
> +/*
> + * Helpers for the host side of a virtio ring.
> + *
> + * Since these may be in userspace, we use (inline) accessors.
> + */
> +#include <linux/vringh.h>
> +#include <linux/virtio_ring.h>
> +#include <linux/kernel.h>
> +#include <linux/ratelimit.h>
> +#include <linux/uaccess.h>
> +#include <linux/slab.h>
> +
> +static __printf(1,2) __cold void vringh_bad(const char *fmt, ...)
> +{
> + static DEFINE_RATELIMIT_STATE(vringh_rs,
> + DEFAULT_RATELIMIT_INTERVAL,
> + DEFAULT_RATELIMIT_BURST);
> + if (__ratelimit(&vringh_rs)) {
> + va_list ap;
> + va_start(ap, fmt);
> + printk(KERN_NOTICE "vringh:");
> + vprintk(fmt, ap);
> + va_end(ap);
> + }
> +}
> +
> +/* Returns vring->num if empty, -ve on error. */
> +static inline int __vringh_get_head(const struct vringh *vrh,
> + int (*getu16)(u16 *val, const u16 *p),
I think int (*getu16)(const u16 *p) would be cleaner
than returning through a pointer, then
callers check that value < 0 for error.
> + u16 *last_avail_idx)
> +{
> + u16 avail_idx, i, head;
> + int err;
> +
> + err = getu16(&avail_idx, &vrh->vring.avail->idx);
> + if (err) {
> + vringh_bad("Failed to access avail idx at %p",
> + &vrh->vring.avail->idx);
> + return err;
> + }
> +
> + if (*last_avail_idx == avail_idx)
> + return vrh->vring.num;
> +
> + /* Only get avail ring entries after they have been exposed by guest. */
> + virtio_rmb(vrh->weak_barriers);
> +
> + i = *last_avail_idx & (vrh->vring.num - 1);
> +
> + err = getu16(&head, &vrh->vring.avail->ring[i]);
> + if (err) {
> + vringh_bad("Failed to read head: idx %d address %p",
> + *last_avail_idx, &vrh->vring.avail->ring[i]);
> + return err;
> + }
> +
> + if (head >= vrh->vring.num) {
> + vringh_bad("Guest says index %u > %u is available",
> + head, vrh->vring.num);
> + return -EINVAL;
> + }
> +
> + (*last_avail_idx)++;
> + return head;
> +}
> +
> +/* Copy some bytes to/from the iovec. Returns num copied. */
> +static inline ssize_t vringh_iov_xfer(struct vringh_iov *iov,
> + void *ptr, size_t len,
> + int (*xfer)(void __user *addr, void *ptr,
> + size_t len))
> +{
> + int err, done = 0;
> +
> + while (len && iov->i < iov->max) {
> + size_t partlen;
> +
> + partlen = min(iov->iov[iov->i].iov_len, len);
> + err = xfer(iov->iov[iov->i].iov_base, ptr, partlen);
> + if (err)
> + return err;
> + done += partlen;
> + len -= partlen;
> + ptr += partlen;
> + iov->iov[iov->i].iov_base += partlen;
> + iov->iov[iov->i].iov_len -= partlen;
> +
> + if (iov->iov[iov->i].iov_len == 0)
> + iov->i++;
> + }
> + return done;
> +}
> +
> +static inline bool check_range(u64 addr, u32 len,
> + struct vringh_range *range,
> + bool (*getrange)(u64, struct vringh_range *))
> +{
> + if (addr < range->start || addr > range->end_incl) {
> + if (!getrange(addr, range))
> + goto bad;
> + }
> + BUG_ON(addr < range->start || addr > range->end_incl);
> +
> + /* To end of memory? */
> + if (unlikely(addr + len == 0)) {
> + if (range->end_incl == -1ULL)
> + return true;
> + goto bad;
> + }
> +
> + /* Otherwise, don't wrap. */
> + if (unlikely(addr + len < addr))
> + goto bad;
> + if (unlikely(addr + len - 1 > range->end_incl))
> + goto bad;
> + return true;
> +
> +bad:
> + vringh_bad("Malformed descriptor address %u@0x%llx", len, addr);
> + return false;
> +}
> +
> +/* No reason for this code to be inline. */
> +static int move_to_indirect(int *up_next, u16 *i, void *addr,
> + const struct vring_desc *desc,
> + struct vring_desc **descs, int *desc_max)
> +{
> + /* Indirect tables can't have indirect. */
> + if (*up_next != -1) {
> + vringh_bad("Multilevel indirect %u->%u", *up_next, *i);
> + return -EINVAL;
> + }
> +
> + if (unlikely(desc->len % sizeof(struct vring_desc))) {
> + vringh_bad("Strange indirect len %u", desc->len);
> + return -EINVAL;
> + }
> +
> + /* We will check this when we follow it! */
> + if (desc->flags & VRING_DESC_F_NEXT)
> + *up_next = desc->next;
> + else
> + *up_next = -2;
> + *descs = addr;
> + *desc_max = desc->len / sizeof(struct vring_desc);
> +
> + /* Now, start at the first indirect. */
> + *i = 0;
> + return 0;
> +}
> +
> +static int resize_iovec(struct vringh_iov *iov, gfp_t gfp)
> +{
> + struct iovec *new;
> + unsigned int new_num = iov->max * 2;
> +
> + if (new_num < 8)
> + new_num = 8;
> +
> + if (iov->allocated)
> + new = krealloc(iov->iov, new_num * sizeof(struct iovec), gfp);
> + else {
> + new = kmalloc(new_num * sizeof(struct iovec), gfp);
> + if (new) {
> + memcpy(new, iov->iov, iov->i * sizeof(struct iovec));
> + iov->allocated = true;
> + }
> + }
> + if (!new)
> + return -ENOMEM;
> + iov->iov = new;
> + iov->max = new_num;
> + return 0;
> +}
> +
> +static u16 __cold return_from_indirect(const struct vringh *vrh, int *up_next,
> + struct vring_desc **descs, int *desc_max)
> +{
> + u16 i = *up_next;
> +
> + *up_next = -1;
> + *descs = vrh->vring.desc;
> + *desc_max = vrh->vring.num;
> + return i;
> +}
> +
> +static inline int
> +__vringh_iov(struct vringh *vrh, u16 i,
> + struct vringh_iov *riov,
> + struct vringh_iov *wiov,
> + bool (*getrange)(u64 addr, struct vringh_range *r),
> + gfp_t gfp,
> + int (*getdesc)(struct vring_desc *dst, const struct vring_desc *s))
> +{
> + int err, count = 0, up_next, desc_max;
> + struct vring_desc desc, *descs;
> + struct vringh_range range = { -1ULL, 0 };
> +
> + /* We start traversing vring's descriptor table. */
> + descs = vrh->vring.desc;
> + desc_max = vrh->vring.num;
> + up_next = -1;
> +
> + riov->i = wiov->i = 0;
> + for (;;) {
> + void *addr;
> + struct vringh_iov *iov;
> +
> + err = getdesc(&desc, &descs[i]);
> + if (unlikely(err))
> + goto fail;
> +
> + /* Make sure it's OK, and get offset. */
> + if (!check_range(desc.addr, desc.len, &range, getrange)) {
> + err = -EINVAL;
> + goto fail;
> + }
> + addr = (void *)(long)desc.addr + range.offset;
Should probably be (void *)(long)(desc.addr + range.offset).
Otherwise we risk signed integer overflow.
> +
> + if (unlikely(desc.flags & VRING_DESC_F_INDIRECT)) {
> + err = move_to_indirect(&up_next, &i, addr, &desc,
> + &descs, &desc_max);
> + if (err)
> + goto fail;
> + continue;
> + }
> +
> + if (count++ == vrh->vring.num) {
> + vringh_bad("Descriptor loop in %p", descs);
> + err = -ELOOP;
> + goto fail;
> + }
> +
> + if (desc.flags & VRING_DESC_F_WRITE)
> + iov = wiov;
> + else {
> + iov = riov;
> + if (unlikely(wiov->i)) {
> + vringh_bad("Readable desc %p after writable",
> + &descs[i]);
> + err = -EINVAL;
> + goto fail;
> + }
> + }
> +
> + if (unlikely(iov->i == iov->max)) {
> + err = resize_iovec(iov, gfp);
> + if (err)
> + goto fail;
> + }
> +
> + iov->iov[iov->i].iov_base = (__force __user void *)addr;
> + iov->iov[iov->i].iov_len = desc.len;
The following comment from the previous version still applies:
> This looks like it won't do the right thing if desc.len spans multiple
> ranges. I don't know if this happens in practice but this is something
> vhost supports ATM.
in otgher words, we might need to split a single desc to multiple
iov entries.
> + iov->i++;
> +
> + if (desc.flags & VRING_DESC_F_NEXT) {
> + i = desc.next;
> + } else {
> + /* Just in case we need to finish traversing above. */
> + if (unlikely(up_next > 0))
> + i = return_from_indirect(vrh, &up_next,
> + &descs, &desc_max);
> + else
> + break;
> + }
> +
> + if (i >= desc_max) {
> + vringh_bad("Chained index %u > %u", i, desc_max);
> + err = -EINVAL;
> + goto fail;
> + }
> + }
> +
> + /* Reset for fresh iteration. */
> + riov->max = riov->i;
> + wiov->max = wiov->i;
> + riov->i = wiov->i = 0;
> + return 0;
> +
> +fail:
> + if (riov->allocated)
> + kfree(riov->iov);
> + if (wiov->allocated)
> + kfree(wiov->iov);
> + return err;
> +}
> +
> +static inline int __vringh_complete(struct vringh *vrh, u16 idx, u32 len,
> + int (*getu16)(u16 *val, const u16 *p),
> + int (*putu16)(u16 *p, u16 val),
> + int (*putused)(struct vring_used_elem *dst,
> + const struct vring_used_elem
> + *s),
> + bool *notify)
> +{
> + struct vring_used_elem used;
> + struct vring_used *used_ring;
> + int err;
> + u16 used_idx, old, used_event;
> +
> + used.id = idx;
> + used.len = len;
> +
> + err = getu16(&used_idx, &vring_used_event(&vrh->vring));
> + if (err) {
> + vringh_bad("Failed to access used event %p",
> + &vring_used_event(&vrh->vring));
> + return err;
> + }
> +
> + used_ring = vrh->vring.used;
> + used_idx = vrh->last_used_idx;
> +
> + err = putused(&used_ring->ring[used_idx % vrh->vring.num],
> + &used);
> + if (err) {
> + vringh_bad("Failed to write used entry %u at %p",
> + used_idx % vrh->vring.num,
> + &used_ring->ring[used_idx % vrh->vring.num]);
> + return err;
> + }
> +
> + /* Make sure buffer is written before we update index. */
> + virtio_wmb(vrh->weak_barriers);
> +
> + old = vrh->last_used_idx;
> + vrh->last_used_idx++;
> +
> + err = putu16(&vrh->vring.used->idx, vrh->last_used_idx);
> + if (err) {
> + vringh_bad("Failed to update used index at %p",
> + &vrh->vring.used->idx);
> + return err;
One thing vhost does is roll back everything on error,
so you can for example have an invalid range
of memory and handle writes there in userspace.
I think it's worth preserving though this is
currently unused.
> + }
> +
> + /* If we already know we need to notify, skip re-checking */
> + if (*notify)
> + return 0;
> +
> + /* Flush out used index update. This is paired with the
> + * barrier that the Guest executes when enabling
> + * interrupts. */
> + virtio_mb(vrh->weak_barriers);
> +
> + /* Old-style, without event indices. */
> + if (!vrh->event_indices) {
> + u16 flags;
> + err = getu16(&flags, &vrh->vring.avail->flags);
> + if (err) {
> + vringh_bad("Failed to get flags at %p",
> + &vrh->vring.avail->flags);
> + return err;
> + }
> + if (!(flags & VRING_AVAIL_F_NO_INTERRUPT))
> + *notify = true;
> + return 0;
> + }
> +
> + /* Modern: we know where other side is up to. */
> + err = getu16(&used_event, &vring_used_event(&vrh->vring));
> + if (err) {
> + vringh_bad("Failed to get used event idx at %p",
> + &vring_used_event(&vrh->vring));
> + return err;
> + }
> + if (vring_need_event(used_event, vrh->last_used_idx, old))
> + *notify = true;
> + return 0;
> +}
> +
> +static inline bool __vringh_notify_enable(struct vringh *vrh,
> + int (*getu16)(u16 *val, const u16 *p),
> + int (*putu16)(u16 *p, u16 val))
> +{
> + u16 avail;
> +
> + /* Already enabled? */
> + if (vrh->listening)
> + return false;
> +
> + vrh->listening = true;
> +
> + if (!vrh->event_indices) {
> + /* Old-school; update flags. */
> + if (putu16(&vrh->vring.used->flags, 0) != 0) {
> + vringh_bad("Clearing used flags %p",
> + &vrh->vring.used->flags);
> + return false;
> + }
> + } else {
> + if (putu16(&vring_avail_event(&vrh->vring),
> + vrh->last_avail_idx) != 0) {
> + vringh_bad("Updating avail event index %p",
> + &vring_avail_event(&vrh->vring));
> + return false;
> + }
> + }
> +
> + /* They could have slipped one in as we were doing that: make
> + * sure it's written, then check again. */
> + virtio_mb(vrh->weak_barriers);
> +
> + if (getu16(&avail, &vrh->vring.avail->idx) != 0) {
Hmm above has implicit != 0 why not here?
> + vringh_bad("Failed to check avail idx at %p",
> + &vrh->vring.avail->idx);
> + return false;
> + }
> +
> + /* This is so unlikely, we just leave notifications enabled. */
> + return avail != vrh->last_avail_idx;
> +}
> +
> +static inline void __vringh_notify_disable(struct vringh *vrh,
> + int (*putu16)(u16 *p, u16 val))
> +{
> + /* Already disabled? */
> + if (!vrh->listening)
> + return;
> +
> + vrh->listening = false;
> + if (!vrh->event_indices) {
> + /* Old-school; update flags. */
> + if (putu16(&vrh->vring.used->flags, VRING_USED_F_NO_NOTIFY)) {
> + vringh_bad("Setting used flags %p",
> + &vrh->vring.used->flags);
> + }
> + }
> +}
> +
> +/* Userspace access helpers. */
> +static inline int getu16_user(u16 *val, const u16 *p)
> +{
> + return get_user(*val, (__force u16 __user *)p);
> +}
> +
> +static inline int putu16_user(u16 *p, u16 val)
> +{
> + return put_user(val, (__force u16 __user *)p);
> +}
> +
> +static inline int getdesc_user(struct vring_desc *dst,
> + const struct vring_desc *src)
> +{
> + return copy_from_user(dst, (__force void *)src, sizeof(*dst)) == 0 ? 0 :
> + -EFAULT;
confused about __force above. Shouldn't it cast to __user?
I have not tried does this patch pass the checker?
> +}
> +
> +static inline int putused_user(struct vring_used_elem *dst,
> + const struct vring_used_elem *s)
> +{
> + return copy_to_user((__force void __user *)dst, s, sizeof(*dst)) == 0
> + ? 0 : -EFAULT;
> +}
> +
> +static inline int xfer_from_user(void *src, void *dst, size_t len)
> +{
> + return copy_from_user(dst, (__force void *)src, len) == 0 ? 0 :
> + -EFAULT;
> +}
> +
> +static inline int xfer_to_user(void *dst, void *src, size_t len)
> +{
> + return copy_to_user((__force void *)dst, src, len) == 0 ? 0 :
> + -EFAULT;
> +}
> +
> +/**
> + * vringh_init_user - initialize a vringh for a userspace vring.
> + * @vrh: the vringh to initialize.
> + * @features: the feature bits for this ring.
> + * @num: the number of elements.
> + * @weak_barriers: true if we only need memory barriers, not I/O.
> + * @desc: the userpace descriptor pointer.
> + * @avail: the userpace avail pointer.
> + * @used: the userpace used pointer.
> + *
> + * Returns an error if num is invalid: you should check pointers
> + * yourself!
> + */
> +int vringh_init_user(struct vringh *vrh, u32 features,
> + unsigned int num, bool weak_barriers,
> + struct vring_desc __user *desc,
> + struct vring_avail __user *avail,
> + struct vring_used __user *used)
> +{
> + /* Sane power of 2 please! */
> + if (!num || num > 0xffff || (num & (num - 1))) {
> + vringh_bad("Bad ring size %zu", num);
> + return -EINVAL;
> + }
> +
> + vrh->event_indices = (features & (1 << VIRTIO_RING_F_EVENT_IDX));
> + vrh->weak_barriers = weak_barriers;
> + vrh->listening = false;
> + vrh->last_avail_idx = 0;
> + vrh->last_used_idx = 0;
> + vrh->vring.num = num;
> + vrh->vring.desc = (__force struct vring_desc *)desc;
> + vrh->vring.avail = (__force struct vring_avail *)avail;
> + vrh->vring.used = (__force struct vring_used *)used;
I counted 3 separate chunks that do __force casts.
Let's try to isolate them and comment why it's safe.
> + return 0;
> +}
> +
> +/**
> + * vringh_getdesc_user - get next available descriptor from userspace ring.
> + * @vrh: the userspace vring.
> + * @riov: where to put the readable descriptors.
> + * @wiov: where to put the writable descriptors.
> + * @getrange: function to call to check ranges.
> + * @head: head index we received, for passing to vringh_complete_user().
> + * @gfp: flags for allocating larger riov/wiov.
> + *
> + * Returns 0 if there was no descriptor, 1 if there was, or -errno.
> + *
> + * If it returns 1, riov->allocated and wiov->allocated indicate if you
> + * have to kfree riov->iov and wiov->iov respectively.
> + */
> +int vringh_getdesc_user(struct vringh *vrh,
> + struct vringh_iov *riov,
> + struct vringh_iov *wiov,
> + bool (*getrange)(u64 addr, struct vringh_range *r),
> + u16 *head,
> + gfp_t gfp)
> +{
> + int err;
> +
> + err = __vringh_get_head(vrh, getu16_user, &vrh->last_avail_idx);
> + if (err < 0)
> + return err;
> +
> + /* Empty... */
> + if (err == vrh->vring.num)
> + return 0;
> +
> + *head = err;
> + err = __vringh_iov(vrh, *head, riov, wiov, getrange, gfp, getdesc_user);
> + if (err)
> + return err;
> +
> + return 1;
> +}
> +
> +/**
> + * vringh_iov_pull_user - copy bytes from vring_iov.
> + * @riov: the riov as passed to vringh_getdesc_user() (updated as we consume)
> + * @dst: the place to copy.
> + * @len: the maximum length to copy.
> + *
> + * Returns the bytes copied <= len or a negative errno.
> + */
> +ssize_t vringh_iov_pull_user(struct vringh_iov *riov, void *dst, size_t len)
> +{
> + return vringh_iov_xfer(riov, dst, len, xfer_from_user);
> +}
> +
> +/**
> + * vringh_iov_push_user - copy bytes into vring_iov.
> + * @wiov: the wiov as passed to vringh_getdesc_user() (updated as we consume)
> + * @dst: the place to copy.
> + * @len: the maximum length to copy.
> + *
> + * Returns the bytes copied <= len or a negative errno.
> + */
> +ssize_t vringh_iov_push_user(struct vringh_iov *wiov,
> + const void *src, size_t len)
> +{
> + return vringh_iov_xfer(wiov, (void *)src, len, xfer_to_user);
> +}
> +
> +/**
> + * vringh_abandon_user - we've decided not to handle the descriptor(s).
> + * @vrh: the vring.
> + * @num: the number of descriptors to put back (ie. num
> + * vringh_get_user() to undo).
> + *
> + * The next vringh_get_user() will return the old descriptor(s) again.
> + */
> +void vringh_abandon_user(struct vringh *vrh, unsigned int num)
> +{
> + /* We only update vring_avail_event(vr) when we want to be notified,
> + * so we haven't changed that yet. */
> + vrh->last_avail_idx -= num;
> +}
> +
> +/**
> + * vringh_complete_user - we've finished with descriptor, publish it.
> + * @vrh: the vring.
> + * @head: the head as filled in by vringh_getdesc_user.
> + * @len: the length of data we have written.
> + * @notify: set if we should notify the other side, otherwise left alone.
> + */
> +int vringh_complete_user(struct vringh *vrh, u16 head, u32 len,
> + bool *notify)
> +{
> + return __vringh_complete(vrh, head, len,
> + getu16_user, putu16_user, putused_user,
> + notify);
> +}
> +
> +/**
> + * vringh_notify_enable_user - we want to know if something changes.
> + * @vrh: the vring.
> + *
> + * This always enables notifications, but returns true if there are
> + * now more buffers available in the vring.
> + */
> +bool vringh_notify_enable_user(struct vringh *vrh)
> +{
> + return __vringh_notify_enable(vrh, getu16_user, putu16_user);
> +}
> +
> +/**
> + * vringh_notify_disable_user - don't tell us if something changes.
> + * @vrh: the vring.
> + *
> + * This is our normal running state: we disable and then only enable when
> + * we're going to sleep.
> + */
> +void vringh_notify_disable_user(struct vringh *vrh)
> +{
> + __vringh_notify_disable(vrh, putu16_user);
> +}
> +
> +/* Kernelspace access helpers. */
> +static inline int getu16_kern(u16 *val, const u16 *p)
> +{
> + *val = *p;
> + return 0;
> +}
> +
> +static inline int putu16_kern(u16 *p, u16 val)
> +{
> + *p = val;
> + return 0;
> +}
> +
> +static inline int getdesc_kern(struct vring_desc *dst,
> + const struct vring_desc *src)
> +{
> + *dst = *src;
> + return 0;
> +}
> +
> +static inline int putused_kern(struct vring_used_elem *dst,
> + const struct vring_used_elem *s)
> +{
> + *dst = *s;
> + return 0;
> +}
> +
> +static inline int xfer_kern(void *src, void *dst, size_t len)
> +{
> + memcpy(dst, src, len);
> + return 0;
> +}
> +
> +static inline bool noop_getrange(u64 addr, struct vringh_range *r)
> +{
> + r->start = 0;
> + r->end_incl = -1ULL;
> + r->offset = 0;
> + return true;
> +}
> +
> +/**
> + * vringh_init_kern - initialize a vringh for a kernelspace vring.
> + * @vrh: the vringh to initialize.
> + * @features: the feature bits for this ring.
> + * @num: the number of elements.
> + * @weak_barriers: true if we only need memory barriers, not I/O.
> + * @desc: the userpace descriptor pointer.
> + * @avail: the userpace avail pointer.
> + * @used: the userpace used pointer.
> + *
> + * Returns an error if num is invalid.
> + */
> +int vringh_init_kern(struct vringh *vrh, u32 features,
> + unsigned int num, bool weak_barriers,
> + struct vring_desc *desc,
> + struct vring_avail *avail,
> + struct vring_used *used)
> +{
> + /* Sane power of 2 please! */
> + if (!num || num > 0xffff || (num & (num - 1))) {
> + vringh_bad("Bad ring size %zu", num);
> + return -EINVAL;
> + }
> +
> + vrh->event_indices = (features & (1 << VIRTIO_RING_F_EVENT_IDX));
> + vrh->weak_barriers = weak_barriers;
> + vrh->listening = false;
> + vrh->last_avail_idx = 0;
> + vrh->last_used_idx = 0;
> + vrh->vring.num = num;
> + vrh->vring.desc = desc;
> + vrh->vring.avail = avail;
> + vrh->vring.used = used;
> + return 0;
> +}
> +
> +/**
> + * vringh_getdesc_kern - get next available descriptor from kernelspace ring.
> + * @vrh: the kernelspace vring.
> + * @riov: where to put the readable descriptors.
> + * @wiov: where to put the writable descriptors.
> + * @head: head index we received, for passing to vringh_complete_kern().
> + * @gfp: flags for allocating larger riov/wiov.
> + *
> + * Returns 0 if there was no descriptor, 1 if there was, or -errno.
> + *
> + * If it returns 1, riov->allocated and wiov->allocated indicate if you
> + * have to kfree riov->iov and wiov->iov respectively.
> + */
> +int vringh_getdesc_kern(struct vringh *vrh,
> + struct vringh_iov *riov,
> + struct vringh_iov *wiov,
> + u16 *head,
> + gfp_t gfp)
> +{
> + int err;
> +
> + err = __vringh_get_head(vrh, getu16_kern, &vrh->last_avail_idx);
> + if (err < 0)
> + return err;
> +
> + /* Empty... */
> + if (err == vrh->vring.num)
> + return 0;
> +
> + *head = err;
> + err = __vringh_iov(vrh, *head, riov, wiov, noop_getrange,
> + gfp, getdesc_kern);
> + if (err)
> + return err;
> +
> + return 1;
> +}
> +
> +/**
> + * vringh_iov_pull_kern - copy bytes from vring_iov.
> + * @riov: the riov as passed to vringh_getdesc_kern() (updated as we consume)
> + * @dst: the place to copy.
> + * @len: the maximum length to copy.
> + *
> + * Returns the bytes copied <= len or a negative errno.
> + */
> +ssize_t vringh_iov_pull_kern(struct vringh_iov *riov, void *dst, size_t len)
> +{
> + return vringh_iov_xfer(riov, dst, len, xfer_kern);
> +}
> +
> +/**
> + * vringh_iov_push_kern - copy bytes into vring_iov.
> + * @wiov: the wiov as passed to vringh_getdesc_kern() (updated as we consume)
> + * @dst: the place to copy.
> + * @len: the maximum length to copy.
> + *
> + * Returns the bytes copied <= len or a negative errno.
> + */
> +ssize_t vringh_iov_push_kern(struct vringh_iov *wiov,
> + const void *src, size_t len)
> +{
> + return vringh_iov_xfer(wiov, (void *)src, len, xfer_kern);
> +}
> +
> +/**
> + * vringh_abandon_kern - we've decided not to handle the descriptor(s).
> + * @vrh: the vring.
> + * @num: the number of descriptors to put back (ie. num
> + * vringh_get_kern() to undo).
> + *
> + * The next vringh_get_kern() will return the old descriptor(s) again.
> + */
> +void vringh_abandon_kern(struct vringh *vrh, unsigned int num)
> +{
> + /* We only update vring_avail_event(vr) when we want to be notified,
> + * so we haven't changed that yet. */
> + vrh->last_avail_idx -= num;
> +}
> +
> +/**
> + * vringh_complete_kern - we've finished with descriptor, publish it.
> + * @vrh: the vring.
> + * @head: the head as filled in by vringh_getdesc_kern.
> + * @len: the length of data we have written.
> + * @notify: set if we should notify the other side, otherwise left alone.
> + */
> +int vringh_complete_kern(struct vringh *vrh, u16 head, u32 len,
> + bool *notify)
> +{
> + return __vringh_complete(vrh, head, len,
> + getu16_kern, putu16_kern, putused_kern,
> + notify);
> +}
> +
> +/**
> + * vringh_notify_enable_kern - we want to know if something changes.
> + * @vrh: the vring.
> + *
> + * This always enables notifications, but returns true if there are
> + * now more buffers available in the vring.
> + */
> +bool vringh_notify_enable_kern(struct vringh *vrh)
> +{
> + return __vringh_notify_enable(vrh, getu16_kern, putu16_kern);
> +}
> +
> +/**
> + * vringh_notify_disable_kern - don't tell us if something changes.
> + * @vrh: the vring.
> + *
> + * This is our normal running state: we disable and then only enable when
> + * we're going to sleep.
> + */
> +void vringh_notify_disable_kern(struct vringh *vrh)
> +{
> + __vringh_notify_disable(vrh, putu16_kern);
> +}
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index ffd7e7d..245177c 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -24,27 +24,6 @@
> #include <linux/module.h>
> #include <linux/hrtimer.h>
>
> -/* virtio guest is communicating with a virtual "device" that actually runs on
> - * a host processor. Memory barriers are used to control SMP effects. */
> -#ifdef CONFIG_SMP
> -/* Where possible, use SMP barriers which are more lightweight than mandatory
> - * barriers, because mandatory barriers control MMIO effects on accesses
> - * through relaxed memory I/O windows (which virtio-pci does not use). */
> -#define virtio_mb(vq) \
> - do { if ((vq)->weak_barriers) smp_mb(); else mb(); } while(0)
> -#define virtio_rmb(vq) \
> - do { if ((vq)->weak_barriers) smp_rmb(); else rmb(); } while(0)
> -#define virtio_wmb(vq) \
> - do { if ((vq)->weak_barriers) smp_wmb(); else wmb(); } while(0)
> -#else
> -/* We must force memory ordering even if guest is UP since host could be
> - * running on another CPU, but SMP barriers are defined to barrier() in that
> - * configuration. So fall back to mandatory barriers instead. */
> -#define virtio_mb(vq) mb()
> -#define virtio_rmb(vq) rmb()
> -#define virtio_wmb(vq) wmb()
> -#endif
> -
> #ifdef DEBUG
> /* For development, we want to crash whenever the ring is screwed. */
> #define BAD_RING(_vq, fmt, args...) \
> @@ -276,7 +255,7 @@ add_head:
>
> /* Descriptors and available array need to be set before we expose the
> * new available array entries. */
> - virtio_wmb(vq);
> + virtio_wmb(vq->weak_barriers);
> vq->vring.avail->idx++;
> vq->num_added++;
>
> @@ -312,7 +291,7 @@ bool virtqueue_kick_prepare(struct virtqueue *_vq)
> START_USE(vq);
> /* We need to expose available array entries before checking avail
> * event. */
> - virtio_mb(vq);
> + virtio_mb(vq->weak_barriers);
>
> old = vq->vring.avail->idx - vq->num_added;
> new = vq->vring.avail->idx;
> @@ -436,7 +415,7 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
> }
>
> /* Only get used array entries after they have been exposed by host. */
> - virtio_rmb(vq);
> + virtio_rmb(vq->weak_barriers);
>
> last_used = (vq->last_used_idx & (vq->vring.num - 1));
> i = vq->vring.used->ring[last_used].id;
> @@ -460,7 +439,7 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
> * the read in the next get_buf call. */
> if (!(vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT)) {
> vring_used_event(&vq->vring) = vq->last_used_idx;
> - virtio_mb(vq);
> + virtio_mb(vq->weak_barriers);
> }
>
> #ifdef DEBUG
> @@ -513,7 +492,7 @@ bool virtqueue_enable_cb(struct virtqueue *_vq)
> * entry. Always do both to keep code simple. */
> vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
> vring_used_event(&vq->vring) = vq->last_used_idx;
> - virtio_mb(vq);
> + virtio_mb(vq->weak_barriers);
> if (unlikely(more_used(vq))) {
> END_USE(vq);
> return false;
> @@ -553,7 +532,7 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
> /* TODO: tune this threshold */
> bufs = (u16)(vq->vring.avail->idx - vq->last_used_idx) * 3 / 4;
> vring_used_event(&vq->vring) = vq->last_used_idx + bufs;
> - virtio_mb(vq);
> + virtio_mb(vq->weak_barriers);
> if (unlikely((u16)(vq->vring.used->idx - vq->last_used_idx) > bufs)) {
> END_USE(vq);
> return false;
> diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
> index 63c6ea1..ca3ad41 100644
> --- a/include/linux/virtio_ring.h
> +++ b/include/linux/virtio_ring.h
> @@ -4,6 +4,63 @@
> #include <linux/irqreturn.h>
> #include <uapi/linux/virtio_ring.h>
>
> +/*
> + * Barriers in virtio are tricky. Non-SMP virtio guests can't assume
> + * they're not on an SMP host system, so they need to assume real
> + * barriers. Non-SMP virtio hosts could skip the barriers, but does
> + * anyone care?
> + *
> + * For virtio_pci on SMP, we don't need to order with respect to MMIO
> + * accesses through relaxed memory I/O windows, so smp_mb() et al are
> + * sufficient.
> + *
> + * For using virtio to talk to real devices (eg. other heterogeneous
> + * CPUs) we do need real barriers. In theory, we could be using both
> + * kinds of virtio, so it's a runtime decision, and the branch is
> + * actually quite cheap.
> + */
> +
> +#ifdef CONFIG_SMP
> +static inline void virtio_mb(bool weak_barriers)
> +{
> + if (weak_barriers)
> + smp_mb();
> + else
> + mb();
> +}
> +
> +static inline void virtio_rmb(bool weak_barriers)
> +{
> + if (weak_barriers)
> + smp_rmb();
> + else
> + rmb();
> +}
> +
> +static inline void virtio_wmb(bool weak_barriers)
> +{
> + if (weak_barriers)
> + smp_wmb();
> + else
> + wmb();
> +}
> +#else
> +static inline void virtio_mb(bool weak_barriers)
> +{
> + mb();
> +}
> +
> +static inline void virtio_rmb(bool weak_barriers)
> +{
> + rmb();
> +}
> +
> +static inline void virtio_wmb(bool weak_barriers)
> +{
> + wmb();
> +}
> +#endif
> +
> struct virtio_device;
> struct virtqueue;
>
> diff --git a/include/linux/vringh.h b/include/linux/vringh.h
> new file mode 100644
> index 0000000..508b5e5
> --- /dev/null
> +++ b/include/linux/vringh.h
> @@ -0,0 +1,115 @@
> +/*
> + * Linux host-side vring helpers; for when the kernel needs to access
> + * someone else's vring.
> + *
> + * Copyright IBM Corporation, 2013.
> + * Parts taken from drivers/vhost/vhost.c Copyright 2009 Red Hat, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
> + *
> + * Written by: Rusty Russell <rusty@rustcorp.com.au>
> + */
> +#ifndef _LINUX_VRINGH_H
> +#define _LINUX_VRINGH_H
> +#include <uapi/linux/virtio_ring.h>
> +#include <uapi/linux/uio.h>
> +#include <asm/barrier.h>
> +
> +/* virtio_ring with information needed for host access. */
> +struct vringh {
> + /* Guest publishes used event idx (note: we always do). */
> + bool event_indices;
> +
> + /* Have we told the other end we want to be notified? */
> + bool listening;
> +
> + /* Can we get away with weak barriers? */
> + bool weak_barriers;
> +
> + /* Last available index we saw (ie. where we're up to). */
> + u16 last_avail_idx;
> +
> + /* Last index we used. */
> + u16 last_used_idx;
> +
> + /* The vring (note: it may contain user pointers!) */
> + struct vring vring;
> +};
> +
> +/* The memory the vring can access, and what offset to apply. */
> +struct vringh_range {
> + u64 start, end_incl;
> + u64 offset;
> +};
> +
> +/* All the information about an iovec. */
> +struct vringh_iov {
> + struct iovec *iov;
> + unsigned i, max;
> + bool allocated;
> +};
> +
> +/* Helpers for userspace vrings. */
> +int vringh_init_user(struct vringh *vrh, u32 features,
> + unsigned int num, bool weak_barriers,
> + struct vring_desc __user *desc,
> + struct vring_avail __user *avail,
> + struct vring_used __user *used);
> +
> +/* Convert a descriptor into iovecs. */
> +int vringh_getdesc_user(struct vringh *vrh,
> + struct vringh_iov *riov,
> + struct vringh_iov *wiov,
> + bool (*getrange)(u64 addr, struct vringh_range *r),
> + u16 *head,
> + gfp_t gfp);
> +
> +/* Copy bytes from readable vsg, consuming it (and incrementing wiov->i). */
> +ssize_t vringh_iov_pull_user(struct vringh_iov *riov, void *dst, size_t len);
> +
> +/* Copy bytes into writable vsg, consuming it (and incrementing wiov->i). */
> +ssize_t vringh_iov_push_user(struct vringh_iov *wiov,
> + const void *src, size_t len);
> +
> +/* Mark a descriptor as used. Sets notify if you should fire eventfd. */
> +int vringh_complete_user(struct vringh *vrh, u16 head, u32 len,
> + bool *notify);
> +
> +/* Pretend we've never seen descriptor (for easy error handling). */
> +void vringh_abandon_user(struct vringh *vrh, unsigned int num);
> +
> +/* Helpers for kernelspace vrings. */
> +int vringh_init_kern(struct vringh *vrh, u32 features,
> + unsigned int num, bool weak_barriers,
> + struct vring_desc *desc,
> + struct vring_avail *avail,
> + struct vring_used *used);
> +
> +int vringh_getdesc_kern(struct vringh *vrh,
> + struct vringh_iov *riov,
> + struct vringh_iov *wiov,
> + u16 *head,
> + gfp_t gfp);
> +
> +ssize_t vringh_iov_pull_kern(struct vringh_iov *riov, void *dst, size_t len);
> +ssize_t vringh_iov_push_user(struct vringh_iov *wiov,
> + const void *src, size_t len);
> +void vringh_abandon_kern(struct vringh *vrh, unsigned int num);
> +int vringh_complete_kern(struct vringh *vrh, u16 head, u32 len, bool *notify);
> +
> +bool vringh_notify_enable_kern(struct vringh *vrh);
> +void vringh_notify_disable_kern(struct vringh *vrh);
> +
> +#endif /* _LINUX_VRINGH_H */
> --
> 1.7.10.4
^ permalink raw reply [flat|nested] 30+ messages in thread* Re: [PATCH 1/6] virtio_host: host-side implementation of virtio rings.
2013-01-17 11:23 ` [PATCH 1/6] virtio_host: host-side implementation of virtio rings Michael S. Tsirkin
@ 2013-01-17 11:49 ` Sjur Brændeland
2013-01-17 12:08 ` Michael S. Tsirkin
2013-01-21 2:36 ` Rusty Russell
2013-01-21 2:34 ` Rusty Russell
1 sibling, 2 replies; 30+ messages in thread
From: Sjur Brændeland @ 2013-01-17 11:49 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: virtualization
On Thu, Jan 17, 2013 at 12:23 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Thu, Jan 17, 2013 at 08:59:38PM +1030, Rusty Russell wrote:
...
>> +static inline int
>> +__vringh_iov(struct vringh *vrh, u16 i,
>> + struct vringh_iov *riov,
>> + struct vringh_iov *wiov,
>> + bool (*getrange)(u64 addr, struct vringh_range *r),
>> + gfp_t gfp,
>> + int (*getdesc)(struct vring_desc *dst, const struct vring_desc *s))
>> +{
>> + int err, count = 0, up_next, desc_max;
>> + struct vring_desc desc, *descs;
>> + struct vringh_range range = { -1ULL, 0 };
>> +
>> + /* We start traversing vring's descriptor table. */
>> + descs = vrh->vring.desc;
>> + desc_max = vrh->vring.num;
>> + up_next = -1;
>> +
>> + riov->i = wiov->i = 0;
>> + for (;;) {
>> + void *addr;
>> + struct vringh_iov *iov;
>> +
>> + err = getdesc(&desc, &descs[i]);
>> + if (unlikely(err))
>> + goto fail;
>> +
>> + /* Make sure it's OK, and get offset. */
>> + if (!check_range(desc.addr, desc.len, &range, getrange)) {
>> + err = -EINVAL;
>> + goto fail;
>> + }
>> + addr = (void *)(long)desc.addr + range.offset;
>
> Should probably be (void *)(long)(desc.addr + range.offset).
> Otherwise we risk signed integer overflow.
>
>> +
>> + if (unlikely(desc.flags & VRING_DESC_F_INDIRECT)) {
>> + err = move_to_indirect(&up_next, &i, addr, &desc,
>> + &descs, &desc_max);
>> + if (err)
>> + goto fail;
>> + continue;
>> + }
>> +
>> + if (count++ == vrh->vring.num) {
>> + vringh_bad("Descriptor loop in %p", descs);
>> + err = -ELOOP;
>> + goto fail;
>> + }
>> +
>> + if (desc.flags & VRING_DESC_F_WRITE)
>> + iov = wiov;
>> + else {
>> + iov = riov;
>> + if (unlikely(wiov->i)) {
>> + vringh_bad("Readable desc %p after writable",
>> + &descs[i]);
>> + err = -EINVAL;
>> + goto fail;
>> + }
>> + }
>> +
>> + if (unlikely(iov->i == iov->max)) {
>> + err = resize_iovec(iov, gfp);
>> + if (err)
>> + goto fail;
>> + }
>> +
>> + iov->iov[iov->i].iov_base = (__force __user void *)addr;
>> + iov->iov[iov->i].iov_len = desc.len;
>
> The following comment from the previous version still applies:
> > This looks like it won't do the right thing if desc.len spans multiple
> > ranges. I don't know if this happens in practice but this is something
> > vhost supports ATM.
> in otgher words, we might need to split a single desc to multiple
> iov entries.
Splitting descriptors doesn't work for me. As described earlier, I
receive one IP-frame for each descriptor. So I need to keep
the original boundaries.
Regards,
Sjur
^ permalink raw reply [flat|nested] 30+ messages in thread* Re: [PATCH 1/6] virtio_host: host-side implementation of virtio rings.
2013-01-17 11:49 ` Sjur Brændeland
@ 2013-01-17 12:08 ` Michael S. Tsirkin
2013-01-21 2:36 ` Rusty Russell
1 sibling, 0 replies; 30+ messages in thread
From: Michael S. Tsirkin @ 2013-01-17 12:08 UTC (permalink / raw)
To: Sjur Brændeland; +Cc: virtualization
On Thu, Jan 17, 2013 at 12:49:25PM +0100, Sjur Brændeland wrote:
> On Thu, Jan 17, 2013 at 12:23 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Thu, Jan 17, 2013 at 08:59:38PM +1030, Rusty Russell wrote:
> ...
> >> +static inline int
> >> +__vringh_iov(struct vringh *vrh, u16 i,
> >> + struct vringh_iov *riov,
> >> + struct vringh_iov *wiov,
> >> + bool (*getrange)(u64 addr, struct vringh_range *r),
> >> + gfp_t gfp,
> >> + int (*getdesc)(struct vring_desc *dst, const struct vring_desc *s))
> >> +{
> >> + int err, count = 0, up_next, desc_max;
> >> + struct vring_desc desc, *descs;
> >> + struct vringh_range range = { -1ULL, 0 };
> >> +
> >> + /* We start traversing vring's descriptor table. */
> >> + descs = vrh->vring.desc;
> >> + desc_max = vrh->vring.num;
> >> + up_next = -1;
> >> +
> >> + riov->i = wiov->i = 0;
> >> + for (;;) {
> >> + void *addr;
> >> + struct vringh_iov *iov;
> >> +
> >> + err = getdesc(&desc, &descs[i]);
> >> + if (unlikely(err))
> >> + goto fail;
> >> +
> >> + /* Make sure it's OK, and get offset. */
> >> + if (!check_range(desc.addr, desc.len, &range, getrange)) {
> >> + err = -EINVAL;
> >> + goto fail;
> >> + }
> >> + addr = (void *)(long)desc.addr + range.offset;
> >
> > Should probably be (void *)(long)(desc.addr + range.offset).
> > Otherwise we risk signed integer overflow.
> >
> >> +
> >> + if (unlikely(desc.flags & VRING_DESC_F_INDIRECT)) {
> >> + err = move_to_indirect(&up_next, &i, addr, &desc,
> >> + &descs, &desc_max);
> >> + if (err)
> >> + goto fail;
> >> + continue;
> >> + }
> >> +
> >> + if (count++ == vrh->vring.num) {
> >> + vringh_bad("Descriptor loop in %p", descs);
> >> + err = -ELOOP;
> >> + goto fail;
> >> + }
> >> +
> >> + if (desc.flags & VRING_DESC_F_WRITE)
> >> + iov = wiov;
> >> + else {
> >> + iov = riov;
> >> + if (unlikely(wiov->i)) {
> >> + vringh_bad("Readable desc %p after writable",
> >> + &descs[i]);
> >> + err = -EINVAL;
> >> + goto fail;
> >> + }
> >> + }
> >> +
> >> + if (unlikely(iov->i == iov->max)) {
> >> + err = resize_iovec(iov, gfp);
> >> + if (err)
> >> + goto fail;
> >> + }
> >> +
> >> + iov->iov[iov->i].iov_base = (__force __user void *)addr;
> >> + iov->iov[iov->i].iov_len = desc.len;
> >
> > The following comment from the previous version still applies:
> > > This looks like it won't do the right thing if desc.len spans multiple
> > > ranges. I don't know if this happens in practice but this is something
> > > vhost supports ATM.
> > in otgher words, we might need to split a single desc to multiple
> > iov entries.
>
> Splitting descriptors doesn't work for me. As described earlier, I
> receive one IP-frame for each descriptor. So I need to keep
> the original boundaries.
>
> Regards,
> Sjur
Hmm this kind of conflicts with the description we were
going into with virtio generally.
Other side should be free to lay out request any way it
wants to.
Anyway, in this case this is splitting iovecs really not descriptors.
--
MST
^ permalink raw reply [flat|nested] 30+ messages in thread* Re: [PATCH 1/6] virtio_host: host-side implementation of virtio rings.
2013-01-17 11:49 ` Sjur Brændeland
2013-01-17 12:08 ` Michael S. Tsirkin
@ 2013-01-21 2:36 ` Rusty Russell
2013-01-22 14:54 ` Sjur Brændeland
1 sibling, 1 reply; 30+ messages in thread
From: Rusty Russell @ 2013-01-21 2:36 UTC (permalink / raw)
To: Sjur Brændeland, Michael S. Tsirkin; +Cc: virtualization
Sjur Brændeland <sjur.brandeland@stericsson.com> writes:
> On Thu, Jan 17, 2013 at 12:23 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> in otgher words, we might need to split a single desc to multiple
>> iov entries.
>
> Splitting descriptors doesn't work for me. As described earlier, I
> receive one IP-frame for each descriptor. So I need to keep
> the original boundaries.
Yes, but the _kern helpers don't do range checking or offsetting at the
moment, so they'll be preserved.
If you want to do range checking, I can add that...
Cheers,
Rusty.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/6] virtio_host: host-side implementation of virtio rings.
2013-01-21 2:36 ` Rusty Russell
@ 2013-01-22 14:54 ` Sjur Brændeland
0 siblings, 0 replies; 30+ messages in thread
From: Sjur Brændeland @ 2013-01-22 14:54 UTC (permalink / raw)
To: Rusty Russell; +Cc: virtualization, Michael S. Tsirkin
On Mon, Jan 21, 2013 at 3:36 AM, Rusty Russell <rusty@rustcorp.com.au> wrote:
> Sjur Brændeland <sjur.brandeland@stericsson.com> writes:
>> On Thu, Jan 17, 2013 at 12:23 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>> in otgher words, we might need to split a single desc to multiple
>>> iov entries.
>>
>> Splitting descriptors doesn't work for me. As described earlier, I
>> receive one IP-frame for each descriptor. So I need to keep
>> the original boundaries.
>
> Yes, but the _kern helpers don't do range checking or offsetting at the
> moment, so they'll be preserved.
>
> If you want to do range checking, I can add that...
I don't think I need that. Initially I can just check if the received address
is readable.
Regards,
Sjur
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/6] virtio_host: host-side implementation of virtio rings.
2013-01-17 11:23 ` [PATCH 1/6] virtio_host: host-side implementation of virtio rings Michael S. Tsirkin
2013-01-17 11:49 ` Sjur Brændeland
@ 2013-01-21 2:34 ` Rusty Russell
2013-01-21 9:41 ` Michael S. Tsirkin
2013-01-21 11:52 ` Rusty Russell
1 sibling, 2 replies; 30+ messages in thread
From: Rusty Russell @ 2013-01-21 2:34 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: virtualization
"Michael S. Tsirkin" <mst@redhat.com> writes:
> On Thu, Jan 17, 2013 at 08:59:38PM +1030, Rusty Russell wrote:
>> +/* Returns vring->num if empty, -ve on error. */
>> +static inline int __vringh_get_head(const struct vringh *vrh,
>> + int (*getu16)(u16 *val, const u16 *p),
>
> I think int (*getu16)(const u16 *p) would be cleaner
> than returning through a pointer, then
> callers check that value < 0 for error.
I disagree: I dislike overloading the error code, and I like the
symmetry with other operations (getdesc, putu16).
>> + /* Make sure it's OK, and get offset. */
>> + if (!check_range(desc.addr, desc.len, &range, getrange)) {
>> + err = -EINVAL;
>> + goto fail;
>> + }
>> + addr = (void *)(long)desc.addr + range.offset;
>
> Should probably be (void *)(long)(desc.addr + range.offset).
> Otherwise we risk signed integer overflow.
Well, it's a noop. Either a pointer and long are 64 bit (no overflow),
or they're not (we truncate anyway when we assign to addr).
>> + iov->iov[iov->i].iov_base = (__force __user void *)addr;
>> + iov->iov[iov->i].iov_len = desc.len;
>
> The following comment from the previous version still applies:
> > This looks like it won't do the right thing if desc.len spans multiple
> > ranges. I don't know if this happens in practice but this is something
> > vhost supports ATM.
> in otgher words, we might need to split a single desc to multiple
> iov entries.
Ah, separate offsets for consecutive ranges, right. I'd prefer to say
"don't do that", but qemu is rarely sane. I'll fix it.
>> + err = putu16(&vrh->vring.used->idx, vrh->last_used_idx);
>> + if (err) {
>> + vringh_bad("Failed to update used index at %p",
>> + &vrh->vring.used->idx);
>> + return err;
>
>
> One thing vhost does is roll back everything on error,
> so you can for example have an invalid range
> of memory and handle writes there in userspace.
> I think it's worth preserving though this is
> currently unused.
Indeed, that's a nice feature. So is distinguishing a single bad
descriptor (which can be dropped, for vhost net) from a corrupt ring
(which means the device is useless).
>> + /* They could have slipped one in as we were doing that: make
>> + * sure it's written, then check again. */
>> + virtio_mb(vrh->weak_barriers);
>> +
>> + if (getu16(&avail, &vrh->vring.avail->idx) != 0) {
>
> Hmm above has implicit != 0 why not here?
I didn't see the one above, but it's a clear nod that it doesn't return
a bool (yeah, it's nasty that we don't return the error in this case,
but in practice it's a tiny corner).
>> +static inline int getdesc_user(struct vring_desc *dst,
>> + const struct vring_desc *src)
>> +{
>> + return copy_from_user(dst, (__force void *)src, sizeof(*dst)) == 0 ? 0 :
>> + -EFAULT;
>
> confused about __force above. Shouldn't it cast to __user?
> I have not tried does this patch pass the checker?
You're right, I haven't run sparse across it yet...
>> + vrh->vring.desc = (__force struct vring_desc *)desc;
>> + vrh->vring.avail = (__force struct vring_avail *)avail;
>> + vrh->vring.used = (__force struct vring_used *)used;
>
> I counted 3 separate chunks that do __force casts.
> Let's try to isolate them and comment why it's safe.
Yes, I want to look at using a union of kvec and iovec internally, but
I worry about breaking gcc's aliasing detection (the kernel compiles
with -fno-strict-aliasing but I hate relying on this).
Thanks,
Rusty.
^ permalink raw reply [flat|nested] 30+ messages in thread* Re: [PATCH 1/6] virtio_host: host-side implementation of virtio rings.
2013-01-21 2:34 ` Rusty Russell
@ 2013-01-21 9:41 ` Michael S. Tsirkin
2013-01-21 11:52 ` Rusty Russell
1 sibling, 0 replies; 30+ messages in thread
From: Michael S. Tsirkin @ 2013-01-21 9:41 UTC (permalink / raw)
To: Rusty Russell; +Cc: virtualization
On Mon, Jan 21, 2013 at 01:04:47PM +1030, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> > On Thu, Jan 17, 2013 at 08:59:38PM +1030, Rusty Russell wrote:
> >> +/* Returns vring->num if empty, -ve on error. */
> >> +static inline int __vringh_get_head(const struct vringh *vrh,
> >> + int (*getu16)(u16 *val, const u16 *p),
> >
> > I think int (*getu16)(const u16 *p) would be cleaner
> > than returning through a pointer, then
> > callers check that value < 0 for error.
>
> I disagree: I dislike overloading the error code, and I like the
> symmetry with other operations (getdesc, putu16).
>
> >> + /* Make sure it's OK, and get offset. */
> >> + if (!check_range(desc.addr, desc.len, &range, getrange)) {
> >> + err = -EINVAL;
> >> + goto fail;
> >> + }
> >> + addr = (void *)(long)desc.addr + range.offset;
> >
> > Should probably be (void *)(long)(desc.addr + range.offset).
> > Otherwise we risk signed integer overflow.
>
> Well, it's a noop. Either a pointer and long are 64 bit (no overflow),
> or they're not (we truncate anyway when we assign to addr).
For readability purposes, I think it's better to do all
math in 64 bit then cast as appropriate.
This also relies on -fno-strict-overflow - below you say it's best
not to rely on specific compiler flags, and I agree.
> >> + iov->iov[iov->i].iov_base = (__force __user void *)addr;
> >> + iov->iov[iov->i].iov_len = desc.len;
> >
> > The following comment from the previous version still applies:
> > > This looks like it won't do the right thing if desc.len spans multiple
> > > ranges. I don't know if this happens in practice but this is something
> > > vhost supports ATM.
> > in otgher words, we might need to split a single desc to multiple
> > iov entries.
>
> Ah, separate offsets for consecutive ranges, right. I'd prefer to say
> "don't do that", but qemu is rarely sane. I'll fix it.
>
> >> + err = putu16(&vrh->vring.used->idx, vrh->last_used_idx);
> >> + if (err) {
> >> + vringh_bad("Failed to update used index at %p",
> >> + &vrh->vring.used->idx);
> >> + return err;
> >
> >
> > One thing vhost does is roll back everything on error,
> > so you can for example have an invalid range
> > of memory and handle writes there in userspace.
> > I think it's worth preserving though this is
> > currently unused.
>
> Indeed, that's a nice feature. So is distinguishing a single bad
> descriptor (which can be dropped, for vhost net) from a corrupt ring
> (which means the device is useless).
>
> >> + /* They could have slipped one in as we were doing that: make
> >> + * sure it's written, then check again. */
> >> + virtio_mb(vrh->weak_barriers);
> >> +
> >> + if (getu16(&avail, &vrh->vring.avail->idx) != 0) {
> >
> > Hmm above has implicit != 0 why not here?
>
> I didn't see the one above, but it's a clear nod that it doesn't return
> a bool (yeah, it's nasty that we don't return the error in this case,
> but in practice it's a tiny corner).
>
> >> +static inline int getdesc_user(struct vring_desc *dst,
> >> + const struct vring_desc *src)
> >> +{
> >> + return copy_from_user(dst, (__force void *)src, sizeof(*dst)) == 0 ? 0 :
> >> + -EFAULT;
> >
> > confused about __force above. Shouldn't it cast to __user?
> > I have not tried does this patch pass the checker?
>
> You're right, I haven't run sparse across it yet...
>
> >> + vrh->vring.desc = (__force struct vring_desc *)desc;
> >> + vrh->vring.avail = (__force struct vring_avail *)avail;
> >> + vrh->vring.used = (__force struct vring_used *)used;
> >
> > I counted 3 separate chunks that do __force casts.
> > Let's try to isolate them and comment why it's safe.
>
> Yes, I want to look at using a union of kvec and iovec internally, but
> I worry about breaking gcc's aliasing detection (the kernel compiles
> with -fno-strict-aliasing but I hate relying on this).
>
> Thanks,
> Rusty.
Right. I think tagging it __user to mean "can be userspace"
and then removing tag where we know it's safe is a decent compromize.
--
MST
^ permalink raw reply [flat|nested] 30+ messages in thread* Re: [PATCH 1/6] virtio_host: host-side implementation of virtio rings.
2013-01-21 2:34 ` Rusty Russell
2013-01-21 9:41 ` Michael S. Tsirkin
@ 2013-01-21 11:52 ` Rusty Russell
2013-01-21 12:24 ` Michael S. Tsirkin
1 sibling, 1 reply; 30+ messages in thread
From: Rusty Russell @ 2013-01-21 11:52 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: virtualization
Rusty Russell <rusty@rustcorp.com.au> writes:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
>>> + iov->iov[iov->i].iov_base = (__force __user void *)addr;
>>> + iov->iov[iov->i].iov_len = desc.len;
>>
>> The following comment from the previous version still applies:
>> > This looks like it won't do the right thing if desc.len spans multiple
>> > ranges. I don't know if this happens in practice but this is something
>> > vhost supports ATM.
>> in otgher words, we might need to split a single desc to multiple
>> iov entries.
>
> Ah, separate offsets for consecutive ranges, right. I'd prefer to say
> "don't do that", but qemu is rarely sane. I'll fix it.
Actually, you make the same assumption for vhost, with your use of
getuser and putuser for accessing the ring.
The complexity and cost of handling this is significant, and the error
if it ever did happen is distinctive. Does qemu ever create such
things?
Thanks,
Rusty.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/6] virtio_host: host-side implementation of virtio rings.
2013-01-21 11:52 ` Rusty Russell
@ 2013-01-21 12:24 ` Michael S. Tsirkin
2013-01-21 12:40 ` Michael S. Tsirkin
2013-01-21 22:57 ` Rusty Russell
0 siblings, 2 replies; 30+ messages in thread
From: Michael S. Tsirkin @ 2013-01-21 12:24 UTC (permalink / raw)
To: Rusty Russell; +Cc: virtualization
On Mon, Jan 21, 2013 at 10:22:03PM +1030, Rusty Russell wrote:
> Rusty Russell <rusty@rustcorp.com.au> writes:
> > "Michael S. Tsirkin" <mst@redhat.com> writes:
> >>> + iov->iov[iov->i].iov_base = (__force __user void *)addr;
> >>> + iov->iov[iov->i].iov_len = desc.len;
> >>
> >> The following comment from the previous version still applies:
> >> > This looks like it won't do the right thing if desc.len spans multiple
> >> > ranges. I don't know if this happens in practice but this is something
> >> > vhost supports ATM.
> >> in otgher words, we might need to split a single desc to multiple
> >> iov entries.
> >
> > Ah, separate offsets for consecutive ranges, right. I'd prefer to say
> > "don't do that", but qemu is rarely sane. I'll fix it.
>
> Actually, you make the same assumption for vhost, with your use of
> getuser and putuser for accessing the ring.
There's no requirement that ring is mapped directly into guest
memory. If a ring is not contigious qemu can allocate
it's own virtuall contigious rings and copy data back and forth.
> The complexity and cost of handling this is significant,
Why is it? Just add a while loop, increment desc.addr
and decrement desc.len.
> and the error
> if it ever did happen is distinctive. Does qemu ever create such
> things?
>
> Thanks,
> Rusty.
I think qemu does create ranges that are consequitive in guest pa
space but not in qemu va ranges. I'm sick today, will try to dig out
some examples later.
--
MST
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/6] virtio_host: host-side implementation of virtio rings.
2013-01-21 12:24 ` Michael S. Tsirkin
@ 2013-01-21 12:40 ` Michael S. Tsirkin
2013-01-21 22:57 ` Rusty Russell
1 sibling, 0 replies; 30+ messages in thread
From: Michael S. Tsirkin @ 2013-01-21 12:40 UTC (permalink / raw)
To: Rusty Russell; +Cc: virtualization
On Mon, Jan 21, 2013 at 02:24:31PM +0200, Michael S. Tsirkin wrote:
> On Mon, Jan 21, 2013 at 10:22:03PM +1030, Rusty Russell wrote:
> > Rusty Russell <rusty@rustcorp.com.au> writes:
> > > "Michael S. Tsirkin" <mst@redhat.com> writes:
> > >>> + iov->iov[iov->i].iov_base = (__force __user void *)addr;
> > >>> + iov->iov[iov->i].iov_len = desc.len;
> > >>
> > >> The following comment from the previous version still applies:
> > >> > This looks like it won't do the right thing if desc.len spans multiple
> > >> > ranges. I don't know if this happens in practice but this is something
> > >> > vhost supports ATM.
> > >> in otgher words, we might need to split a single desc to multiple
> > >> iov entries.
> > >
> > > Ah, separate offsets for consecutive ranges, right. I'd prefer to say
> > > "don't do that", but qemu is rarely sane. I'll fix it.
> >
> > Actually, you make the same assumption for vhost, with your use of
> > getuser and putuser for accessing the ring.
>
> There's no requirement that ring is mapped directly into guest
> memory. If a ring is not contigious qemu can allocate
> it's own virtuall contigious rings and copy data back and forth.
>
> > The complexity and cost of handling this is significant,
>
> Why is it? Just add a while loop, increment desc.addr
> and decrement desc.len.
>
> > and the error
> > if it ever did happen is distinctive. Does qemu ever create such
> > things?
> >
> > Thanks,
> > Rusty.
>
> I think qemu does create ranges that are consequitive in guest pa
> space but not in qemu va ranges. I'm sick today, will try to dig out
> some examples later.
It seems it could happen if e.g. one region is ROM and another one is
RAM. This means it won't be used for the ring (which is R/W) but could
be for the data.
> --
> MST
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/6] virtio_host: host-side implementation of virtio rings.
2013-01-21 12:24 ` Michael S. Tsirkin
2013-01-21 12:40 ` Michael S. Tsirkin
@ 2013-01-21 22:57 ` Rusty Russell
2013-01-22 6:57 ` Rusty Russell
2013-01-22 7:13 ` Rusty Russell
1 sibling, 2 replies; 30+ messages in thread
From: Rusty Russell @ 2013-01-21 22:57 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: virtualization
"Michael S. Tsirkin" <mst@redhat.com> writes:
> On Mon, Jan 21, 2013 at 10:22:03PM +1030, Rusty Russell wrote:
>> Rusty Russell <rusty@rustcorp.com.au> writes:
>> > "Michael S. Tsirkin" <mst@redhat.com> writes:
>> >>> + iov->iov[iov->i].iov_base = (__force __user void *)addr;
>> >>> + iov->iov[iov->i].iov_len = desc.len;
>> >>
>> >> The following comment from the previous version still applies:
>> >> > This looks like it won't do the right thing if desc.len spans multiple
>> >> > ranges. I don't know if this happens in practice but this is something
>> >> > vhost supports ATM.
>> >> in otgher words, we might need to split a single desc to multiple
>> >> iov entries.
>> >
>> > Ah, separate offsets for consecutive ranges, right. I'd prefer to say
>> > "don't do that", but qemu is rarely sane. I'll fix it.
>>
>> Actually, you make the same assumption for vhost, with your use of
>> getuser and putuser for accessing the ring.
>
> There's no requirement that ring is mapped directly into guest
> memory. If a ring is not contigious qemu can allocate
> it's own virtuall contigious rings and copy data back and forth.
True, but it's the guest which allocates the ring. If QEMU sets up a
guest with a offset-discontiguous mapping, vhost would be unreliable
today.
>> The complexity and cost of handling this is significant,
>
> Why is it? Just add a while loop, increment desc.addr
> and decrement desc.len.
Indirect handling now needs to re-check the boundaries for every
descriptor it fetches, and handle the case where a single descriptor
overlaps the boundary.
The ability to verify the indirect descriptor table all at once makes
the code fast and neat.
From your other mail:
> It seems it could happen if e.g. one region is ROM and another one is
> RAM. This means it won't be used for the ring (which is R/W) but could
> be for the data.
That case wouldn't be so bad, I think, since it's a simple loop.
I'll create some patches and see if it's too ugly to live...
Thanks,
Rusty.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/6] virtio_host: host-side implementation of virtio rings.
2013-01-21 22:57 ` Rusty Russell
@ 2013-01-22 6:57 ` Rusty Russell
2013-01-22 7:13 ` Rusty Russell
1 sibling, 0 replies; 30+ messages in thread
From: Rusty Russell @ 2013-01-22 6:57 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: virtualization
Rusty Russell <rusty@rustcorp.com.au> writes:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> On Mon, Jan 21, 2013 at 10:22:03PM +1030, Rusty Russell wrote:
>>> Rusty Russell <rusty@rustcorp.com.au> writes:
>>> > "Michael S. Tsirkin" <mst@redhat.com> writes:
>>> >>> + iov->iov[iov->i].iov_base = (__force __user void *)addr;
>>> >>> + iov->iov[iov->i].iov_len = desc.len;
>>> >>
>>> >> The following comment from the previous version still applies:
>>> >> > This looks like it won't do the right thing if desc.len spans multiple
>>> >> > ranges. I don't know if this happens in practice but this is something
>>> >> > vhost supports ATM.
>>> >> in otgher words, we might need to split a single desc to multiple
>>> >> iov entries.
>>> >
>>> > Ah, separate offsets for consecutive ranges, right. I'd prefer to say
>>> > "don't do that", but qemu is rarely sane. I'll fix it.
>>>
>>> Actually, you make the same assumption for vhost, with your use of
>>> getuser and putuser for accessing the ring.
>>
>> There's no requirement that ring is mapped directly into guest
>> memory. If a ring is not contigious qemu can allocate
>> it's own virtuall contigious rings and copy data back and forth.
>
> True, but it's the guest which allocates the ring. If QEMU sets up a
> guest with a offset-discontiguous mapping, vhost would be unreliable
> today.
My mistake: the ring addresses handed through the ioctl already
translated, so you can't specify such a thing.
I struck this when I tried to clean it up: this is an asymmetry between
the toplevel descriptor table and any indirect ones.
Cheers,
Rusty.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/6] virtio_host: host-side implementation of virtio rings.
2013-01-21 22:57 ` Rusty Russell
2013-01-22 6:57 ` Rusty Russell
@ 2013-01-22 7:13 ` Rusty Russell
1 sibling, 0 replies; 30+ messages in thread
From: Rusty Russell @ 2013-01-22 7:13 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: virtualization
Rusty Russell <rusty@rustcorp.com.au> writes:
> I'll create some patches and see if it's too ugly to live...
Hmm, with rough userspace testing I managed to get the speed penalty
pretty low.
Here are the two patches, inline:
vringh: handle case where data goes across multiple ranges.
QEMU seems to only use separate memory ranges for ROM vs RAM, but in theory
a single scatterlist element referred to by a vring could cross two ranges,
and thus need to be split into two separate iovec entries.
This causes no measurable slowdown:
Before: (average of 20 runs)
./vringh_test --indirect --eventidx --parallel
real 0m3.236s
After: (average of 20 runs)
./vringh_test --indirect --eventidx --parallel
real 0m3.223s
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index 2ba087d..50d37a7 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -91,33 +91,37 @@ static inline ssize_t vringh_iov_xfer(struct vringh_kiov *iov,
return done;
}
-static inline bool check_range(u64 addr, u32 len,
+/* May reduce *len if range is shorter. */
+static inline bool check_range(u64 addr, u32 *len,
struct vringh_range *range,
bool (*getrange)(u64, struct vringh_range *))
{
if (addr < range->start || addr > range->end_incl) {
if (!getrange(addr, range))
- goto bad;
+ return false;
}
BUG_ON(addr < range->start || addr > range->end_incl);
/* To end of memory? */
- if (unlikely(addr + len == 0)) {
+ if (unlikely(addr + *len == 0)) {
if (range->end_incl == -1ULL)
return true;
- goto bad;
+ goto truncate;
}
/* Otherwise, don't wrap. */
- if (unlikely(addr + len < addr))
- goto bad;
- if (unlikely(addr + len - 1 > range->end_incl))
- goto bad;
+ if (addr + *len < addr) {
+ vringh_bad("Wrapping descriptor %u@0x%llx", *len, addr);
+ return false;
+ }
+
+ if (unlikely(addr + *len - 1 > range->end_incl))
+ goto truncate;
return true;
-bad:
- vringh_bad("Malformed descriptor address %u@0x%llx", len, addr);
- return false;
+truncate:
+ *len = range->end_incl + 1 - addr;
+ return true;
}
/* No reason for this code to be inline. */
@@ -205,19 +209,30 @@ __vringh_iov(struct vringh *vrh, u16 i,
for (;;) {
void *addr;
struct vringh_kiov *iov;
+ u32 len;
err = getdesc(&desc, &descs[i]);
if (unlikely(err))
goto fail;
- /* Make sure it's OK, and get offset. */
- if (!check_range(desc.addr, desc.len, &range, getrange)) {
- err = -EINVAL;
- goto fail;
- }
- addr = (void *)(long)desc.addr + range.offset;
-
if (unlikely(desc.flags & VRING_DESC_F_INDIRECT)) {
+ /* Make sure it's OK, and get offset. */
+ len = desc.len;
+ if (!check_range(desc.addr, &len, &range, getrange)) {
+ err = -EINVAL;
+ goto fail;
+ }
+
+ if (len != desc.len) {
+ vringh_bad("Indirect descriptor table across"
+ " ranges: %u@%#llx vs %#llx-%#llx",
+ desc.len, desc.addr, range.start,
+ range.end_incl);
+ err = -EINVAL;
+ goto fail;
+ }
+ addr = (void *)(long)(desc.addr + range.offset);
+
err = move_to_indirect(&up_next, &i, addr, &desc,
&descs, &desc_max);
if (err)
@@ -243,6 +258,15 @@ __vringh_iov(struct vringh *vrh, u16 i,
}
}
+ again:
+ /* Make sure it's OK, and get offset. */
+ len = desc.len;
+ if (!check_range(desc.addr, &len, &range, getrange)) {
+ err = -EINVAL;
+ goto fail;
+ }
+ addr = (void *)(unsigned long)(desc.addr + range.offset);
+
if (unlikely(iov->i == iov->max)) {
err = resize_iovec(iov, gfp);
if (err)
@@ -250,9 +274,15 @@ __vringh_iov(struct vringh *vrh, u16 i,
}
iov->iov[iov->i].iov_base = addr;
- iov->iov[iov->i].iov_len = desc.len;
+ iov->iov[iov->i].iov_len = len;
iov->i++;
+ if (unlikely(len != desc.len)) {
+ desc.len -= len;
+ desc.addr += len;
+ goto again;
+ }
+
if (desc.flags & VRING_DESC_F_NEXT) {
i = desc.next;
} else {
vringh: handle case where indirect descriptors go across multiple ranges.
If a data segment can go across multiple ranges, so can an indirect
descriptor table. This is nastier, so we explicitly slowpath this.
This slows things down a little though.
Before: (average of 20 runs)
./vringh_test --indirect --eventidx --parallel
real 0m3.217s
After: (average of 20 runs)
./vringh_test --indirect --eventidx --parallel
real 0m3.364s
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index 50d37a7..5dbf4b1 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -188,17 +188,46 @@ static u16 __cold return_from_indirect(const struct vringh *vrh, int *up_next,
return i;
}
+static int slow_copy(void *dst, const void *src,
+ bool (*getrange)(u64 addr, struct vringh_range *r),
+ struct vringh_range *range,
+ int (*copy)(void *dst, const void *src, size_t len))
+{
+ size_t part, len = sizeof(struct vring_desc);
+
+ do {
+ u64 addr;
+ int err;
+
+ part = len;
+ addr = (u64)(unsigned long)src - range->offset;
+
+ if (!check_range(addr, &part, range, getrange))
+ return -EINVAL;
+
+ err = copy(dst, src, part);
+ if (err)
+ return err;
+
+ dst += part;
+ src += part;
+ len -= part;
+ } while (len);
+ return 0;
+}
+
static inline int
__vringh_iov(struct vringh *vrh, u16 i,
struct vringh_kiov *riov,
struct vringh_kiov *wiov,
bool (*getrange)(u64 addr, struct vringh_range *r),
gfp_t gfp,
- int (*getdesc)(struct vring_desc *dst, const struct vring_desc *s))
+ int (*copy)(void *dst, const void *src, size_t len))
{
int err, count = 0, up_next, desc_max;
struct vring_desc desc, *descs;
- struct vringh_range range = { -1ULL, 0 };
+ struct vringh_range range = { -1ULL, 0 }, slowrange;
+ bool slow = false;
/* We start traversing vring's descriptor table. */
descs = vrh->vring.desc;
@@ -211,7 +240,11 @@ __vringh_iov(struct vringh *vrh, u16 i,
struct vringh_kiov *iov;
u32 len;
- err = getdesc(&desc, &descs[i]);
+ if (unlikely(slow))
+ err = slow_copy(&desc, &descs[i], getrange, &slowrange,
+ copy);
+ else
+ err = copy(&desc, &descs[i], sizeof(desc));
if (unlikely(err))
goto fail;
@@ -223,16 +256,13 @@ __vringh_iov(struct vringh *vrh, u16 i,
goto fail;
}
- if (len != desc.len) {
- vringh_bad("Indirect descriptor table across"
- " ranges: %u@%#llx vs %#llx-%#llx",
- desc.len, desc.addr, range.start,
- range.end_incl);
- err = -EINVAL;
- goto fail;
+ if (unlikely(len != desc.len)) {
+ slow = true;
+ /* We need to save this range to use offset */
+ slowrange = range;
}
- addr = (void *)(long)(desc.addr + range.offset);
+ addr = (void *)(long)(desc.addr + range.offset);
err = move_to_indirect(&up_next, &i, addr, &desc,
&descs, &desc_max);
if (err)
@@ -287,10 +317,11 @@ __vringh_iov(struct vringh *vrh, u16 i,
i = desc.next;
} else {
/* Just in case we need to finish traversing above. */
- if (unlikely(up_next > 0))
+ if (unlikely(up_next > 0)) {
i = return_from_indirect(vrh, &up_next,
&descs, &desc_max);
- else
+ slow = false;
+ } else
break;
}
@@ -479,10 +510,9 @@ static inline int putu16_user(u16 *p, u16 val)
return put_user(val, (__force u16 __user *)p);
}
-static inline int getdesc_user(struct vring_desc *dst,
- const struct vring_desc *src)
+static inline int copydesc_user(void *dst, const void *src, size_t len)
{
- return copy_from_user(dst, (__force void __user *)src, sizeof(*dst)) ?
+ return copy_from_user(dst, (__force void __user *)src, len) ?
-EFAULT : 0;
}
@@ -597,7 +627,7 @@ int vringh_getdesc_user(struct vringh *vrh,
*head = err;
err = __vringh_iov(vrh, *head, (struct vringh_kiov *)riov,
(struct vringh_kiov *)wiov,
- getrange, gfp, getdesc_user);
+ getrange, gfp, copydesc_user);
if (err)
return err;
@@ -711,10 +741,9 @@ static inline int putu16_kern(u16 *p, u16 val)
return 0;
}
-static inline int getdesc_kern(struct vring_desc *dst,
- const struct vring_desc *src)
+static inline int copydesc_kern(void *dst, const void *src, size_t len)
{
- *dst = *src;
+ memcpy(dst, src, len);
return 0;
}
@@ -806,7 +835,7 @@ int vringh_getdesc_kern(struct vringh *vrh,
*head = err;
err = __vringh_iov(vrh, *head, riov, wiov, noop_getrange,
- gfp, getdesc_kern);
+ gfp, copydesc_kern);
if (err)
return err;
diff --git a/tools/virtio/vringh_test.c b/tools/virtio/vringh_test.c
index 01ccaed..bc74c41 100644
--- a/tools/virtio/vringh_test.c
+++ b/tools/virtio/vringh_test.c
@@ -30,14 +30,33 @@ static void never_callback_guest(struct virtqueue *vq)
abort();
}
-static inline bool getrange_iov(u64 addr, struct vringh_range *r)
+static bool getrange_iov(u64 addr, struct vringh_range *r)
{
+ if (addr < (u64)(unsigned long)__user_addr_min - user_addr_offset)
+ return false;
+ if (addr >= (u64)(unsigned long)__user_addr_max - user_addr_offset)
+ return false;
+
r->start = (u64)(unsigned long)__user_addr_min - user_addr_offset;
r->end_incl = (u64)(unsigned long)__user_addr_max - 1 - user_addr_offset;
r->offset = user_addr_offset;
return true;
}
+/* We return single byte ranges. */
+static bool getrange_slow(u64 addr, struct vringh_range *r)
+{
+ if (addr < (u64)(unsigned long)__user_addr_min - user_addr_offset)
+ return false;
+ if (addr >= (u64)(unsigned long)__user_addr_max - user_addr_offset)
+ return false;
+
+ r->start = addr;
+ r->end_incl = r->start;
+ r->offset = user_addr_offset;
+ return true;
+}
+
struct guest_virtio_device {
struct virtio_device vdev;
int to_host_fd;
@@ -75,7 +94,8 @@ static void find_cpus(unsigned int *first, unsigned int *last)
}
}
-static int parallel_test(unsigned long features)
+static int parallel_test(unsigned long features,
+ bool (*getrange)(u64 addr, struct vringh_range *r))
{
void *host_map, *guest_map;
int fd, mapsize, to_guest[2], to_host[2];
@@ -144,7 +164,7 @@ static int parallel_test(unsigned long features)
wiov.max = ARRAY_SIZE(host_wiov);
wiov.allocated = false;
- err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange_iov,
+ err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange,
&head, GFP_KERNEL);
if (err == 0) {
char buf[128];
@@ -349,6 +369,7 @@ int main(int argc, char *argv[])
int err;
unsigned i;
void *ret;
+ bool (*getrange)(u64 addr, struct vringh_range *r) = getrange_iov;
vdev.features[0] = 0;
@@ -362,8 +383,13 @@ int main(int argc, char *argv[])
argv++;
}
+ if (argv[1] && strcmp(argv[1], "--slow") == 0) {
+ getrange = getrange_slow;
+ argv++;
+ }
+
if (argv[1] && strcmp(argv[1], "--parallel") == 0)
- return parallel_test(vdev.features[0]);
+ return parallel_test(vdev.features[0], getrange);
if (posix_memalign(&__user_addr_min, PAGE_SIZE, USER_MEM) != 0)
abort();
@@ -382,7 +408,7 @@ int main(int argc, char *argv[])
vrh.vring.desc, vrh.vring.avail, vrh.vring.used);
/* No descriptor to get yet... */
- err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange_iov,
+ err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange,
&head, GFP_KERNEL);
if (err != 0)
errx(1, "vringh_getdesc_user: %i", err);
@@ -410,7 +436,7 @@ int main(int argc, char *argv[])
wiov.max = ARRAY_SIZE(host_wiov);
wiov.allocated = false;
- err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange_iov,
+ err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange,
&head, GFP_KERNEL);
if (err != 1)
errx(1, "vringh_getdesc_user: %i", err);
@@ -418,9 +444,17 @@ int main(int argc, char *argv[])
assert(riov.max == 1);
assert(riov.iov[0].iov_base == __user_addr_max - 1);
assert(riov.iov[0].iov_len == 1);
- assert(wiov.max == 1);
- assert(wiov.iov[0].iov_base == __user_addr_max - 3);
- assert(wiov.iov[0].iov_len == 2);
+ if (getrange != getrange_slow) {
+ assert(wiov.max == 1);
+ assert(wiov.iov[0].iov_base == __user_addr_max - 3);
+ assert(wiov.iov[0].iov_len == 2);
+ } else {
+ assert(wiov.max == 2);
+ assert(wiov.iov[0].iov_base == __user_addr_max - 3);
+ assert(wiov.iov[0].iov_len == 1);
+ assert(wiov.iov[1].iov_base == __user_addr_max - 2);
+ assert(wiov.iov[1].iov_len == 1);
+ }
err = vringh_iov_pull_user(&riov, buf, 5);
if (err != 1)
@@ -434,7 +468,7 @@ int main(int argc, char *argv[])
if (err != 2)
errx(1, "vringh_iov_push_user: %i", err);
assert(memcmp(__user_addr_max - 3, "bc", 2) == 0);
- assert(wiov.i == 1);
+ assert(wiov.i == wiov.max);
assert(vringh_iov_push_user(&wiov, buf, 5) == 0);
/* Host is done. */
@@ -477,14 +511,17 @@ int main(int argc, char *argv[])
wiov.max = ARRAY_SIZE(host_wiov);
wiov.allocated = false;
- err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange_iov,
+ err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange,
&head, GFP_KERNEL);
if (err != 1)
errx(1, "vringh_getdesc_user: %i", err);
assert(riov.allocated);
assert(riov.iov != host_riov);
- assert(riov.max == RINGSIZE);
+ if (getrange != getrange_slow)
+ assert(riov.max == RINGSIZE);
+ else
+ assert(riov.max == RINGSIZE * USER_MEM/4);
assert(!wiov.allocated);
assert(wiov.max == 0);
@@ -567,7 +604,7 @@ int main(int argc, char *argv[])
wiov.max = ARRAY_SIZE(host_wiov);
wiov.allocated = false;
- err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange_iov,
+ err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange,
&head, GFP_KERNEL);
if (err != 1)
errx(1, "vringh_getdesc_user: %i", err);
@@ -575,7 +612,10 @@ int main(int argc, char *argv[])
if (head != n)
errx(1, "vringh_getdesc_user: head %i not %i", head, n);
- assert(riov.max == 7);
+ if (getrange != getrange_slow)
+ assert(riov.max == 7);
+ else
+ assert(riov.max == 28);
assert(riov.allocated);
err = vringh_iov_pull_user(&riov, buf, 29);
assert(err == 28);
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH 1/6] virtio_host: host-side implementation of virtio rings.
2013-01-17 10:29 [PATCH 1/6] virtio_host: host-side implementation of virtio rings Rusty Russell
` (5 preceding siblings ...)
2013-01-17 11:23 ` [PATCH 1/6] virtio_host: host-side implementation of virtio rings Michael S. Tsirkin
@ 2013-01-22 8:12 ` Asias He
2013-01-23 1:56 ` Rusty Russell
2013-02-04 20:29 ` Sjur Brændeland
7 siblings, 1 reply; 30+ messages in thread
From: Asias He @ 2013-01-22 8:12 UTC (permalink / raw)
To: Rusty Russell; +Cc: Michael S . Tsirkin, virtualization
On 01/17/2013 06:29 PM, Rusty Russell wrote:
> Getting use of virtio rings correct is tricky, and a recent patch saw
> an implementation of in-kernel rings (as separate from userspace).
>
> This patch attempts to abstract the business of dealing with the
> virtio ring layout from the access (userspace or direct); to do this,
> we use function pointers, which gcc inlines correctly.
>
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> ---
> drivers/Makefile | 2 +-
> drivers/vhost/Kconfig | 8 +
> drivers/vhost/Kconfig.tcm | 1 +
> drivers/vhost/Makefile | 2 +
> drivers/vhost/vringh.c | 818 ++++++++++++++++++++++++++++++++++++++++++
> drivers/virtio/virtio_ring.c | 33 +-
> include/linux/virtio_ring.h | 57 +++
Why vringh_notify_enable_user() and vringh_notify_disable_user() are not
declared in include/linux/virtio_ring.h? Missed that?
> include/linux/vringh.h | 115 ++++++
> 8 files changed, 1008 insertions(+), 28 deletions(-)
> create mode 100644 drivers/vhost/vringh.c
> create mode 100644 include/linux/vringh.h
>
> diff --git a/drivers/Makefile b/drivers/Makefile
> index 7863b9f..351a34f 100644
> --- a/drivers/Makefile
> +++ b/drivers/Makefile
> @@ -123,7 +123,7 @@ obj-$(CONFIG_PPC_PS3) += ps3/
> obj-$(CONFIG_OF) += of/
> obj-$(CONFIG_SSB) += ssb/
> obj-$(CONFIG_BCMA) += bcma/
> -obj-$(CONFIG_VHOST_NET) += vhost/
> +obj-$(CONFIG_VHOST_RING) += vhost/
> obj-$(CONFIG_VLYNQ) += vlynq/
> obj-$(CONFIG_STAGING) += staging/
> obj-y += platform/
> diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
> index 202bba6..613b074 100644
> --- a/drivers/vhost/Kconfig
> +++ b/drivers/vhost/Kconfig
> @@ -1,6 +1,7 @@
> config VHOST_NET
> tristate "Host kernel accelerator for virtio net (EXPERIMENTAL)"
> depends on NET && EVENTFD && (TUN || !TUN) && (MACVTAP || !MACVTAP) && EXPERIMENTAL
> + select VHOST_RING
> ---help---
> This kernel module can be loaded in host kernel to accelerate
> guest networking with virtio_net. Not to be confused with virtio_net
> @@ -12,3 +13,10 @@ config VHOST_NET
> if STAGING
> source "drivers/vhost/Kconfig.tcm"
> endif
> +
> +config VHOST_RING
> + tristate
> + ---help---
> + This option is selected by any driver which needs to access
> + the host side of a virtio ring.
> +
> diff --git a/drivers/vhost/Kconfig.tcm b/drivers/vhost/Kconfig.tcm
> index a9c6f76..0218f77 100644
> --- a/drivers/vhost/Kconfig.tcm
> +++ b/drivers/vhost/Kconfig.tcm
> @@ -1,6 +1,7 @@
> config TCM_VHOST
> tristate "TCM_VHOST fabric module (EXPERIMENTAL)"
> depends on TARGET_CORE && EVENTFD && EXPERIMENTAL && m
> + select VHOST_RING
> default n
> ---help---
> Say M here to enable the TCM_VHOST fabric module for use with virtio-scsi guests
> diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
> index a27b053..1d37f5e 100644
> --- a/drivers/vhost/Makefile
> +++ b/drivers/vhost/Makefile
> @@ -2,3 +2,5 @@ obj-$(CONFIG_VHOST_NET) += vhost_net.o
> vhost_net-y := vhost.o net.o
>
> obj-$(CONFIG_TCM_VHOST) += tcm_vhost.o
> +
> +obj-$(CONFIG_VHOST_RING) += vringh.o
> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
> new file mode 100644
> index 0000000..b28670f
> --- /dev/null
> +++ b/drivers/vhost/vringh.c
> @@ -0,0 +1,818 @@
> +/*
> + * Helpers for the host side of a virtio ring.
> + *
> + * Since these may be in userspace, we use (inline) accessors.
> + */
> +#include <linux/vringh.h>
> +#include <linux/virtio_ring.h>
> +#include <linux/kernel.h>
> +#include <linux/ratelimit.h>
> +#include <linux/uaccess.h>
> +#include <linux/slab.h>
> +
> +static __printf(1,2) __cold void vringh_bad(const char *fmt, ...)
> +{
> + static DEFINE_RATELIMIT_STATE(vringh_rs,
> + DEFAULT_RATELIMIT_INTERVAL,
> + DEFAULT_RATELIMIT_BURST);
> + if (__ratelimit(&vringh_rs)) {
> + va_list ap;
> + va_start(ap, fmt);
> + printk(KERN_NOTICE "vringh:");
> + vprintk(fmt, ap);
> + va_end(ap);
> + }
> +}
> +
> +/* Returns vring->num if empty, -ve on error. */
> +static inline int __vringh_get_head(const struct vringh *vrh,
> + int (*getu16)(u16 *val, const u16 *p),
> + u16 *last_avail_idx)
> +{
> + u16 avail_idx, i, head;
> + int err;
> +
> + err = getu16(&avail_idx, &vrh->vring.avail->idx);
> + if (err) {
> + vringh_bad("Failed to access avail idx at %p",
> + &vrh->vring.avail->idx);
> + return err;
> + }
> +
> + if (*last_avail_idx == avail_idx)
> + return vrh->vring.num;
> +
> + /* Only get avail ring entries after they have been exposed by guest. */
> + virtio_rmb(vrh->weak_barriers);
> +
> + i = *last_avail_idx & (vrh->vring.num - 1);
> +
> + err = getu16(&head, &vrh->vring.avail->ring[i]);
> + if (err) {
> + vringh_bad("Failed to read head: idx %d address %p",
> + *last_avail_idx, &vrh->vring.avail->ring[i]);
> + return err;
> + }
> +
> + if (head >= vrh->vring.num) {
> + vringh_bad("Guest says index %u > %u is available",
> + head, vrh->vring.num);
> + return -EINVAL;
> + }
> +
> + (*last_avail_idx)++;
> + return head;
> +}
> +
> +/* Copy some bytes to/from the iovec. Returns num copied. */
> +static inline ssize_t vringh_iov_xfer(struct vringh_iov *iov,
> + void *ptr, size_t len,
> + int (*xfer)(void __user *addr, void *ptr,
> + size_t len))
> +{
> + int err, done = 0;
> +
> + while (len && iov->i < iov->max) {
> + size_t partlen;
> +
> + partlen = min(iov->iov[iov->i].iov_len, len);
> + err = xfer(iov->iov[iov->i].iov_base, ptr, partlen);
> + if (err)
> + return err;
> + done += partlen;
> + len -= partlen;
> + ptr += partlen;
> + iov->iov[iov->i].iov_base += partlen;
> + iov->iov[iov->i].iov_len -= partlen;
> +
> + if (iov->iov[iov->i].iov_len == 0)
> + iov->i++;
> + }
> + return done;
> +}
> +
> +static inline bool check_range(u64 addr, u32 len,
> + struct vringh_range *range,
> + bool (*getrange)(u64, struct vringh_range *))
> +{
> + if (addr < range->start || addr > range->end_incl) {
> + if (!getrange(addr, range))
> + goto bad;
> + }
> + BUG_ON(addr < range->start || addr > range->end_incl);
> +
> + /* To end of memory? */
> + if (unlikely(addr + len == 0)) {
> + if (range->end_incl == -1ULL)
> + return true;
> + goto bad;
> + }
> +
> + /* Otherwise, don't wrap. */
> + if (unlikely(addr + len < addr))
> + goto bad;
> + if (unlikely(addr + len - 1 > range->end_incl))
> + goto bad;
> + return true;
> +
> +bad:
> + vringh_bad("Malformed descriptor address %u@0x%llx", len, addr);
> + return false;
> +}
> +
> +/* No reason for this code to be inline. */
> +static int move_to_indirect(int *up_next, u16 *i, void *addr,
> + const struct vring_desc *desc,
> + struct vring_desc **descs, int *desc_max)
> +{
> + /* Indirect tables can't have indirect. */
> + if (*up_next != -1) {
> + vringh_bad("Multilevel indirect %u->%u", *up_next, *i);
> + return -EINVAL;
> + }
> +
> + if (unlikely(desc->len % sizeof(struct vring_desc))) {
> + vringh_bad("Strange indirect len %u", desc->len);
> + return -EINVAL;
> + }
> +
> + /* We will check this when we follow it! */
> + if (desc->flags & VRING_DESC_F_NEXT)
> + *up_next = desc->next;
> + else
> + *up_next = -2;
> + *descs = addr;
> + *desc_max = desc->len / sizeof(struct vring_desc);
> +
> + /* Now, start at the first indirect. */
> + *i = 0;
> + return 0;
> +}
> +
> +static int resize_iovec(struct vringh_iov *iov, gfp_t gfp)
> +{
> + struct iovec *new;
> + unsigned int new_num = iov->max * 2;
> +
> + if (new_num < 8)
> + new_num = 8;
> +
> + if (iov->allocated)
> + new = krealloc(iov->iov, new_num * sizeof(struct iovec), gfp);
> + else {
> + new = kmalloc(new_num * sizeof(struct iovec), gfp);
> + if (new) {
> + memcpy(new, iov->iov, iov->i * sizeof(struct iovec));
> + iov->allocated = true;
> + }
> + }
> + if (!new)
> + return -ENOMEM;
> + iov->iov = new;
> + iov->max = new_num;
> + return 0;
> +}
> +
> +static u16 __cold return_from_indirect(const struct vringh *vrh, int *up_next,
> + struct vring_desc **descs, int *desc_max)
> +{
> + u16 i = *up_next;
> +
> + *up_next = -1;
> + *descs = vrh->vring.desc;
> + *desc_max = vrh->vring.num;
> + return i;
> +}
> +
> +static inline int
> +__vringh_iov(struct vringh *vrh, u16 i,
> + struct vringh_iov *riov,
> + struct vringh_iov *wiov,
> + bool (*getrange)(u64 addr, struct vringh_range *r),
> + gfp_t gfp,
> + int (*getdesc)(struct vring_desc *dst, const struct vring_desc *s))
> +{
> + int err, count = 0, up_next, desc_max;
> + struct vring_desc desc, *descs;
> + struct vringh_range range = { -1ULL, 0 };
> +
> + /* We start traversing vring's descriptor table. */
> + descs = vrh->vring.desc;
> + desc_max = vrh->vring.num;
> + up_next = -1;
> +
> + riov->i = wiov->i = 0;
> + for (;;) {
> + void *addr;
> + struct vringh_iov *iov;
> +
> + err = getdesc(&desc, &descs[i]);
> + if (unlikely(err))
> + goto fail;
> +
> + /* Make sure it's OK, and get offset. */
> + if (!check_range(desc.addr, desc.len, &range, getrange)) {
> + err = -EINVAL;
> + goto fail;
> + }
> + addr = (void *)(long)desc.addr + range.offset;
> +
> + if (unlikely(desc.flags & VRING_DESC_F_INDIRECT)) {
> + err = move_to_indirect(&up_next, &i, addr, &desc,
> + &descs, &desc_max);
> + if (err)
> + goto fail;
> + continue;
> + }
> +
> + if (count++ == vrh->vring.num) {
> + vringh_bad("Descriptor loop in %p", descs);
> + err = -ELOOP;
> + goto fail;
> + }
> +
> + if (desc.flags & VRING_DESC_F_WRITE)
> + iov = wiov;
> + else {
> + iov = riov;
> + if (unlikely(wiov->i)) {
> + vringh_bad("Readable desc %p after writable",
> + &descs[i]);
> + err = -EINVAL;
> + goto fail;
> + }
> + }
> +
> + if (unlikely(iov->i == iov->max)) {
> + err = resize_iovec(iov, gfp);
> + if (err)
> + goto fail;
> + }
> +
> + iov->iov[iov->i].iov_base = (__force __user void *)addr;
> + iov->iov[iov->i].iov_len = desc.len;
> + iov->i++;
> +
> + if (desc.flags & VRING_DESC_F_NEXT) {
> + i = desc.next;
> + } else {
> + /* Just in case we need to finish traversing above. */
> + if (unlikely(up_next > 0))
> + i = return_from_indirect(vrh, &up_next,
> + &descs, &desc_max);
> + else
> + break;
> + }
> +
> + if (i >= desc_max) {
> + vringh_bad("Chained index %u > %u", i, desc_max);
> + err = -EINVAL;
> + goto fail;
> + }
> + }
> +
> + /* Reset for fresh iteration. */
> + riov->max = riov->i;
> + wiov->max = wiov->i;
> + riov->i = wiov->i = 0;
> + return 0;
> +
> +fail:
> + if (riov->allocated)
> + kfree(riov->iov);
> + if (wiov->allocated)
> + kfree(wiov->iov);
> + return err;
> +}
> +
> +static inline int __vringh_complete(struct vringh *vrh, u16 idx, u32 len,
> + int (*getu16)(u16 *val, const u16 *p),
> + int (*putu16)(u16 *p, u16 val),
> + int (*putused)(struct vring_used_elem *dst,
> + const struct vring_used_elem
> + *s),
> + bool *notify)
> +{
> + struct vring_used_elem used;
> + struct vring_used *used_ring;
> + int err;
> + u16 used_idx, old, used_event;
> +
> + used.id = idx;
> + used.len = len;
> +
> + err = getu16(&used_idx, &vring_used_event(&vrh->vring));
> + if (err) {
> + vringh_bad("Failed to access used event %p",
> + &vring_used_event(&vrh->vring));
> + return err;
> + }
> +
> + used_ring = vrh->vring.used;
> + used_idx = vrh->last_used_idx;
> +
> + err = putused(&used_ring->ring[used_idx % vrh->vring.num],
> + &used);
> + if (err) {
> + vringh_bad("Failed to write used entry %u at %p",
> + used_idx % vrh->vring.num,
> + &used_ring->ring[used_idx % vrh->vring.num]);
> + return err;
> + }
> +
> + /* Make sure buffer is written before we update index. */
> + virtio_wmb(vrh->weak_barriers);
> +
> + old = vrh->last_used_idx;
> + vrh->last_used_idx++;
> +
> + err = putu16(&vrh->vring.used->idx, vrh->last_used_idx);
> + if (err) {
> + vringh_bad("Failed to update used index at %p",
> + &vrh->vring.used->idx);
> + return err;
> + }
> +
> + /* If we already know we need to notify, skip re-checking */
> + if (*notify)
> + return 0;
> +
> + /* Flush out used index update. This is paired with the
> + * barrier that the Guest executes when enabling
> + * interrupts. */
> + virtio_mb(vrh->weak_barriers);
> +
> + /* Old-style, without event indices. */
> + if (!vrh->event_indices) {
> + u16 flags;
> + err = getu16(&flags, &vrh->vring.avail->flags);
> + if (err) {
> + vringh_bad("Failed to get flags at %p",
> + &vrh->vring.avail->flags);
> + return err;
> + }
> + if (!(flags & VRING_AVAIL_F_NO_INTERRUPT))
> + *notify = true;
> + return 0;
> + }
> +
> + /* Modern: we know where other side is up to. */
> + err = getu16(&used_event, &vring_used_event(&vrh->vring));
> + if (err) {
> + vringh_bad("Failed to get used event idx at %p",
> + &vring_used_event(&vrh->vring));
> + return err;
> + }
> + if (vring_need_event(used_event, vrh->last_used_idx, old))
> + *notify = true;
> + return 0;
> +}
> +
> +static inline bool __vringh_notify_enable(struct vringh *vrh,
> + int (*getu16)(u16 *val, const u16 *p),
> + int (*putu16)(u16 *p, u16 val))
> +{
> + u16 avail;
> +
> + /* Already enabled? */
> + if (vrh->listening)
> + return false;
> +
> + vrh->listening = true;
> +
> + if (!vrh->event_indices) {
> + /* Old-school; update flags. */
> + if (putu16(&vrh->vring.used->flags, 0) != 0) {
> + vringh_bad("Clearing used flags %p",
> + &vrh->vring.used->flags);
> + return false;
> + }
> + } else {
> + if (putu16(&vring_avail_event(&vrh->vring),
> + vrh->last_avail_idx) != 0) {
> + vringh_bad("Updating avail event index %p",
> + &vring_avail_event(&vrh->vring));
> + return false;
> + }
> + }
> +
> + /* They could have slipped one in as we were doing that: make
> + * sure it's written, then check again. */
> + virtio_mb(vrh->weak_barriers);
> +
> + if (getu16(&avail, &vrh->vring.avail->idx) != 0) {
> + vringh_bad("Failed to check avail idx at %p",
> + &vrh->vring.avail->idx);
> + return false;
> + }
> +
> + /* This is so unlikely, we just leave notifications enabled. */
> + return avail != vrh->last_avail_idx;
> +}
> +
> +static inline void __vringh_notify_disable(struct vringh *vrh,
> + int (*putu16)(u16 *p, u16 val))
> +{
> + /* Already disabled? */
> + if (!vrh->listening)
> + return;
> +
> + vrh->listening = false;
> + if (!vrh->event_indices) {
> + /* Old-school; update flags. */
> + if (putu16(&vrh->vring.used->flags, VRING_USED_F_NO_NOTIFY)) {
> + vringh_bad("Setting used flags %p",
> + &vrh->vring.used->flags);
> + }
> + }
> +}
> +
> +/* Userspace access helpers. */
> +static inline int getu16_user(u16 *val, const u16 *p)
> +{
> + return get_user(*val, (__force u16 __user *)p);
> +}
> +
> +static inline int putu16_user(u16 *p, u16 val)
> +{
> + return put_user(val, (__force u16 __user *)p);
> +}
> +
> +static inline int getdesc_user(struct vring_desc *dst,
> + const struct vring_desc *src)
> +{
> + return copy_from_user(dst, (__force void *)src, sizeof(*dst)) == 0 ? 0 :
> + -EFAULT;
> +}
> +
> +static inline int putused_user(struct vring_used_elem *dst,
> + const struct vring_used_elem *s)
> +{
> + return copy_to_user((__force void __user *)dst, s, sizeof(*dst)) == 0
> + ? 0 : -EFAULT;
> +}
> +
> +static inline int xfer_from_user(void *src, void *dst, size_t len)
> +{
> + return copy_from_user(dst, (__force void *)src, len) == 0 ? 0 :
> + -EFAULT;
> +}
> +
> +static inline int xfer_to_user(void *dst, void *src, size_t len)
> +{
> + return copy_to_user((__force void *)dst, src, len) == 0 ? 0 :
> + -EFAULT;
> +}
> +
> +/**
> + * vringh_init_user - initialize a vringh for a userspace vring.
> + * @vrh: the vringh to initialize.
> + * @features: the feature bits for this ring.
> + * @num: the number of elements.
> + * @weak_barriers: true if we only need memory barriers, not I/O.
> + * @desc: the userpace descriptor pointer.
> + * @avail: the userpace avail pointer.
> + * @used: the userpace used pointer.
> + *
> + * Returns an error if num is invalid: you should check pointers
> + * yourself!
> + */
> +int vringh_init_user(struct vringh *vrh, u32 features,
> + unsigned int num, bool weak_barriers,
> + struct vring_desc __user *desc,
> + struct vring_avail __user *avail,
> + struct vring_used __user *used)
> +{
> + /* Sane power of 2 please! */
> + if (!num || num > 0xffff || (num & (num - 1))) {
> + vringh_bad("Bad ring size %zu", num);
> + return -EINVAL;
> + }
> +
> + vrh->event_indices = (features & (1 << VIRTIO_RING_F_EVENT_IDX));
> + vrh->weak_barriers = weak_barriers;
> + vrh->listening = false;
> + vrh->last_avail_idx = 0;
> + vrh->last_used_idx = 0;
> + vrh->vring.num = num;
> + vrh->vring.desc = (__force struct vring_desc *)desc;
> + vrh->vring.avail = (__force struct vring_avail *)avail;
> + vrh->vring.used = (__force struct vring_used *)used;
> + return 0;
> +}
> +
> +/**
> + * vringh_getdesc_user - get next available descriptor from userspace ring.
> + * @vrh: the userspace vring.
> + * @riov: where to put the readable descriptors.
> + * @wiov: where to put the writable descriptors.
> + * @getrange: function to call to check ranges.
> + * @head: head index we received, for passing to vringh_complete_user().
> + * @gfp: flags for allocating larger riov/wiov.
> + *
> + * Returns 0 if there was no descriptor, 1 if there was, or -errno.
> + *
> + * If it returns 1, riov->allocated and wiov->allocated indicate if you
> + * have to kfree riov->iov and wiov->iov respectively.
> + */
> +int vringh_getdesc_user(struct vringh *vrh,
> + struct vringh_iov *riov,
> + struct vringh_iov *wiov,
> + bool (*getrange)(u64 addr, struct vringh_range *r),
> + u16 *head,
> + gfp_t gfp)
> +{
> + int err;
> +
> + err = __vringh_get_head(vrh, getu16_user, &vrh->last_avail_idx);
> + if (err < 0)
> + return err;
> +
> + /* Empty... */
> + if (err == vrh->vring.num)
> + return 0;
> +
> + *head = err;
> + err = __vringh_iov(vrh, *head, riov, wiov, getrange, gfp, getdesc_user);
> + if (err)
> + return err;
> +
> + return 1;
> +}
> +
> +/**
> + * vringh_iov_pull_user - copy bytes from vring_iov.
> + * @riov: the riov as passed to vringh_getdesc_user() (updated as we consume)
> + * @dst: the place to copy.
> + * @len: the maximum length to copy.
> + *
> + * Returns the bytes copied <= len or a negative errno.
> + */
> +ssize_t vringh_iov_pull_user(struct vringh_iov *riov, void *dst, size_t len)
> +{
> + return vringh_iov_xfer(riov, dst, len, xfer_from_user);
> +}
> +
> +/**
> + * vringh_iov_push_user - copy bytes into vring_iov.
> + * @wiov: the wiov as passed to vringh_getdesc_user() (updated as we consume)
> + * @dst: the place to copy.
> + * @len: the maximum length to copy.
> + *
> + * Returns the bytes copied <= len or a negative errno.
> + */
> +ssize_t vringh_iov_push_user(struct vringh_iov *wiov,
> + const void *src, size_t len)
> +{
> + return vringh_iov_xfer(wiov, (void *)src, len, xfer_to_user);
> +}
> +
> +/**
> + * vringh_abandon_user - we've decided not to handle the descriptor(s).
> + * @vrh: the vring.
> + * @num: the number of descriptors to put back (ie. num
> + * vringh_get_user() to undo).
> + *
> + * The next vringh_get_user() will return the old descriptor(s) again.
> + */
> +void vringh_abandon_user(struct vringh *vrh, unsigned int num)
> +{
> + /* We only update vring_avail_event(vr) when we want to be notified,
> + * so we haven't changed that yet. */
> + vrh->last_avail_idx -= num;
> +}
> +
> +/**
> + * vringh_complete_user - we've finished with descriptor, publish it.
> + * @vrh: the vring.
> + * @head: the head as filled in by vringh_getdesc_user.
> + * @len: the length of data we have written.
> + * @notify: set if we should notify the other side, otherwise left alone.
> + */
> +int vringh_complete_user(struct vringh *vrh, u16 head, u32 len,
> + bool *notify)
> +{
> + return __vringh_complete(vrh, head, len,
> + getu16_user, putu16_user, putused_user,
> + notify);
> +}
> +
> +/**
> + * vringh_notify_enable_user - we want to know if something changes.
> + * @vrh: the vring.
> + *
> + * This always enables notifications, but returns true if there are
> + * now more buffers available in the vring.
> + */
> +bool vringh_notify_enable_user(struct vringh *vrh)
> +{
> + return __vringh_notify_enable(vrh, getu16_user, putu16_user);
> +}
> +
> +/**
> + * vringh_notify_disable_user - don't tell us if something changes.
> + * @vrh: the vring.
> + *
> + * This is our normal running state: we disable and then only enable when
> + * we're going to sleep.
> + */
> +void vringh_notify_disable_user(struct vringh *vrh)
> +{
> + __vringh_notify_disable(vrh, putu16_user);
> +}
> +
> +/* Kernelspace access helpers. */
> +static inline int getu16_kern(u16 *val, const u16 *p)
> +{
> + *val = *p;
> + return 0;
> +}
> +
> +static inline int putu16_kern(u16 *p, u16 val)
> +{
> + *p = val;
> + return 0;
> +}
> +
> +static inline int getdesc_kern(struct vring_desc *dst,
> + const struct vring_desc *src)
> +{
> + *dst = *src;
> + return 0;
> +}
> +
> +static inline int putused_kern(struct vring_used_elem *dst,
> + const struct vring_used_elem *s)
> +{
> + *dst = *s;
> + return 0;
> +}
> +
> +static inline int xfer_kern(void *src, void *dst, size_t len)
> +{
> + memcpy(dst, src, len);
> + return 0;
> +}
> +
> +static inline bool noop_getrange(u64 addr, struct vringh_range *r)
> +{
> + r->start = 0;
> + r->end_incl = -1ULL;
> + r->offset = 0;
> + return true;
> +}
> +
> +/**
> + * vringh_init_kern - initialize a vringh for a kernelspace vring.
> + * @vrh: the vringh to initialize.
> + * @features: the feature bits for this ring.
> + * @num: the number of elements.
> + * @weak_barriers: true if we only need memory barriers, not I/O.
> + * @desc: the userpace descriptor pointer.
> + * @avail: the userpace avail pointer.
> + * @used: the userpace used pointer.
> + *
> + * Returns an error if num is invalid.
> + */
> +int vringh_init_kern(struct vringh *vrh, u32 features,
> + unsigned int num, bool weak_barriers,
> + struct vring_desc *desc,
> + struct vring_avail *avail,
> + struct vring_used *used)
> +{
> + /* Sane power of 2 please! */
> + if (!num || num > 0xffff || (num & (num - 1))) {
> + vringh_bad("Bad ring size %zu", num);
> + return -EINVAL;
> + }
> +
> + vrh->event_indices = (features & (1 << VIRTIO_RING_F_EVENT_IDX));
> + vrh->weak_barriers = weak_barriers;
> + vrh->listening = false;
> + vrh->last_avail_idx = 0;
> + vrh->last_used_idx = 0;
> + vrh->vring.num = num;
> + vrh->vring.desc = desc;
> + vrh->vring.avail = avail;
> + vrh->vring.used = used;
> + return 0;
> +}
> +
> +/**
> + * vringh_getdesc_kern - get next available descriptor from kernelspace ring.
> + * @vrh: the kernelspace vring.
> + * @riov: where to put the readable descriptors.
> + * @wiov: where to put the writable descriptors.
> + * @head: head index we received, for passing to vringh_complete_kern().
> + * @gfp: flags for allocating larger riov/wiov.
> + *
> + * Returns 0 if there was no descriptor, 1 if there was, or -errno.
> + *
> + * If it returns 1, riov->allocated and wiov->allocated indicate if you
> + * have to kfree riov->iov and wiov->iov respectively.
> + */
> +int vringh_getdesc_kern(struct vringh *vrh,
> + struct vringh_iov *riov,
> + struct vringh_iov *wiov,
> + u16 *head,
> + gfp_t gfp)
> +{
> + int err;
> +
> + err = __vringh_get_head(vrh, getu16_kern, &vrh->last_avail_idx);
> + if (err < 0)
> + return err;
> +
> + /* Empty... */
> + if (err == vrh->vring.num)
> + return 0;
> +
> + *head = err;
> + err = __vringh_iov(vrh, *head, riov, wiov, noop_getrange,
> + gfp, getdesc_kern);
> + if (err)
> + return err;
> +
> + return 1;
> +}
> +
> +/**
> + * vringh_iov_pull_kern - copy bytes from vring_iov.
> + * @riov: the riov as passed to vringh_getdesc_kern() (updated as we consume)
> + * @dst: the place to copy.
> + * @len: the maximum length to copy.
> + *
> + * Returns the bytes copied <= len or a negative errno.
> + */
> +ssize_t vringh_iov_pull_kern(struct vringh_iov *riov, void *dst, size_t len)
> +{
> + return vringh_iov_xfer(riov, dst, len, xfer_kern);
> +}
> +
> +/**
> + * vringh_iov_push_kern - copy bytes into vring_iov.
> + * @wiov: the wiov as passed to vringh_getdesc_kern() (updated as we consume)
> + * @dst: the place to copy.
> + * @len: the maximum length to copy.
> + *
> + * Returns the bytes copied <= len or a negative errno.
> + */
> +ssize_t vringh_iov_push_kern(struct vringh_iov *wiov,
> + const void *src, size_t len)
> +{
> + return vringh_iov_xfer(wiov, (void *)src, len, xfer_kern);
> +}
> +
> +/**
> + * vringh_abandon_kern - we've decided not to handle the descriptor(s).
> + * @vrh: the vring.
> + * @num: the number of descriptors to put back (ie. num
> + * vringh_get_kern() to undo).
> + *
> + * The next vringh_get_kern() will return the old descriptor(s) again.
> + */
> +void vringh_abandon_kern(struct vringh *vrh, unsigned int num)
> +{
> + /* We only update vring_avail_event(vr) when we want to be notified,
> + * so we haven't changed that yet. */
> + vrh->last_avail_idx -= num;
> +}
> +
> +/**
> + * vringh_complete_kern - we've finished with descriptor, publish it.
> + * @vrh: the vring.
> + * @head: the head as filled in by vringh_getdesc_kern.
> + * @len: the length of data we have written.
> + * @notify: set if we should notify the other side, otherwise left alone.
> + */
> +int vringh_complete_kern(struct vringh *vrh, u16 head, u32 len,
> + bool *notify)
> +{
> + return __vringh_complete(vrh, head, len,
> + getu16_kern, putu16_kern, putused_kern,
> + notify);
> +}
> +
> +/**
> + * vringh_notify_enable_kern - we want to know if something changes.
> + * @vrh: the vring.
> + *
> + * This always enables notifications, but returns true if there are
> + * now more buffers available in the vring.
> + */
> +bool vringh_notify_enable_kern(struct vringh *vrh)
> +{
> + return __vringh_notify_enable(vrh, getu16_kern, putu16_kern);
> +}
> +
> +/**
> + * vringh_notify_disable_kern - don't tell us if something changes.
> + * @vrh: the vring.
> + *
> + * This is our normal running state: we disable and then only enable when
> + * we're going to sleep.
> + */
> +void vringh_notify_disable_kern(struct vringh *vrh)
> +{
> + __vringh_notify_disable(vrh, putu16_kern);
> +}
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index ffd7e7d..245177c 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -24,27 +24,6 @@
> #include <linux/module.h>
> #include <linux/hrtimer.h>
>
> -/* virtio guest is communicating with a virtual "device" that actually runs on
> - * a host processor. Memory barriers are used to control SMP effects. */
> -#ifdef CONFIG_SMP
> -/* Where possible, use SMP barriers which are more lightweight than mandatory
> - * barriers, because mandatory barriers control MMIO effects on accesses
> - * through relaxed memory I/O windows (which virtio-pci does not use). */
> -#define virtio_mb(vq) \
> - do { if ((vq)->weak_barriers) smp_mb(); else mb(); } while(0)
> -#define virtio_rmb(vq) \
> - do { if ((vq)->weak_barriers) smp_rmb(); else rmb(); } while(0)
> -#define virtio_wmb(vq) \
> - do { if ((vq)->weak_barriers) smp_wmb(); else wmb(); } while(0)
> -#else
> -/* We must force memory ordering even if guest is UP since host could be
> - * running on another CPU, but SMP barriers are defined to barrier() in that
> - * configuration. So fall back to mandatory barriers instead. */
> -#define virtio_mb(vq) mb()
> -#define virtio_rmb(vq) rmb()
> -#define virtio_wmb(vq) wmb()
> -#endif
> -
> #ifdef DEBUG
> /* For development, we want to crash whenever the ring is screwed. */
> #define BAD_RING(_vq, fmt, args...) \
> @@ -276,7 +255,7 @@ add_head:
>
> /* Descriptors and available array need to be set before we expose the
> * new available array entries. */
> - virtio_wmb(vq);
> + virtio_wmb(vq->weak_barriers);
> vq->vring.avail->idx++;
> vq->num_added++;
>
> @@ -312,7 +291,7 @@ bool virtqueue_kick_prepare(struct virtqueue *_vq)
> START_USE(vq);
> /* We need to expose available array entries before checking avail
> * event. */
> - virtio_mb(vq);
> + virtio_mb(vq->weak_barriers);
>
> old = vq->vring.avail->idx - vq->num_added;
> new = vq->vring.avail->idx;
> @@ -436,7 +415,7 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
> }
>
> /* Only get used array entries after they have been exposed by host. */
> - virtio_rmb(vq);
> + virtio_rmb(vq->weak_barriers);
>
> last_used = (vq->last_used_idx & (vq->vring.num - 1));
> i = vq->vring.used->ring[last_used].id;
> @@ -460,7 +439,7 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
> * the read in the next get_buf call. */
> if (!(vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT)) {
> vring_used_event(&vq->vring) = vq->last_used_idx;
> - virtio_mb(vq);
> + virtio_mb(vq->weak_barriers);
> }
>
> #ifdef DEBUG
> @@ -513,7 +492,7 @@ bool virtqueue_enable_cb(struct virtqueue *_vq)
> * entry. Always do both to keep code simple. */
> vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
> vring_used_event(&vq->vring) = vq->last_used_idx;
> - virtio_mb(vq);
> + virtio_mb(vq->weak_barriers);
> if (unlikely(more_used(vq))) {
> END_USE(vq);
> return false;
> @@ -553,7 +532,7 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
> /* TODO: tune this threshold */
> bufs = (u16)(vq->vring.avail->idx - vq->last_used_idx) * 3 / 4;
> vring_used_event(&vq->vring) = vq->last_used_idx + bufs;
> - virtio_mb(vq);
> + virtio_mb(vq->weak_barriers);
> if (unlikely((u16)(vq->vring.used->idx - vq->last_used_idx) > bufs)) {
> END_USE(vq);
> return false;
> diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
> index 63c6ea1..ca3ad41 100644
> --- a/include/linux/virtio_ring.h
> +++ b/include/linux/virtio_ring.h
> @@ -4,6 +4,63 @@
> #include <linux/irqreturn.h>
> #include <uapi/linux/virtio_ring.h>
>
> +/*
> + * Barriers in virtio are tricky. Non-SMP virtio guests can't assume
> + * they're not on an SMP host system, so they need to assume real
> + * barriers. Non-SMP virtio hosts could skip the barriers, but does
> + * anyone care?
> + *
> + * For virtio_pci on SMP, we don't need to order with respect to MMIO
> + * accesses through relaxed memory I/O windows, so smp_mb() et al are
> + * sufficient.
> + *
> + * For using virtio to talk to real devices (eg. other heterogeneous
> + * CPUs) we do need real barriers. In theory, we could be using both
> + * kinds of virtio, so it's a runtime decision, and the branch is
> + * actually quite cheap.
> + */
> +
> +#ifdef CONFIG_SMP
> +static inline void virtio_mb(bool weak_barriers)
> +{
> + if (weak_barriers)
> + smp_mb();
> + else
> + mb();
> +}
> +
> +static inline void virtio_rmb(bool weak_barriers)
> +{
> + if (weak_barriers)
> + smp_rmb();
> + else
> + rmb();
> +}
> +
> +static inline void virtio_wmb(bool weak_barriers)
> +{
> + if (weak_barriers)
> + smp_wmb();
> + else
> + wmb();
> +}
> +#else
> +static inline void virtio_mb(bool weak_barriers)
> +{
> + mb();
> +}
> +
> +static inline void virtio_rmb(bool weak_barriers)
> +{
> + rmb();
> +}
> +
> +static inline void virtio_wmb(bool weak_barriers)
> +{
> + wmb();
> +}
> +#endif
> +
> struct virtio_device;
> struct virtqueue;
>
> diff --git a/include/linux/vringh.h b/include/linux/vringh.h
> new file mode 100644
> index 0000000..508b5e5
> --- /dev/null
> +++ b/include/linux/vringh.h
> @@ -0,0 +1,115 @@
> +/*
> + * Linux host-side vring helpers; for when the kernel needs to access
> + * someone else's vring.
> + *
> + * Copyright IBM Corporation, 2013.
> + * Parts taken from drivers/vhost/vhost.c Copyright 2009 Red Hat, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
> + *
> + * Written by: Rusty Russell <rusty@rustcorp.com.au>
> + */
> +#ifndef _LINUX_VRINGH_H
> +#define _LINUX_VRINGH_H
> +#include <uapi/linux/virtio_ring.h>
> +#include <uapi/linux/uio.h>
> +#include <asm/barrier.h>
> +
> +/* virtio_ring with information needed for host access. */
> +struct vringh {
> + /* Guest publishes used event idx (note: we always do). */
> + bool event_indices;
> +
> + /* Have we told the other end we want to be notified? */
> + bool listening;
> +
> + /* Can we get away with weak barriers? */
> + bool weak_barriers;
> +
> + /* Last available index we saw (ie. where we're up to). */
> + u16 last_avail_idx;
> +
> + /* Last index we used. */
> + u16 last_used_idx;
> +
> + /* The vring (note: it may contain user pointers!) */
> + struct vring vring;
> +};
> +
> +/* The memory the vring can access, and what offset to apply. */
> +struct vringh_range {
> + u64 start, end_incl;
> + u64 offset;
> +};
> +
> +/* All the information about an iovec. */
> +struct vringh_iov {
> + struct iovec *iov;
> + unsigned i, max;
> + bool allocated;
> +};
> +
> +/* Helpers for userspace vrings. */
> +int vringh_init_user(struct vringh *vrh, u32 features,
> + unsigned int num, bool weak_barriers,
> + struct vring_desc __user *desc,
> + struct vring_avail __user *avail,
> + struct vring_used __user *used);
> +
> +/* Convert a descriptor into iovecs. */
> +int vringh_getdesc_user(struct vringh *vrh,
> + struct vringh_iov *riov,
> + struct vringh_iov *wiov,
> + bool (*getrange)(u64 addr, struct vringh_range *r),
> + u16 *head,
> + gfp_t gfp);
> +
> +/* Copy bytes from readable vsg, consuming it (and incrementing wiov->i). */
> +ssize_t vringh_iov_pull_user(struct vringh_iov *riov, void *dst, size_t len);
> +
> +/* Copy bytes into writable vsg, consuming it (and incrementing wiov->i). */
> +ssize_t vringh_iov_push_user(struct vringh_iov *wiov,
> + const void *src, size_t len);
> +
> +/* Mark a descriptor as used. Sets notify if you should fire eventfd. */
> +int vringh_complete_user(struct vringh *vrh, u16 head, u32 len,
> + bool *notify);
> +
> +/* Pretend we've never seen descriptor (for easy error handling). */
> +void vringh_abandon_user(struct vringh *vrh, unsigned int num);
> +
> +/* Helpers for kernelspace vrings. */
> +int vringh_init_kern(struct vringh *vrh, u32 features,
> + unsigned int num, bool weak_barriers,
> + struct vring_desc *desc,
> + struct vring_avail *avail,
> + struct vring_used *used);
> +
> +int vringh_getdesc_kern(struct vringh *vrh,
> + struct vringh_iov *riov,
> + struct vringh_iov *wiov,
> + u16 *head,
> + gfp_t gfp);
> +
> +ssize_t vringh_iov_pull_kern(struct vringh_iov *riov, void *dst, size_t len);
> +ssize_t vringh_iov_push_user(struct vringh_iov *wiov,
> + const void *src, size_t len);
> +void vringh_abandon_kern(struct vringh *vrh, unsigned int num);
> +int vringh_complete_kern(struct vringh *vrh, u16 head, u32 len, bool *notify);
> +
> +bool vringh_notify_enable_kern(struct vringh *vrh);
> +void vringh_notify_disable_kern(struct vringh *vrh);
> +
> +#endif /* _LINUX_VRINGH_H */
>
--
Asias
^ permalink raw reply [flat|nested] 30+ messages in thread* Re: [PATCH 1/6] virtio_host: host-side implementation of virtio rings.
2013-01-22 8:12 ` Asias He
@ 2013-01-23 1:56 ` Rusty Russell
0 siblings, 0 replies; 30+ messages in thread
From: Rusty Russell @ 2013-01-23 1:56 UTC (permalink / raw)
To: Asias He; +Cc: Michael S . Tsirkin, virtualization
Asias He <asias@redhat.com> writes:
> On 01/17/2013 06:29 PM, Rusty Russell wrote:
>> Getting use of virtio rings correct is tricky, and a recent patch saw
>> an implementation of in-kernel rings (as separate from userspace).
>>
>> This patch attempts to abstract the business of dealing with the
>> virtio ring layout from the access (userspace or direct); to do this,
>> we use function pointers, which gcc inlines correctly.
>>
>> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
>> ---
>> drivers/Makefile | 2 +-
>> drivers/vhost/Kconfig | 8 +
>> drivers/vhost/Kconfig.tcm | 1 +
>> drivers/vhost/Makefile | 2 +
>> drivers/vhost/vringh.c | 818 ++++++++++++++++++++++++++++++++++++++++++
>> drivers/virtio/virtio_ring.c | 33 +-
>> include/linux/virtio_ring.h | 57 +++
>
>
> Why vringh_notify_enable_user() and vringh_notify_disable_user() are not
> declared in include/linux/virtio_ring.h? Missed that?
Yes, I did... I've fixed it in my vringh branch, where I'm doing
development from now on.
Thanks!
Rusty.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/6] virtio_host: host-side implementation of virtio rings.
2013-01-17 10:29 [PATCH 1/6] virtio_host: host-side implementation of virtio rings Rusty Russell
` (6 preceding siblings ...)
2013-01-22 8:12 ` Asias He
@ 2013-02-04 20:29 ` Sjur Brændeland
2013-02-04 21:44 ` Rusty Russell
7 siblings, 1 reply; 30+ messages in thread
From: Sjur Brændeland @ 2013-02-04 20:29 UTC (permalink / raw)
To: Rusty Russell; +Cc: Michael S . Tsirkin, virtualization
[-- Attachment #1.1: Type: text/plain, Size: 901 bytes --]
Hi Rusty,
On Thu, Jan 17, 2013 at 11:29 AM, Rusty Russell <rusty@rustcorp.com.au>wrote:
> Getting use of virtio rings correct is tricky, and a recent patch saw
> an implementation of in-kernel rings (as separate from userspace).
>
> This patch attempts to abstract the business of dealing with the
> virtio ring layout from the access (userspace or direct); to do this,
> we use function pointers, which gcc inlines correctly.
I have been using your patches for a while in my test setup without any
issues with vringh. The only thing I'm missing is export of symbols.
My current caif_virtio driver expects vringh to be a module exporting
symbols. Is this something you are planning to add?
I guess my vringh related stuff should go into your tree together with your
vringh patches...
Would you be willing to take this via your tree, provided that I get acks
from the right people?
Regards,
Sjur
[-- Attachment #1.2: Type: text/html, Size: 1332 bytes --]
[-- Attachment #2: Type: text/plain, Size: 183 bytes --]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 30+ messages in thread* Re: [PATCH 1/6] virtio_host: host-side implementation of virtio rings.
2013-02-04 20:29 ` Sjur Brændeland
@ 2013-02-04 21:44 ` Rusty Russell
2013-02-12 18:58 ` Sjur Brændeland
0 siblings, 1 reply; 30+ messages in thread
From: Rusty Russell @ 2013-02-04 21:44 UTC (permalink / raw)
To: Sjur Brændeland; +Cc: Michael S . Tsirkin, virtualization
Sjur Brændeland <sjurbren@gmail.com> writes:
> Hi Rusty,
>
> On Thu, Jan 17, 2013 at 11:29 AM, Rusty Russell <rusty@rustcorp.com.au>wrote:
>
>> Getting use of virtio rings correct is tricky, and a recent patch saw
>> an implementation of in-kernel rings (as separate from userspace).
>>
>> This patch attempts to abstract the business of dealing with the
>> virtio ring layout from the access (userspace or direct); to do this,
>> we use function pointers, which gcc inlines correctly.
>
>
> I have been using your patches for a while in my test setup without any
> issues with vringh. The only thing I'm missing is export of symbols.
> My current caif_virtio driver expects vringh to be a module exporting
> symbols. Is this something you are planning to add?
Yes, sure. There may be some more minor changes, but nothing radical.
> I guess my vringh related stuff should go into your tree together with your
> vringh patches...
> Would you be willing to take this via your tree, provided that I get acks
> from the right people?
Yes please. Now I have set up a test rig for vhost (unfortunately not
with 10Ge) I can make progress on vhost adaption.
Thanks,
Rusty.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/6] virtio_host: host-side implementation of virtio rings.
2013-02-04 21:44 ` Rusty Russell
@ 2013-02-12 18:58 ` Sjur Brændeland
2013-02-13 10:25 ` Rusty Russell
0 siblings, 1 reply; 30+ messages in thread
From: Sjur Brændeland @ 2013-02-12 18:58 UTC (permalink / raw)
To: Rusty Russell
Cc: Linus Walleij, Michael S . Tsirkin, Erwan.Yvin, virtualization
Hi Rusty,
On Mon, Feb 4, 2013 at 10:44 PM, Rusty Russell <rusty@rustcorp.com.au> wrote:
> Sjur Brændeland <sjurbren@gmail.com> writes:
...
>> I guess my vringh related stuff should go into your tree together with your
>> vringh patches...
>> Would you be willing to take this via your tree, provided that I get acks
>> from the right people?
>
> Yes please. Now I have set up a test rig for vhost (unfortunately not
> with 10Ge) I can make progress on vhost adaption.
>
I have one question, are you planning to include vringh in your
virtio-next branch
for the 3.9 merge window or some time later?
Regards,
Sjur
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/6] virtio_host: host-side implementation of virtio rings.
2013-02-12 18:58 ` Sjur Brændeland
@ 2013-02-13 10:25 ` Rusty Russell
2013-02-14 14:54 ` Sjur Brændeland
0 siblings, 1 reply; 30+ messages in thread
From: Rusty Russell @ 2013-02-13 10:25 UTC (permalink / raw)
To: Sjur Brændeland
Cc: Linus Walleij, Michael S . Tsirkin, Erwan.Yvin, virtualization
Sjur Brændeland <sjurbren@gmail.com> writes:
> Hi Rusty,
>
> On Mon, Feb 4, 2013 at 10:44 PM, Rusty Russell <rusty@rustcorp.com.au> wrote:
>> Sjur Brændeland <sjurbren@gmail.com> writes:
> ...
>>> I guess my vringh related stuff should go into your tree together with your
>>> vringh patches...
>>> Would you be willing to take this via your tree, provided that I get acks
>>> from the right people?
>>
>> Yes please. Now I have set up a test rig for vhost (unfortunately not
>> with 10Ge) I can make progress on vhost adaption.
>>
>
> I have one question, are you planning to include vringh in your
> virtio-next branch
> for the 3.9 merge window or some time later?
If you're eager with the CAIF stuff, I'll merge it for this merge window
(we may still rework stuff, but I don't want to block you).
Otherwise, it'll probably wait.
Cheers,
Rusty.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/6] virtio_host: host-side implementation of virtio rings.
2013-02-13 10:25 ` Rusty Russell
@ 2013-02-14 14:54 ` Sjur Brændeland
0 siblings, 0 replies; 30+ messages in thread
From: Sjur Brændeland @ 2013-02-14 14:54 UTC (permalink / raw)
To: Rusty Russell
Cc: Michael S . Tsirkin, Linus Walleij, Erwan.Yvin, virtualization,
David S. Miller
Hi Rusty,
On Wed, Feb 13, 2013 at 11:25 AM, Rusty Russell <rusty@rustcorp.com.au> wrote:
> Sjur Brændeland <sjurbren@gmail.com> writes:
>> On Mon, Feb 4, 2013 at 10:44 PM, Rusty Russell <rusty@rustcorp.com.au> wrote:
>>> Sjur Brændeland <sjurbren@gmail.com> writes:
>> ...
>>>> I guess my vringh related stuff should go into your tree together with your
>>>> vringh patches...
>>>> Would you be willing to take this via your tree, provided that I get acks
>>>> from the right people?
>>>
>>> Yes please. Now I have set up a test rig for vhost (unfortunately not
>>> with 10Ge) I can make progress on vhost adaption.
>>>
>>
>> I have one question, are you planning to include vringh in your
>> virtio-next branch
>> for the 3.9 merge window or some time later?
>
> If you're eager with the CAIF stuff, I'll merge it for this merge window
> (we may still rework stuff, but I don't want to block you).
>
> Otherwise, it'll probably wait.
Yes, I would prefer to get CAIF-virtio merged now unless it creates a
a lot of trouble for you. But I'm not sure I'll manage to get ack from Ohad
in time though. He said he would be busy for the next week. I need
an ack from David Miller as well, I'll resend the patch to him if you
decide to go ahead and merge this.
Thanks,
Sjur
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 30+ messages in thread