* [PATCH vringh] virtio: Introduce vringh wrappers in virtio_config
@ 2013-03-05 13:51 sjur.brandeland
2013-03-06 4:42 ` Rusty Russell
2013-03-06 4:46 ` [FYI] vringh fixes Rusty Russell
0 siblings, 2 replies; 16+ messages in thread
From: sjur.brandeland @ 2013-03-05 13:51 UTC (permalink / raw)
To: Ohad Ben-Cohen, Rusty Russell
Cc: sjur, Linus Walleij, Sjur Brændeland, Erwan Yvin,
virtualization
From: Sjur Brændeland <sjur.brandeland@stericsson.com>
Add wrappers for the host vrings to support loose
coupling between the virtio device and driver.
The functions find_vrhs() and del_vrhs() are added to
struct virtio_config_ops to manage the host vrings.
The function vringh_notify() is added so the guest
can be kicked when buffers are added to the used-ring.
This enables the virtio drivers to manage the virtio rings
without knowledge of how the host vrings are managed.
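For illustration, a driver using these hooks might look roughly like the
sketch below; only find_vrhs(), del_vrhs(), vrh_callback_t and
vringh_notify() come from this patch, the driver-side names are made up:

/* Hypothetical driver sketch, not part of this patch. */
static void my_rx_done(struct virtio_device *vdev, struct vringh *vrh)
{
        /* called by the transport when this host vring needs servicing */
}

static int my_probe(struct virtio_device *vdev)
{
        struct vringh *vrhs[1];
        vrh_callback_t *cbs[1] = { my_rx_done };
        int err;

        err = vdev->config->find_vrhs(vdev, 1, vrhs, cbs);
        if (err)
                return err;

        /* ... later, after putting buffers in the used ring ... */
        vringh_notify(vrhs[0]);
        return 0;
}

del_vrhs() would then be called from the driver's remove path.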
Cc: Ohad Ben-Cohen <ohad@wizery.com>
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
---
include/linux/virtio_config.h | 13 +++++++++++++
include/linux/vringh.h | 13 +++++++++++++
2 files changed, 26 insertions(+), 0 deletions(-)
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 29b9104..88dd5ae 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -51,7 +51,17 @@
* This returns a pointer to the bus name a la pci_name from which
* the caller can then copy.
* @set_vq_affinity: set the affinity for a virtqueue.
+ * @find_vrhs: find the host vrings and instantiate them
+ * vdev: the virtio_device
+ * nhvrs: the number of host vrings to find
+ * vrhs: on success, includes the new host vrings
+ * callbacks: array of driver callbacks, for each host vring
+ * include a NULL entry for vqs that do not need a callback
+ * Returns 0 on success or error status
+ * @del_vrhs: free the host vrings found by find_vrhs().
*/
+struct vringh;
+typedef void vrh_callback_t(struct virtio_device *, struct vringh *);
typedef void vq_callback_t(struct virtqueue *);
struct virtio_config_ops {
void (*get)(struct virtio_device *vdev, unsigned offset,
@@ -70,6 +80,9 @@ struct virtio_config_ops {
void (*finalize_features)(struct virtio_device *vdev);
const char *(*bus_name)(struct virtio_device *vdev);
int (*set_vq_affinity)(struct virtqueue *vq, int cpu);
+ int (*find_vrhs)(struct virtio_device *vdev, unsigned nhvrs,
+ struct vringh *vrhs[], vrh_callback_t *callbacks[]);
+ void (*del_vrhs)(struct virtio_device *vdev);
};
/* If driver didn't advertise the feature, it will never appear. */
diff --git a/include/linux/vringh.h b/include/linux/vringh.h
index ab41185..8156f51 100644
--- a/include/linux/vringh.h
+++ b/include/linux/vringh.h
@@ -50,6 +50,12 @@ struct vringh {
/* The vring (note: it may contain user pointers!) */
struct vring vring;
+
+ /* The function to call when buffers are available */
+ void (*notify)(struct vringh *);
+
+ /* A pointer for the vringh clients to use. */
+ void *priv;
};
/* The memory the vring can access, and what offset to apply. */
@@ -182,4 +188,11 @@ void vringh_notify_disable_kern(struct vringh *vrh);
int vringh_need_notify_kern(struct vringh *vrh);
+/* Notify the guest about buffers added to the used ring */
+static inline void vringh_notify(struct vringh *vrh)
+{
+ if (vrh->notify)
+ vrh->notify(vrh);
+}
+
#endif /* _LINUX_VRINGH_H */
--
1.7.5.4
* Re: [PATCH vringh] virtio: Introduce vringh wrappers in virtio_config
2013-03-05 13:51 [PATCH vringh] virtio: Introduce vringh wrappers in virtio_config sjur.brandeland
@ 2013-03-06 4:42 ` Rusty Russell
2013-03-06 10:50 ` Sjur Brændeland
2013-03-06 4:46 ` [FYI] vringh fixes Rusty Russell
1 sibling, 1 reply; 16+ messages in thread
From: Rusty Russell @ 2013-03-06 4:42 UTC (permalink / raw)
To: Ohad Ben-Cohen
Cc: sjur, Linus Walleij, Sjur Brændeland, Erwan Yvin,
virtualization
sjur.brandeland@stericsson.com writes:
> From: Sjur Brændeland <sjur.brandeland@stericsson.com>
>
> Add wrappers for the host vrings to support loose
> coupling between the virtio device and driver.
>
> The functions find_vrhs() and del_vrhs() are added to
> struct virtio_config_ops to manage the host vrings.
> The function vringh_notify() is added so the guest
> can be kicked when buffers are added to the used-ring.
>
> This enables the virtio drivers to manage the virtio rings
> without knowledge of how the host vrings are managed.
Hmm, this is a bit weird.
> diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> index 29b9104..88dd5ae 100644
> --- a/include/linux/virtio_config.h
> +++ b/include/linux/virtio_config.h
> @@ -51,7 +51,17 @@
> * This returns a pointer to the bus name a la pci_name from which
> * the caller can then copy.
> * @set_vq_affinity: set the affinity for a virtqueue.
> + * @find_vrhs: find the host vrings and instantiate them
> + * vdev: the virtio_device
> + * nhvrs: the number of host vrings to find
> + * vrhs: on success, includes the new host vrings
> + * callbacks: array of driver callbacks, for each host vring
> + * include a NULL entry for vqs that do not need a callback
> + * Returns 0 on success or error status
> + * @del_vrhs: free the host vrings found by find_vrhs().
> */
> +struct vringh;
> +typedef void vrh_callback_t(struct virtio_device *, struct vringh *);
> typedef void vq_callback_t(struct virtqueue *);
> struct virtio_config_ops {
> void (*get)(struct virtio_device *vdev, unsigned offset,
> @@ -70,6 +80,9 @@ struct virtio_config_ops {
> void (*finalize_features)(struct virtio_device *vdev);
> const char *(*bus_name)(struct virtio_device *vdev);
> int (*set_vq_affinity)(struct virtqueue *vq, int cpu);
> + int (*find_vrhs)(struct virtio_device *vdev, unsigned nhvrs,
> + struct vringh *vrhs[], vrh_callback_t *callbacks[]);
> + void (*del_vrhs)(struct virtio_device *vdev);
> };
>
> /* If driver didn't advertise the feature, it will never appear. */
It's weird that you conflate the host and guest ring sides in rpmsg, but
that might make sense if they're really bound together. However, in
general they are not: it's normal to be a guest or host, not both.
This implies that you need a struct vringh_config to put this in.
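For illustration only, the kind of separation being suggested might look
roughly like this (a sketch with hypothetical names, not an actual
structure from this thread):

/* Hypothetical: keep the host-ring hooks out of virtio_config_ops. */
struct vringh_config_ops {
        int (*find_vrhs)(struct virtio_device *vdev, unsigned nhvrs,
                         struct vringh *vrhs[], vrh_callback_t *callbacks[]);
        void (*del_vrhs)(struct virtio_device *vdev);
};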
> diff --git a/include/linux/vringh.h b/include/linux/vringh.h
> index ab41185..8156f51 100644
> --- a/include/linux/vringh.h
> +++ b/include/linux/vringh.h
> @@ -50,6 +50,12 @@ struct vringh {
>
> /* The vring (note: it may contain user pointers!) */
> struct vring vring;
> +
> + /* The function to call when buffers are available */
> + void (*notify)(struct vringh *);
> +
> + /* A pointer for the vringh clients to use. */
> + void *priv;
> };
Since the caller allocates the vringh, can it not use container_of()
instead of a priv pointer?
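A minimal sketch of that alternative, with made-up names:

/* Hypothetical: the caller embeds the vringh in its own structure and
 * recovers it with container_of() instead of going through ->priv. */
struct my_host_ring {
        struct vringh vrh;      /* allocated (embedded) by the caller */
        int private_state;
};

static void my_notify(struct vringh *vrh)
{
        struct my_host_ring *mhr = container_of(vrh, struct my_host_ring, vrh);

        /* use mhr->private_state here */
}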
Thanks,
Rusty.
* [FYI] vringh fixes.
2013-03-05 13:51 [PATCH vringh] virtio: Introduce vringh wrappers in virtio_config sjur.brandeland
2013-03-06 4:42 ` Rusty Russell
@ 2013-03-06 4:46 ` Rusty Russell
2013-03-06 4:53 ` [PATCH 1/2] Rusty Russell
2013-03-06 4:57 ` [PATCH 1/3] vringh: host-side implementation of virtio rings (v2) Rusty Russell
1 sibling, 2 replies; 16+ messages in thread
From: Rusty Russell @ 2013-03-06 4:46 UTC (permalink / raw)
To: sjur.brandeland, Ohad Ben-Cohen, mst
Cc: sjur, Linus Walleij, Erwan Yvin, virtualization
Hi all,
More benchmarking using vringh_test, and I finally figured out that my
changes to virtio_ring had no effect because we were actually
bottlenecked on the vringh side...
Here are the changes I've ended up with, for your reading pleasure.
They are all in my pending-rebases git tree, and I'll fold them together
before putting them into virtio-next (I'm waiting for Sjur's notify
changes).
Cheers,
Rusty.
* [PATCH 1/2]
2013-03-06 4:46 ` [FYI] vringh fixes Rusty Russell
@ 2013-03-06 4:53 ` Rusty Russell
2013-03-06 4:54 ` [PATCH 2/2] tools/virtio: make barriers stronger Rusty Russell
2013-03-06 4:57 ` [PATCH 1/3] vringh: host-side implementation of virtio rings (v2) Rusty Russell
1 sibling, 1 reply; 16+ messages in thread
From: Rusty Russell @ 2013-03-06 4:53 UTC (permalink / raw)
To: mst; +Cc: sjur, Linus Walleij, Erwan Yvin, virtualization, sjur.brandeland
tools/virtio: separate headers more.
This makes them a bit more like the kernel headers, so we can include more
real kernel headers in our tests.
In addition this means that we don't break tools/virtio with the next
patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(Changes since last version: use ACCESS_ONCE() for "userspace" reads.
This semantic is assumed for kernel accesses to userspace, since otherwise
the compiler might re-fetch a value after it has been validated,
introducing a security hole.)
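As a rough illustration of the double-fetch that ACCESS_ONCE() guards
against (hypothetical names, assuming the tools/virtio headers below):

struct shared_hdr {
        size_t len;                     /* written by the other side */
};

/* Without ACCESS_ONCE() the compiler may reload hdr->len from the shared
 * mapping after the bounds check, so a racing writer could change it in
 * between -- the classic double-fetch hole. */
static int copy_checked(void *dst, const struct shared_hdr *hdr,
                        const void *payload, size_t max)
{
        size_t len = ACCESS_ONCE(hdr->len);     /* fetched exactly once */

        if (len > max)
                return -EINVAL;
        memcpy(dst, payload, len);              /* same value as validated */
        return 0;
}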
diff --git a/tools/virtio/asm/barrier.h b/tools/virtio/asm/barrier.h
new file mode 100644
index 0000000..aff61e1
--- /dev/null
+++ b/tools/virtio/asm/barrier.h
@@ -0,0 +1,14 @@
+#if defined(__i386__) || defined(__x86_64__)
+#define barrier() asm volatile("" ::: "memory")
+#define mb() __sync_synchronize()
+
+#define smp_mb() mb()
+# define smp_rmb() barrier()
+# define smp_wmb() barrier()
+/* Weak barriers should be used. If not - it's a bug */
+# define rmb() abort()
+# define wmb() abort()
+#else
+#error Please fill in barrier macros
+#endif
+
diff --git a/tools/virtio/linux/bug.h b/tools/virtio/linux/bug.h
new file mode 100644
index 0000000..fb94f07
--- /dev/null
+++ b/tools/virtio/linux/bug.h
@@ -0,0 +1,10 @@
+#ifndef BUG_H
+#define BUG_H
+
+#define BUG_ON(__BUG_ON_cond) assert(!(__BUG_ON_cond))
+
+#define BUILD_BUG_ON(x)
+
+#define BUG() abort()
+
+#endif /* BUG_H */
diff --git a/tools/virtio/linux/err.h b/tools/virtio/linux/err.h
new file mode 100644
index 0000000..e32eff8
--- /dev/null
+++ b/tools/virtio/linux/err.h
@@ -0,0 +1,26 @@
+#ifndef ERR_H
+#define ERR_H
+#define MAX_ERRNO 4095
+
+#define IS_ERR_VALUE(x) unlikely((x) >= (unsigned long)-MAX_ERRNO)
+
+static inline void * __must_check ERR_PTR(long error)
+{
+ return (void *) error;
+}
+
+static inline long __must_check PTR_ERR(const void *ptr)
+{
+ return (long) ptr;
+}
+
+static inline long __must_check IS_ERR(const void *ptr)
+{
+ return IS_ERR_VALUE((unsigned long)ptr);
+}
+
+static inline long __must_check IS_ERR_OR_NULL(const void *ptr)
+{
+ return !ptr || IS_ERR_VALUE((unsigned long)ptr);
+}
+#endif /* ERR_H */
diff --git a/tools/virtio/linux/export.h b/tools/virtio/linux/export.h
new file mode 100644
index 0000000..7311d32
--- /dev/null
+++ b/tools/virtio/linux/export.h
@@ -0,0 +1,5 @@
+#define EXPORT_SYMBOL(sym)
+#define EXPORT_SYMBOL_GPL(sym)
+#define EXPORT_SYMBOL_GPL_FUTURE(sym)
+#define EXPORT_UNUSED_SYMBOL(sym)
+#define EXPORT_UNUSED_SYMBOL_GPL(sym)
diff --git a/tools/virtio/linux/irqreturn.h b/tools/virtio/linux/irqreturn.h
new file mode 100644
index 0000000..a3c4e7b
--- /dev/null
+++ b/tools/virtio/linux/irqreturn.h
@@ -0,0 +1 @@
+#include "../../../include/linux/irqreturn.h"
diff --git a/tools/virtio/linux/kernel.h b/tools/virtio/linux/kernel.h
new file mode 100644
index 0000000..fba7059
--- /dev/null
+++ b/tools/virtio/linux/kernel.h
@@ -0,0 +1,112 @@
+#ifndef KERNEL_H
+#define KERNEL_H
+#include <stdbool.h>
+#include <stdlib.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <string.h>
+#include <assert.h>
+#include <stdarg.h>
+
+#include <linux/types.h>
+#include <linux/printk.h>
+#include <linux/bug.h>
+#include <errno.h>
+#include <unistd.h>
+#include <asm/barrier.h>
+
+#define CONFIG_SMP
+
+#define PAGE_SIZE getpagesize()
+#define PAGE_MASK (~(PAGE_SIZE-1))
+
+typedef unsigned long long dma_addr_t;
+typedef size_t __kernel_size_t;
+
+struct page {
+ unsigned long long dummy;
+};
+
+/* Physical == Virtual */
+#define virt_to_phys(p) ((unsigned long)p)
+#define phys_to_virt(a) ((void *)(unsigned long)(a))
+/* Page address: Virtual / 4K */
+#define page_to_phys(p) ((dma_addr_t)(unsigned long)(p))
+#define virt_to_page(p) ((struct page *)((unsigned long)p & PAGE_MASK))
+
+#define offset_in_page(p) (((unsigned long)p) % PAGE_SIZE)
+
+#define __printf(a,b) __attribute__((format(printf,a,b)))
+
+typedef enum {
+ GFP_KERNEL,
+ GFP_ATOMIC,
+ __GFP_HIGHMEM,
+ __GFP_HIGH
+} gfp_t;
+
+#define ARRAY_SIZE(x) (sizeof(x)/sizeof(x[0]))
+
+extern void *__kmalloc_fake, *__kfree_ignore_start, *__kfree_ignore_end;
+static inline void *kmalloc(size_t s, gfp_t gfp)
+{
+ if (__kmalloc_fake)
+ return __kmalloc_fake;
+ return malloc(s);
+}
+
+static inline void kfree(void *p)
+{
+ if (p >= __kfree_ignore_start && p < __kfree_ignore_end)
+ return;
+ free(p);
+}
+
+static inline void *krealloc(void *p, size_t s, gfp_t gfp)
+{
+ return realloc(p, s);
+}
+
+
+static inline unsigned long __get_free_page(gfp_t gfp)
+{
+ void *p;
+
+ posix_memalign(&p, PAGE_SIZE, PAGE_SIZE);
+ return (unsigned long)p;
+}
+
+static inline void free_page(unsigned long addr)
+{
+ free((void *)addr);
+}
+
+#define container_of(ptr, type, member) ({ \
+ const typeof( ((type *)0)->member ) *__mptr = (ptr); \
+ (type *)( (char *)__mptr - offsetof(type,member) );})
+
+#define uninitialized_var(x) x = x
+
+# ifndef likely
+# define likely(x) (__builtin_expect(!!(x), 1))
+# endif
+# ifndef unlikely
+# define unlikely(x) (__builtin_expect(!!(x), 0))
+# endif
+
+#define pr_err(format, ...) fprintf (stderr, format, ## __VA_ARGS__)
+#ifdef DEBUG
+#define pr_debug(format, ...) fprintf (stderr, format, ## __VA_ARGS__)
+#else
+#define pr_debug(format, ...) do {} while (0)
+#endif
+#define dev_err(dev, format, ...) fprintf (stderr, format, ## __VA_ARGS__)
+#define dev_warn(dev, format, ...) fprintf (stderr, format, ## __VA_ARGS__)
+
+#define min(x, y) ({ \
+ typeof(x) _min1 = (x); \
+ typeof(y) _min2 = (y); \
+ (void) (&_min1 == &_min2); \
+ _min1 < _min2 ? _min1 : _min2; })
+
+#endif /* KERNEL_H */
diff --git a/tools/virtio/linux/module.h b/tools/virtio/linux/module.h
index e69de29..3039a7e 100644
--- a/tools/virtio/linux/module.h
+++ b/tools/virtio/linux/module.h
@@ -0,0 +1 @@
+#include <linux/export.h>
diff --git a/tools/virtio/linux/printk.h b/tools/virtio/linux/printk.h
new file mode 100644
index 0000000..9f2423b
--- /dev/null
+++ b/tools/virtio/linux/printk.h
@@ -0,0 +1,4 @@
+#include "../../../include/linux/kern_levels.h"
+
+#define printk printf
+#define vprintk vprintf
diff --git a/tools/virtio/linux/ratelimit.h b/tools/virtio/linux/ratelimit.h
new file mode 100644
index 0000000..dcce172
--- /dev/null
+++ b/tools/virtio/linux/ratelimit.h
@@ -0,0 +1,4 @@
+#define DEFINE_RATELIMIT_STATE(name, interval_init, burst_init) int name = 0
+
+#define __ratelimit(x) (*(x))
+
diff --git a/tools/virtio/linux/scatterlist.h b/tools/virtio/linux/scatterlist.h
new file mode 100644
index 0000000..b2cf7d0
--- /dev/null
+++ b/tools/virtio/linux/scatterlist.h
@@ -0,0 +1,173 @@
+#ifndef SCATTERLIST_H
+#define SCATTERLIST_H
+#include <linux/kernel.h>
+
+struct scatterlist {
+ unsigned long page_link;
+ unsigned int offset;
+ unsigned int length;
+ dma_addr_t dma_address;
+};
+
+/* Scatterlist helpers, stolen from linux/scatterlist.h */
+#define sg_is_chain(sg) ((sg)->page_link & 0x01)
+#define sg_is_last(sg) ((sg)->page_link & 0x02)
+#define sg_chain_ptr(sg) \
+ ((struct scatterlist *) ((sg)->page_link & ~0x03))
+
+/**
+ * sg_assign_page - Assign a given page to an SG entry
+ * @sg: SG entry
+ * @page: The page
+ *
+ * Description:
+ * Assign page to sg entry. Also see sg_set_page(), the most commonly used
+ * variant.
+ *
+ **/
+static inline void sg_assign_page(struct scatterlist *sg, struct page *page)
+{
+ unsigned long page_link = sg->page_link & 0x3;
+
+ /*
+ * In order for the low bit stealing approach to work, pages
+ * must be aligned at a 32-bit boundary as a minimum.
+ */
+ BUG_ON((unsigned long) page & 0x03);
+#ifdef CONFIG_DEBUG_SG
+ BUG_ON(sg->sg_magic != SG_MAGIC);
+ BUG_ON(sg_is_chain(sg));
+#endif
+ sg->page_link = page_link | (unsigned long) page;
+}
+
+/**
+ * sg_set_page - Set sg entry to point at given page
+ * @sg: SG entry
+ * @page: The page
+ * @len: Length of data
+ * @offset: Offset into page
+ *
+ * Description:
+ * Use this function to set an sg entry pointing at a page, never assign
+ * the page directly. We encode sg table information in the lower bits
+ * of the page pointer. See sg_page() for looking up the page belonging
+ * to an sg entry.
+ *
+ **/
+static inline void sg_set_page(struct scatterlist *sg, struct page *page,
+ unsigned int len, unsigned int offset)
+{
+ sg_assign_page(sg, page);
+ sg->offset = offset;
+ sg->length = len;
+}
+
+static inline struct page *sg_page(struct scatterlist *sg)
+{
+#ifdef CONFIG_DEBUG_SG
+ BUG_ON(sg->sg_magic != SG_MAGIC);
+ BUG_ON(sg_is_chain(sg));
+#endif
+ return (struct page *)((sg)->page_link & ~0x3);
+}
+
+/*
+ * Loop over each sg element, following the pointer to a new list if necessary
+ */
+#define for_each_sg(sglist, sg, nr, __i) \
+ for (__i = 0, sg = (sglist); __i < (nr); __i++, sg = sg_next(sg))
+
+/**
+ * sg_chain - Chain two sglists together
+ * @prv: First scatterlist
+ * @prv_nents: Number of entries in prv
+ * @sgl: Second scatterlist
+ *
+ * Description:
+ * Links @prv@ and @sgl@ together, to form a longer scatterlist.
+ *
+ **/
+static inline void sg_chain(struct scatterlist *prv, unsigned int prv_nents,
+ struct scatterlist *sgl)
+{
+ /*
+ * offset and length are unused for chain entry. Clear them.
+ */
+ prv[prv_nents - 1].offset = 0;
+ prv[prv_nents - 1].length = 0;
+
+ /*
+ * Set lowest bit to indicate a link pointer, and make sure to clear
+ * the termination bit if it happens to be set.
+ */
+ prv[prv_nents - 1].page_link = ((unsigned long) sgl | 0x01) & ~0x02;
+}
+
+/**
+ * sg_mark_end - Mark the end of the scatterlist
+ * @sg: SG entry
+ *
+ * Description:
+ * Marks the passed in sg entry as the termination point for the sg
+ * table. A call to sg_next() on this entry will return NULL.
+ *
+ **/
+static inline void sg_mark_end(struct scatterlist *sg)
+{
+#ifdef CONFIG_DEBUG_SG
+ BUG_ON(sg->sg_magic != SG_MAGIC);
+#endif
+ /*
+ * Set termination bit, clear potential chain bit
+ */
+ sg->page_link |= 0x02;
+ sg->page_link &= ~0x01;
+}
+
+static inline struct scatterlist *sg_next(struct scatterlist *sg)
+{
+#ifdef CONFIG_DEBUG_SG
+ BUG_ON(sg->sg_magic != SG_MAGIC);
+#endif
+ if (sg_is_last(sg))
+ return NULL;
+
+ sg++;
+ if (unlikely(sg_is_chain(sg)))
+ sg = sg_chain_ptr(sg);
+
+ return sg;
+}
+
+static inline void sg_init_table(struct scatterlist *sgl, unsigned int nents)
+{
+ memset(sgl, 0, sizeof(*sgl) * nents);
+#ifdef CONFIG_DEBUG_SG
+ {
+ unsigned int i;
+ for (i = 0; i < nents; i++)
+ sgl[i].sg_magic = SG_MAGIC;
+ }
+#endif
+ sg_mark_end(&sgl[nents - 1]);
+}
+
+static inline dma_addr_t sg_phys(struct scatterlist *sg)
+{
+ return page_to_phys(sg_page(sg)) + sg->offset;
+}
+
+static inline void sg_set_buf(struct scatterlist *sg, const void *buf,
+ unsigned int buflen)
+{
+ sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));
+}
+
+static inline void sg_init_one(struct scatterlist *sg,
+ const void *buf, unsigned int buflen)
+{
+ sg_init_table(sg, 1);
+ sg_set_buf(sg, buf, buflen);
+}
+#endif /* SCATTERLIST_H */
diff --git a/tools/virtio/linux/types.h b/tools/virtio/linux/types.h
new file mode 100644
index 0000000..f8ebb9a
--- /dev/null
+++ b/tools/virtio/linux/types.h
@@ -0,0 +1,28 @@
+#ifndef TYPES_H
+#define TYPES_H
+#include <stdint.h>
+
+#define __force
+#define __user
+#define __must_check
+#define __cold
+
+typedef uint64_t u64;
+typedef int64_t s64;
+typedef uint32_t u32;
+typedef int32_t s32;
+typedef uint16_t u16;
+typedef int16_t s16;
+typedef uint8_t u8;
+typedef int8_t s8;
+
+typedef uint64_t __u64;
+typedef int64_t __s64;
+typedef uint32_t __u32;
+typedef int32_t __s32;
+typedef uint16_t __u16;
+typedef int16_t __s16;
+typedef uint8_t __u8;
+typedef int8_t __s8;
+
+#endif /* TYPES_H */
diff --git a/tools/virtio/linux/uaccess.h b/tools/virtio/linux/uaccess.h
new file mode 100644
index 0000000..0a578fe
--- /dev/null
+++ b/tools/virtio/linux/uaccess.h
@@ -0,0 +1,50 @@
+#ifndef UACCESS_H
+#define UACCESS_H
+extern void *__user_addr_min, *__user_addr_max;
+
+#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
+
+static inline void __chk_user_ptr(const volatile void *p, size_t size)
+{
+ assert(p >= __user_addr_min && p + size <= __user_addr_max);
+}
+
+#define put_user(x, ptr) \
+({ \
+ typeof(ptr) __pu_ptr = (ptr); \
+ __chk_user_ptr(__pu_ptr, sizeof(*__pu_ptr)); \
+ ACCESS_ONCE(*(__pu_ptr)) = x; \
+ 0; \
+})
+
+#define get_user(x, ptr) \
+({ \
+ typeof(ptr) __pu_ptr = (ptr); \
+ __chk_user_ptr(__pu_ptr, sizeof(*__pu_ptr)); \
+ x = ACCESS_ONCE(*(__pu_ptr)); \
+ 0; \
+})
+
+static void volatile_memcpy(volatile char *to, const volatile char *from,
+ unsigned long n)
+{
+ while (n--)
+ *(to++) = *(from++);
+}
+
+static inline int copy_from_user(void *to, const void __user volatile *from,
+ unsigned long n)
+{
+ __chk_user_ptr(from, n);
+ volatile_memcpy(to, from, n);
+ return 0;
+}
+
+static inline int copy_to_user(void __user volatile *to, const void *from,
+ unsigned long n)
+{
+ __chk_user_ptr(to, n);
+ volatile_memcpy(to, from, n);
+ return 0;
+}
+#endif /* UACCESS_H */
diff --git a/tools/virtio/linux/uio.h b/tools/virtio/linux/uio.h
new file mode 100644
index 0000000..ecbdf92
--- /dev/null
+++ b/tools/virtio/linux/uio.h
@@ -0,0 +1 @@
+#include "../../../include/linux/uio.h"
diff --git a/tools/virtio/linux/virtio.h b/tools/virtio/linux/virtio.h
index 390c4cb..e4af659 100644
--- a/tools/virtio/linux/virtio.h
+++ b/tools/virtio/linux/virtio.h
@@ -1,129 +1,7 @@
#ifndef LINUX_VIRTIO_H
#define LINUX_VIRTIO_H
-
-#include <stdbool.h>
-#include <stdlib.h>
-#include <stddef.h>
-#include <stdio.h>
-#include <string.h>
-#include <assert.h>
-
-#include <linux/types.h>
-#include <errno.h>
-
-typedef unsigned long long dma_addr_t;
-
-struct scatterlist {
- unsigned long page_link;
- unsigned int offset;
- unsigned int length;
- dma_addr_t dma_address;
-};
-
-struct page {
- unsigned long long dummy;
-};
-
-#define BUG_ON(__BUG_ON_cond) assert(!(__BUG_ON_cond))
-
-/* Physical == Virtual */
-#define virt_to_phys(p) ((unsigned long)p)
-#define phys_to_virt(a) ((void *)(unsigned long)(a))
-/* Page address: Virtual / 4K */
-#define virt_to_page(p) ((struct page*)((virt_to_phys(p) / 4096) * \
- sizeof(struct page)))
-#define offset_in_page(p) (((unsigned long)p) % 4096)
-#define sg_phys(sg) ((sg->page_link & ~0x3) / sizeof(struct page) * 4096 + \
- sg->offset)
-static inline void sg_mark_end(struct scatterlist *sg)
-{
- /*
- * Set termination bit, clear potential chain bit
- */
- sg->page_link |= 0x02;
- sg->page_link &= ~0x01;
-}
-static inline void sg_init_table(struct scatterlist *sgl, unsigned int nents)
-{
- memset(sgl, 0, sizeof(*sgl) * nents);
- sg_mark_end(&sgl[nents - 1]);
-}
-static inline void sg_assign_page(struct scatterlist *sg, struct page *page)
-{
- unsigned long page_link = sg->page_link & 0x3;
-
- /*
- * In order for the low bit stealing approach to work, pages
- * must be aligned at a 32-bit boundary as a minimum.
- */
- BUG_ON((unsigned long) page & 0x03);
- sg->page_link = page_link | (unsigned long) page;
-}
-
-static inline void sg_set_page(struct scatterlist *sg, struct page *page,
- unsigned int len, unsigned int offset)
-{
- sg_assign_page(sg, page);
- sg->offset = offset;
- sg->length = len;
-}
-
-static inline void sg_set_buf(struct scatterlist *sg, const void *buf,
- unsigned int buflen)
-{
- sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));
-}
-
-static inline void sg_init_one(struct scatterlist *sg, const void *buf, unsigned int buflen)
-{
- sg_init_table(sg, 1);
- sg_set_buf(sg, buf, buflen);
-}
-
-typedef __u16 u16;
-
-typedef enum {
- GFP_KERNEL,
- GFP_ATOMIC,
- __GFP_HIGHMEM,
- __GFP_HIGH
-} gfp_t;
-typedef enum {
- IRQ_NONE,
- IRQ_HANDLED
-} irqreturn_t;
-
-static inline void *kmalloc(size_t s, gfp_t gfp)
-{
- return malloc(s);
-}
-
-static inline void kfree(void *p)
-{
- free(p);
-}
-
-#define container_of(ptr, type, member) ({ \
- const typeof( ((type *)0)->member ) *__mptr = (ptr); \
- (type *)( (char *)__mptr - offsetof(type,member) );})
-
-#define uninitialized_var(x) x = x
-
-# ifndef likely
-# define likely(x) (__builtin_expect(!!(x), 1))
-# endif
-# ifndef unlikely
-# define unlikely(x) (__builtin_expect(!!(x), 0))
-# endif
-
-#define pr_err(format, ...) fprintf (stderr, format, ## __VA_ARGS__)
-#ifdef DEBUG
-#define pr_debug(format, ...) fprintf (stderr, format, ## __VA_ARGS__)
-#else
-#define pr_debug(format, ...) do {} while (0)
-#endif
-#define dev_err(dev, format, ...) fprintf (stderr, format, ## __VA_ARGS__)
-#define dev_warn(dev, format, ...) fprintf (stderr, format, ## __VA_ARGS__)
+#include <linux/scatterlist.h>
+#include <linux/kernel.h>
/* TODO: empty stubs for now. Broken but enough for virtio_ring.c */
#define list_add_tail(a, b) do {} while (0)
@@ -133,6 +11,7 @@ static inline void kfree(void *p)
#define BITS_PER_BYTE 8
#define BITS_PER_LONG (sizeof(long) * BITS_PER_BYTE)
#define BIT_MASK(nr) (1UL << ((nr) % BITS_PER_LONG))
+
/* TODO: Not atomic as it should be:
* we don't use this for anything important. */
static inline void clear_bit(int nr, volatile unsigned long *addr)
@@ -147,10 +26,6 @@ static inline int test_bit(int nr, const volatile unsigned long *addr)
{
return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
}
-
-/* The only feature we care to support */
-#define virtio_has_feature(dev, feature) \
- test_bit((feature), (dev)->features)
/* end of stubs */
struct virtio_device {
@@ -170,28 +45,9 @@ struct virtqueue {
void *priv;
};
-#define EXPORT_SYMBOL_GPL(__EXPORT_SYMBOL_GPL_name) \
- void __EXPORT_SYMBOL_GPL##__EXPORT_SYMBOL_GPL_name() { \
-}
#define MODULE_LICENSE(__MODULE_LICENSE_value) \
const char *__MODULE_LICENSE_name = __MODULE_LICENSE_value
-#define CONFIG_SMP
-
-#if defined(__i386__) || defined(__x86_64__)
-#define barrier() asm volatile("" ::: "memory")
-#define mb() __sync_synchronize()
-
-#define smp_mb() mb()
-# define smp_rmb() barrier()
-# define smp_wmb() barrier()
-/* Weak barriers should be used. If not - it's a bug */
-# define rmb() abort()
-# define wmb() abort()
-#else
-#error Please fill in barrier macros
-#endif
-
/* Interfaces exported by virtio_ring. */
int virtqueue_add_buf(struct virtqueue *vq,
struct scatterlist sg[],
diff --git a/tools/virtio/linux/virtio_config.h b/tools/virtio/linux/virtio_config.h
new file mode 100644
index 0000000..5049967
--- /dev/null
+++ b/tools/virtio/linux/virtio_config.h
@@ -0,0 +1,6 @@
+#define VIRTIO_TRANSPORT_F_START 28
+#define VIRTIO_TRANSPORT_F_END 32
+
+#define virtio_has_feature(dev, feature) \
+ test_bit((feature), (dev)->features)
+
diff --git a/tools/virtio/linux/virtio_ring.h b/tools/virtio/linux/virtio_ring.h
new file mode 100644
index 0000000..8949c4e
--- /dev/null
+++ b/tools/virtio/linux/virtio_ring.h
@@ -0,0 +1 @@
+#include "../../../include/linux/virtio_ring.h"
diff --git a/tools/virtio/linux/vringh.h b/tools/virtio/linux/vringh.h
new file mode 100644
index 0000000..9348957
--- /dev/null
+++ b/tools/virtio/linux/vringh.h
@@ -0,0 +1 @@
+#include "../../../include/linux/vringh.h"
diff --git a/tools/virtio/uapi/linux/uio.h b/tools/virtio/uapi/linux/uio.h
new file mode 100644
index 0000000..7230e90
--- /dev/null
+++ b/tools/virtio/uapi/linux/uio.h
@@ -0,0 +1 @@
+#include <sys/uio.h>
diff --git a/tools/virtio/uapi/linux/virtio_config.h b/tools/virtio/uapi/linux/virtio_config.h
new file mode 100644
index 0000000..4c86675
--- /dev/null
+++ b/tools/virtio/uapi/linux/virtio_config.h
@@ -0,0 +1 @@
+#include "../../../../include/uapi/linux/virtio_config.h"
diff --git a/tools/virtio/uapi/linux/virtio_ring.h b/tools/virtio/uapi/linux/virtio_ring.h
new file mode 100644
index 0000000..4d99c78
--- /dev/null
+++ b/tools/virtio/uapi/linux/virtio_ring.h
@@ -0,0 +1,4 @@
+#ifndef VIRTIO_RING_H
+#define VIRTIO_RING_H
+#include "../../../../include/uapi/linux/virtio_ring.h"
+#endif /* VIRTIO_RING_H */
diff --git a/tools/virtio/virtio_test.c b/tools/virtio/virtio_test.c
index faf3f0c..814ae80 100644
--- a/tools/virtio/virtio_test.c
+++ b/tools/virtio/virtio_test.c
@@ -10,11 +10,15 @@
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>
+#include <stdbool.h>
#include <linux/vhost.h>
#include <linux/virtio.h>
#include <linux/virtio_ring.h>
#include "../../drivers/vhost/test.h"
+/* Unused */
+void *__kmalloc_fake, *__kfree_ignore_start, *__kfree_ignore_end;
+
struct vq_info {
int kick;
int call;
* [PATCH 2/2] tools/virtio: make barriers stronger.
2013-03-06 4:53 ` [PATCH 1/2] Rusty Russell
@ 2013-03-06 4:54 ` Rusty Russell
2013-03-06 10:20 ` Michael S. Tsirkin
0 siblings, 1 reply; 16+ messages in thread
From: Rusty Russell @ 2013-03-06 4:54 UTC (permalink / raw)
To: mst; +Cc: sjur, Linus Walleij, Erwan Yvin, virtualization, sjur.brandeland
In the coming vringh_test, we share an mmap with another userspace process
for testing. This requires real barriers.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
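For context, this is the kind of cross-process ordering the shared mmap
relies on (a sketch with hypothetical names; smp_wmb()/smp_rmb() are the
macros from the barrier.h touched below, which this patch turns into real
fences):

/* Two processes share this structure through an mmap()ed region. */
struct shared_slot {
        int data;
        int ready;
};

static void producer(volatile struct shared_slot *s, int value)
{
        s->data = value;
        smp_wmb();              /* publish data before the ready flag */
        s->ready = 1;
}

static int consumer(volatile struct shared_slot *s)
{
        while (!s->ready)
                ;               /* spin until the producer is done */
        smp_rmb();              /* see the flag before reading the data */
        return s->data;
}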
diff --git a/tools/virtio/asm/barrier.h b/tools/virtio/asm/barrier.h
index aff61e1..7a63693 100644
--- a/tools/virtio/asm/barrier.h
+++ b/tools/virtio/asm/barrier.h
@@ -3,8 +3,8 @@
#define mb() __sync_synchronize()
#define smp_mb() mb()
-# define smp_rmb() barrier()
-# define smp_wmb() barrier()
+# define smp_rmb() mb()
+# define smp_wmb() mb()
/* Weak barriers should be used. If not - it's a bug */
# define rmb() abort()
# define wmb() abort()
* [PATCH 1/3] vringh: host-side implementation of virtio rings (v2)
2013-03-06 4:46 ` [FYI] vringh fixes Rusty Russell
2013-03-06 4:53 ` [PATCH 1/2] Rusty Russell
@ 2013-03-06 4:57 ` Rusty Russell
2013-03-06 4:59 ` [PATCH 2/3] vringh: don't flag already listening Rusty Russell
2013-03-06 5:02 ` [PATCH 3/3] tools/virtio: add vring_test (v2) Rusty Russell
1 sibling, 2 replies; 16+ messages in thread
From: Rusty Russell @ 2013-03-06 4:57 UTC (permalink / raw)
To: sjur.brandeland, Ohad Ben-Cohen, mst
Cc: sjur, Linus Walleij, Erwan Yvin, virtualization
Getting the use of virtio rings correct is tricky, and a recent patch saw
an implementation of in-kernel rings (as separate from userspace).
This abstracts the business of dealing with the virtio ring layout
from the access method (userspace or direct); to do this, we use function
pointers, which gcc inlines correctly.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Changed:
Use ACCESS_ONCE in kernel ring accessors, to match user accessor
semantics.
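The pattern being relied on, in miniature (hypothetical names, not code
from the patch itself):

/* A generic helper parameterized by an accessor.  Because each wrapper
 * passes a compile-time-constant function, gcc inlines the indirect call
 * away, so sharing the logic between the user and kernel variants costs
 * nothing at runtime. */
static inline int generic_get_idx(u16 *val, const u16 *p,
                                  int (*getu16)(u16 *val, const u16 *p))
{
        return getu16(val, p);
}

static inline int getu16_direct(u16 *val, const u16 *p)
{
        *val = ACCESS_ONCE(*p);
        return 0;
}

static inline int get_idx_direct(u16 *val, const u16 *p)
{
        return generic_get_idx(val, p, getu16_direct);
}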
diff --git a/drivers/Makefile b/drivers/Makefile
index dce39a9..72d28d3 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -123,7 +123,7 @@ obj-$(CONFIG_PPC_PS3) += ps3/
obj-$(CONFIG_OF) += of/
obj-$(CONFIG_SSB) += ssb/
obj-$(CONFIG_BCMA) += bcma/
-obj-$(CONFIG_VHOST_NET) += vhost/
+obj-$(CONFIG_VHOST_RING) += vhost/
obj-$(CONFIG_VLYNQ) += vlynq/
obj-$(CONFIG_STAGING) += staging/
obj-y += platform/
diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index bf24317..85b773a 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -1,6 +1,7 @@
config VHOST_NET
tristate "Host kernel accelerator for virtio net"
depends on NET && EVENTFD && (TUN || !TUN) && (MACVTAP || !MACVTAP)
+ select VHOST_RING
---help---
This kernel module can be loaded in host kernel to accelerate
guest networking with virtio_net. Not to be confused with virtio_net
@@ -12,3 +13,10 @@ config VHOST_NET
if STAGING
source "drivers/vhost/Kconfig.tcm"
endif
+
+config VHOST_RING
+ tristate
+ ---help---
+ This option is selected by any driver which needs to access
+ the host side of a virtio ring.
+
diff --git a/drivers/vhost/Kconfig.tcm b/drivers/vhost/Kconfig.tcm
index 7e3aa28..c3a8cfa 100644
--- a/drivers/vhost/Kconfig.tcm
+++ b/drivers/vhost/Kconfig.tcm
@@ -1,6 +1,7 @@
config TCM_VHOST
tristate "TCM_VHOST fabric module"
depends on TARGET_CORE && EVENTFD && m
+ select VHOST_RING
default n
---help---
Say M here to enable the TCM_VHOST fabric module for use with virtio-scsi guests
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index a27b053..1d37f5e 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -2,3 +2,5 @@ obj-$(CONFIG_VHOST_NET) += vhost_net.o
vhost_net-y := vhost.o net.o
obj-$(CONFIG_TCM_VHOST) += tcm_vhost.o
+
+obj-$(CONFIG_VHOST_RING) += vringh.o
diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
new file mode 100644
index 0000000..8234f2b
--- /dev/null
+++ b/drivers/vhost/vringh.c
@@ -0,0 +1,1012 @@
+/*
+ * Helpers for the host side of a virtio ring.
+ *
+ * Since these may be in userspace, we use (inline) accessors.
+ */
+#include <linux/vringh.h>
+#include <linux/virtio_ring.h>
+#include <linux/kernel.h>
+#include <linux/ratelimit.h>
+#include <linux/uaccess.h>
+#include <linux/slab.h>
+#include <linux/export.h>
+
+static __printf(1,2) __cold void vringh_bad(const char *fmt, ...)
+{
+ static DEFINE_RATELIMIT_STATE(vringh_rs,
+ DEFAULT_RATELIMIT_INTERVAL,
+ DEFAULT_RATELIMIT_BURST);
+ if (__ratelimit(&vringh_rs)) {
+ va_list ap;
+ va_start(ap, fmt);
+ printk(KERN_NOTICE "vringh:");
+ vprintk(fmt, ap);
+ va_end(ap);
+ }
+}
+
+/* Returns vring->num if empty, -ve on error. */
+static inline int __vringh_get_head(const struct vringh *vrh,
+ int (*getu16)(u16 *val, const u16 *p),
+ u16 *last_avail_idx)
+{
+ u16 avail_idx, i, head;
+ int err;
+
+ err = getu16(&avail_idx, &vrh->vring.avail->idx);
+ if (err) {
+ vringh_bad("Failed to access avail idx at %p",
+ &vrh->vring.avail->idx);
+ return err;
+ }
+
+ if (*last_avail_idx == avail_idx)
+ return vrh->vring.num;
+
+ /* Only get avail ring entries after they have been exposed by guest. */
+ virtio_rmb(vrh->weak_barriers);
+
+ i = *last_avail_idx & (vrh->vring.num - 1);
+
+ err = getu16(&head, &vrh->vring.avail->ring[i]);
+ if (err) {
+ vringh_bad("Failed to read head: idx %d address %p",
+ *last_avail_idx, &vrh->vring.avail->ring[i]);
+ return err;
+ }
+
+ if (head >= vrh->vring.num) {
+ vringh_bad("Guest says index %u > %u is available",
+ head, vrh->vring.num);
+ return -EINVAL;
+ }
+
+ (*last_avail_idx)++;
+ return head;
+}
+
+/* Copy some bytes to/from the iovec. Returns num copied. */
+static inline ssize_t vringh_iov_xfer(struct vringh_kiov *iov,
+ void *ptr, size_t len,
+ int (*xfer)(void *addr, void *ptr,
+ size_t len))
+{
+ int err, done = 0;
+
+ while (len && iov->i < iov->used) {
+ size_t partlen;
+
+ partlen = min(iov->iov[iov->i].iov_len - iov->off, len);
+ err = xfer(iov->iov[iov->i].iov_base + iov->off, ptr, partlen);
+ if (err)
+ return err;
+ done += partlen;
+ len -= partlen;
+ ptr += partlen;
+ iov->off += partlen;
+
+ if (iov->off == iov->iov[iov->i].iov_len) {
+ iov->off = 0;
+ iov->i++;
+ }
+ }
+ return done;
+}
+
+/* May reduce *len if range is shorter. */
+static inline bool range_check(struct vringh *vrh, u64 addr, size_t *len,
+ struct vringh_range *range,
+ bool (*getrange)(struct vringh *,
+ u64, struct vringh_range *))
+{
+ if (addr < range->start || addr > range->end_incl) {
+ if (!getrange(vrh, addr, range))
+ return false;
+ }
+ BUG_ON(addr < range->start || addr > range->end_incl);
+
+ /* To end of memory? */
+ if (unlikely(addr + *len == 0)) {
+ if (range->end_incl == -1ULL)
+ return true;
+ goto truncate;
+ }
+
+ /* Otherwise, don't wrap. */
+ if (addr + *len < addr) {
+ vringh_bad("Wrapping descriptor %zu@0x%llx",
+ *len, (unsigned long long)addr);
+ return false;
+ }
+
+ if (unlikely(addr + *len - 1 > range->end_incl))
+ goto truncate;
+ return true;
+
+truncate:
+ *len = range->end_incl + 1 - addr;
+ return true;
+}
+
+static inline bool no_range_check(struct vringh *vrh, u64 addr, size_t *len,
+ struct vringh_range *range,
+ bool (*getrange)(struct vringh *,
+ u64, struct vringh_range *))
+{
+ return true;
+}
+
+/* No reason for this code to be inline. */
+static int move_to_indirect(int *up_next, u16 *i, void *addr,
+ const struct vring_desc *desc,
+ struct vring_desc **descs, int *desc_max)
+{
+ /* Indirect tables can't have indirect. */
+ if (*up_next != -1) {
+ vringh_bad("Multilevel indirect %u->%u", *up_next, *i);
+ return -EINVAL;
+ }
+
+ if (unlikely(desc->len % sizeof(struct vring_desc))) {
+ vringh_bad("Strange indirect len %u", desc->len);
+ return -EINVAL;
+ }
+
+ /* We will check this when we follow it! */
+ if (desc->flags & VRING_DESC_F_NEXT)
+ *up_next = desc->next;
+ else
+ *up_next = -2;
+ *descs = addr;
+ *desc_max = desc->len / sizeof(struct vring_desc);
+
+ /* Now, start at the first indirect. */
+ *i = 0;
+ return 0;
+}
+
+static int resize_iovec(struct vringh_kiov *iov, gfp_t gfp)
+{
+ struct kvec *new;
+ unsigned int flag, new_num = (iov->max_num & ~VRINGH_IOV_ALLOCATED) * 2;
+
+ if (new_num < 8)
+ new_num = 8;
+
+ flag = (iov->max_num & VRINGH_IOV_ALLOCATED);
+ if (flag)
+ new = krealloc(iov->iov, new_num * sizeof(struct iovec), gfp);
+ else {
+ new = kmalloc(new_num * sizeof(struct iovec), gfp);
+ if (new) {
+ memcpy(new, iov->iov,
+ iov->max_num * sizeof(struct iovec));
+ flag = VRINGH_IOV_ALLOCATED;
+ }
+ }
+ if (!new)
+ return -ENOMEM;
+ iov->iov = new;
+ iov->max_num = (new_num | flag);
+ return 0;
+}
+
+static u16 __cold return_from_indirect(const struct vringh *vrh, int *up_next,
+ struct vring_desc **descs, int *desc_max)
+{
+ u16 i = *up_next;
+
+ *up_next = -1;
+ *descs = vrh->vring.desc;
+ *desc_max = vrh->vring.num;
+ return i;
+}
+
+static int slow_copy(struct vringh *vrh, void *dst, const void *src,
+ bool (*rcheck)(struct vringh *vrh, u64 addr, size_t *len,
+ struct vringh_range *range,
+ bool (*getrange)(struct vringh *vrh,
+ u64,
+ struct vringh_range *)),
+ bool (*getrange)(struct vringh *vrh,
+ u64 addr,
+ struct vringh_range *r),
+ struct vringh_range *range,
+ int (*copy)(void *dst, const void *src, size_t len))
+{
+ size_t part, len = sizeof(struct vring_desc);
+
+ do {
+ u64 addr;
+ int err;
+
+ part = len;
+ addr = (u64)(unsigned long)src - range->offset;
+
+ if (!rcheck(vrh, addr, &part, range, getrange))
+ return -EINVAL;
+
+ err = copy(dst, src, part);
+ if (err)
+ return err;
+
+ dst += part;
+ src += part;
+ len -= part;
+ } while (len);
+ return 0;
+}
+
+static inline int
+__vringh_iov(struct vringh *vrh, u16 i,
+ struct vringh_kiov *riov,
+ struct vringh_kiov *wiov,
+ bool (*rcheck)(struct vringh *vrh, u64 addr, size_t *len,
+ struct vringh_range *range,
+ bool (*getrange)(struct vringh *, u64,
+ struct vringh_range *)),
+ bool (*getrange)(struct vringh *, u64, struct vringh_range *),
+ gfp_t gfp,
+ int (*copy)(void *dst, const void *src, size_t len))
+{
+ int err, count = 0, up_next, desc_max;
+ struct vring_desc desc, *descs;
+ struct vringh_range range = { -1ULL, 0 }, slowrange;
+ bool slow = false;
+
+ /* We start traversing vring's descriptor table. */
+ descs = vrh->vring.desc;
+ desc_max = vrh->vring.num;
+ up_next = -1;
+
+ if (riov)
+ riov->i = riov->used = 0;
+ else if (wiov)
+ wiov->i = wiov->used = 0;
+ else
+ /* You must want something! */
+ BUG();
+
+ for (;;) {
+ void *addr;
+ struct vringh_kiov *iov;
+ size_t len;
+
+ if (unlikely(slow))
+ err = slow_copy(vrh, &desc, &descs[i], rcheck, getrange,
+ &slowrange, copy);
+ else
+ err = copy(&desc, &descs[i], sizeof(desc));
+ if (unlikely(err))
+ goto fail;
+
+ if (unlikely(desc.flags & VRING_DESC_F_INDIRECT)) {
+ /* Make sure it's OK, and get offset. */
+ len = desc.len;
+ if (!rcheck(vrh, desc.addr, &len, &range, getrange)) {
+ err = -EINVAL;
+ goto fail;
+ }
+
+ if (unlikely(len != desc.len)) {
+ slow = true;
+ /* We need to save this range to use offset */
+ slowrange = range;
+ }
+
+ addr = (void *)(long)(desc.addr + range.offset);
+ err = move_to_indirect(&up_next, &i, addr, &desc,
+ &descs, &desc_max);
+ if (err)
+ goto fail;
+ continue;
+ }
+
+ if (count++ == vrh->vring.num) {
+ vringh_bad("Descriptor loop in %p", descs);
+ err = -ELOOP;
+ goto fail;
+ }
+
+ if (desc.flags & VRING_DESC_F_WRITE)
+ iov = wiov;
+ else {
+ iov = riov;
+ if (unlikely(wiov && wiov->i)) {
+ vringh_bad("Readable desc %p after writable",
+ &descs[i]);
+ err = -EINVAL;
+ goto fail;
+ }
+ }
+
+ if (!iov) {
+ vringh_bad("Unexpected %s desc",
+ !wiov ? "writable" : "readable");
+ err = -EPROTO;
+ goto fail;
+ }
+
+ again:
+ /* Make sure it's OK, and get offset. */
+ len = desc.len;
+ if (!rcheck(vrh, desc.addr, &len, &range, getrange)) {
+ err = -EINVAL;
+ goto fail;
+ }
+ addr = (void *)(unsigned long)(desc.addr + range.offset);
+
+ if (unlikely(iov->used == (iov->max_num & ~VRINGH_IOV_ALLOCATED))) {
+ err = resize_iovec(iov, gfp);
+ if (err)
+ goto fail;
+ }
+
+ iov->iov[iov->used].iov_base = addr;
+ iov->iov[iov->used].iov_len = len;
+ iov->used++;
+
+ if (unlikely(len != desc.len)) {
+ desc.len -= len;
+ desc.addr += len;
+ goto again;
+ }
+
+ if (desc.flags & VRING_DESC_F_NEXT) {
+ i = desc.next;
+ } else {
+ /* Just in case we need to finish traversing above. */
+ if (unlikely(up_next > 0)) {
+ i = return_from_indirect(vrh, &up_next,
+ &descs, &desc_max);
+ slow = false;
+ } else
+ break;
+ }
+
+ if (i >= desc_max) {
+ vringh_bad("Chained index %u > %u", i, desc_max);
+ err = -EINVAL;
+ goto fail;
+ }
+ }
+
+ return 0;
+
+fail:
+ return err;
+}
+
+static inline int __vringh_complete(struct vringh *vrh,
+ const struct vring_used_elem *used,
+ unsigned int num_used,
+ int (*putu16)(u16 *p, u16 val),
+ int (*putused)(struct vring_used_elem *dst,
+ const struct vring_used_elem
+ *src, unsigned num))
+{
+ struct vring_used *used_ring;
+ int err;
+ u16 used_idx, off;
+
+ used_ring = vrh->vring.used;
+ used_idx = vrh->last_used_idx + vrh->completed;
+
+ off = used_idx % vrh->vring.num;
+
+ /* Compiler knows num_used == 1 sometimes, hence extra check */
+ if (num_used > 1 && unlikely(off + num_used >= vrh->vring.num)) {
+ u16 part = vrh->vring.num - off;
+ err = putused(&used_ring->ring[off], used, part);
+ if (!err)
+ err = putused(&used_ring->ring[0], used + part,
+ num_used - part);
+ } else
+ err = putused(&used_ring->ring[off], used, num_used);
+
+ if (err) {
+ vringh_bad("Failed to write %u used entries %u at %p",
+ num_used, off, &used_ring->ring[off]);
+ return err;
+ }
+
+ /* Make sure buffer is written before we update index. */
+ virtio_wmb(vrh->weak_barriers);
+
+ err = putu16(&vrh->vring.used->idx, used_idx + num_used);
+ if (err) {
+ vringh_bad("Failed to update used index at %p",
+ &vrh->vring.used->idx);
+ return err;
+ }
+
+ vrh->completed += num_used;
+ return 0;
+}
+
+
+static inline int __vringh_need_notify(struct vringh *vrh,
+ int (*getu16)(u16 *val, const u16 *p))
+{
+ bool notify;
+ u16 used_event;
+ int err;
+
+ /* Flush out used index update. This is paired with the
+ * barrier that the Guest executes when enabling
+ * interrupts. */
+ virtio_mb(vrh->weak_barriers);
+
+ /* Old-style, without event indices. */
+ if (!vrh->event_indices) {
+ u16 flags;
+ err = getu16(&flags, &vrh->vring.avail->flags);
+ if (err) {
+ vringh_bad("Failed to get flags at %p",
+ &vrh->vring.avail->flags);
+ return err;
+ }
+ return (!(flags & VRING_AVAIL_F_NO_INTERRUPT));
+ }
+
+ /* Modern: we know when other side wants to know. */
+ err = getu16(&used_event, &vring_used_event(&vrh->vring));
+ if (err) {
+ vringh_bad("Failed to get used event idx at %p",
+ &vring_used_event(&vrh->vring));
+ return err;
+ }
+
+ /* Just in case we added so many that we wrap. */
+ if (unlikely(vrh->completed > 0xffff))
+ notify = true;
+ else
+ notify = vring_need_event(used_event,
+ vrh->last_used_idx + vrh->completed,
+ vrh->last_used_idx);
+
+ vrh->last_used_idx += vrh->completed;
+ vrh->completed = 0;
+ return notify;
+}
+
+static inline bool __vringh_notify_enable(struct vringh *vrh,
+ int (*getu16)(u16 *val, const u16 *p),
+ int (*putu16)(u16 *p, u16 val))
+{
+ u16 avail;
+
+ /* Already enabled? */
+ if (vrh->listening)
+ return false;
+
+ vrh->listening = true;
+
+ if (!vrh->event_indices) {
+ /* Old-school; update flags. */
+ if (putu16(&vrh->vring.used->flags, 0) != 0) {
+ vringh_bad("Clearing used flags %p",
+ &vrh->vring.used->flags);
+ return false;
+ }
+ } else {
+ if (putu16(&vring_avail_event(&vrh->vring),
+ vrh->last_avail_idx) != 0) {
+ vringh_bad("Updating avail event index %p",
+ &vring_avail_event(&vrh->vring));
+ return false;
+ }
+ }
+
+ /* They could have slipped one in as we were doing that: make
+ * sure it's written, then check again. */
+ virtio_mb(vrh->weak_barriers);
+
+ if (getu16(&avail, &vrh->vring.avail->idx) != 0) {
+ vringh_bad("Failed to check avail idx at %p",
+ &vrh->vring.avail->idx);
+ return false;
+ }
+
+ /* This is so unlikely, we just leave notifications enabled. */
+ return avail != vrh->last_avail_idx;
+}
+
+static inline void __vringh_notify_disable(struct vringh *vrh,
+ int (*putu16)(u16 *p, u16 val))
+{
+ /* Already disabled? */
+ if (!vrh->listening)
+ return;
+
+ vrh->listening = false;
+ if (!vrh->event_indices) {
+ /* Old-school; update flags. */
+ if (putu16(&vrh->vring.used->flags, VRING_USED_F_NO_NOTIFY)) {
+ vringh_bad("Setting used flags %p",
+ &vrh->vring.used->flags);
+ }
+ }
+}
+
+/* Userspace access helpers: in this case, addresses are really userspace. */
+static inline int getu16_user(u16 *val, const u16 *p)
+{
+ return get_user(*val, (__force u16 __user *)p);
+}
+
+static inline int putu16_user(u16 *p, u16 val)
+{
+ return put_user(val, (__force u16 __user *)p);
+}
+
+static inline int copydesc_user(void *dst, const void *src, size_t len)
+{
+ return copy_from_user(dst, (__force void __user *)src, len) ?
+ -EFAULT : 0;
+}
+
+static inline int putused_user(struct vring_used_elem *dst,
+ const struct vring_used_elem *src,
+ unsigned int num)
+{
+ return copy_to_user((__force void __user *)dst, src,
+ sizeof(*dst) * num) ? -EFAULT : 0;
+}
+
+static inline int xfer_from_user(void *src, void *dst, size_t len)
+{
+ return copy_from_user(dst, (__force void __user *)src, len) ?
+ -EFAULT : 0;
+}
+
+static inline int xfer_to_user(void *dst, void *src, size_t len)
+{
+ return copy_to_user((__force void __user *)dst, src, len) ?
+ -EFAULT : 0;
+}
+
+/**
+ * vringh_init_user - initialize a vringh for a userspace vring.
+ * @vrh: the vringh to initialize.
+ * @features: the feature bits for this ring.
+ * @num: the number of elements.
+ * @weak_barriers: true if we only need memory barriers, not I/O.
+ * @desc: the userspace descriptor pointer.
+ * @avail: the userspace avail pointer.
+ * @used: the userspace used pointer.
+ *
+ * Returns an error if num is invalid: you should check pointers
+ * yourself!
+ */
+int vringh_init_user(struct vringh *vrh, u32 features,
+ unsigned int num, bool weak_barriers,
+ struct vring_desc __user *desc,
+ struct vring_avail __user *avail,
+ struct vring_used __user *used)
+{
+ /* Sane power of 2 please! */
+ if (!num || num > 0xffff || (num & (num - 1))) {
+ vringh_bad("Bad ring size %u", num);
+ return -EINVAL;
+ }
+
+ vrh->event_indices = (features & (1 << VIRTIO_RING_F_EVENT_IDX));
+ vrh->weak_barriers = weak_barriers;
+ vrh->listening = false;
+ vrh->completed = 0;
+ vrh->last_avail_idx = 0;
+ vrh->last_used_idx = 0;
+ vrh->vring.num = num;
+ /* vring expects kernel addresses, but only used via accessors. */
+ vrh->vring.desc = (__force struct vring_desc *)desc;
+ vrh->vring.avail = (__force struct vring_avail *)avail;
+ vrh->vring.used = (__force struct vring_used *)used;
+ return 0;
+}
+EXPORT_SYMBOL(vringh_init_user);
+
+/**
+ * vringh_getdesc_user - get next available descriptor from userspace ring.
+ * @vrh: the userspace vring.
+ * @riov: where to put the readable descriptors (or NULL)
+ * @wiov: where to put the writable descriptors (or NULL)
+ * @getrange: function to call to check ranges.
+ * @head: head index we received, for passing to vringh_complete_user().
+ *
+ * Returns 0 if there was no descriptor, 1 if there was, or -errno.
+ *
+ * Note that on error return, you can tell the difference between an
+ * invalid ring and a single invalid descriptor: in the former case,
+ * *head will be vrh->vring.num. You may be able to ignore an invalid
+ * descriptor, but there's not much you can do with an invalid ring.
+ *
+ * Note that you may need to clean up riov and wiov, even on error!
+ */
+int vringh_getdesc_user(struct vringh *vrh,
+ struct vringh_iov *riov,
+ struct vringh_iov *wiov,
+ bool (*getrange)(struct vringh *vrh,
+ u64 addr, struct vringh_range *r),
+ u16 *head)
+{
+ int err;
+
+ *head = vrh->vring.num;
+ err = __vringh_get_head(vrh, getu16_user, &vrh->last_avail_idx);
+ if (err < 0)
+ return err;
+
+ /* Empty... */
+ if (err == vrh->vring.num)
+ return 0;
+
+ /* We need the layouts to be identical for this to work */
+ BUILD_BUG_ON(sizeof(struct vringh_kiov) != sizeof(struct vringh_iov));
+ BUILD_BUG_ON(offsetof(struct vringh_kiov, iov) !=
+ offsetof(struct vringh_iov, iov));
+ BUILD_BUG_ON(offsetof(struct vringh_kiov, i) !=
+ offsetof(struct vringh_iov, i));
+ BUILD_BUG_ON(offsetof(struct vringh_kiov, used) !=
+ offsetof(struct vringh_iov, used));
+ BUILD_BUG_ON(offsetof(struct vringh_kiov, max_num) !=
+ offsetof(struct vringh_iov, max_num));
+ BUILD_BUG_ON(sizeof(struct iovec) != sizeof(struct kvec));
+ BUILD_BUG_ON(offsetof(struct iovec, iov_base) !=
+ offsetof(struct kvec, iov_base));
+ BUILD_BUG_ON(offsetof(struct iovec, iov_len) !=
+ offsetof(struct kvec, iov_len));
+ BUILD_BUG_ON(sizeof(((struct iovec *)NULL)->iov_base)
+ != sizeof(((struct kvec *)NULL)->iov_base));
+ BUILD_BUG_ON(sizeof(((struct iovec *)NULL)->iov_len)
+ != sizeof(((struct kvec *)NULL)->iov_len));
+
+ *head = err;
+ err = __vringh_iov(vrh, *head, (struct vringh_kiov *)riov,
+ (struct vringh_kiov *)wiov,
+ range_check, getrange, GFP_KERNEL, copydesc_user);
+ if (err)
+ return err;
+
+ return 1;
+}
+EXPORT_SYMBOL(vringh_getdesc_user);
+
+/**
+ * vringh_iov_pull_user - copy bytes from vring_iov.
+ * @riov: the riov as passed to vringh_getdesc_user() (updated as we consume)
+ * @dst: the place to copy.
+ * @len: the maximum length to copy.
+ *
+ * Returns the bytes copied <= len or a negative errno.
+ */
+ssize_t vringh_iov_pull_user(struct vringh_iov *riov, void *dst, size_t len)
+{
+ return vringh_iov_xfer((struct vringh_kiov *)riov,
+ dst, len, xfer_from_user);
+}
+EXPORT_SYMBOL(vringh_iov_pull_user);
+
+/**
+ * vringh_iov_push_user - copy bytes into vring_iov.
+ * @wiov: the wiov as passed to vringh_getdesc_user() (updated as we consume)
+ * @dst: the place to copy.
+ * @len: the maximum length to copy.
+ *
+ * Returns the bytes copied <= len or a negative errno.
+ */
+ssize_t vringh_iov_push_user(struct vringh_iov *wiov,
+ const void *src, size_t len)
+{
+ return vringh_iov_xfer((struct vringh_kiov *)wiov,
+ (void *)src, len, xfer_to_user);
+}
+EXPORT_SYMBOL(vringh_iov_push_user);
+
+/**
+ * vringh_abandon_user - we've decided not to handle the descriptor(s).
+ * @vrh: the vring.
+ * @num: the number of descriptors to put back (ie. num
+ * vringh_get_user() to undo).
+ *
+ * The next vringh_get_user() will return the old descriptor(s) again.
+ */
+void vringh_abandon_user(struct vringh *vrh, unsigned int num)
+{
+ /* We only update vring_avail_event(vr) when we want to be notified,
+ * so we haven't changed that yet. */
+ vrh->last_avail_idx -= num;
+}
+EXPORT_SYMBOL(vringh_abandon_user);
+
+/**
+ * vringh_complete_user - we've finished with descriptor, publish it.
+ * @vrh: the vring.
+ * @head: the head as filled in by vringh_getdesc_user.
+ * @len: the length of data we have written.
+ *
+ * You should check vringh_need_notify_user() after one or more calls
+ * to this function.
+ */
+int vringh_complete_user(struct vringh *vrh, u16 head, u32 len)
+{
+ struct vring_used_elem used;
+
+ used.id = head;
+ used.len = len;
+ return __vringh_complete(vrh, &used, 1, putu16_user, putused_user);
+}
+EXPORT_SYMBOL(vringh_complete_user);
+
+/**
+ * vringh_complete_multi_user - we've finished with many descriptors.
+ * @vrh: the vring.
+ * @used: the head, length pairs.
+ * @num_used: the number of used elements.
+ *
+ * You should check vringh_need_notify_user() after one or more calls
+ * to this function.
+ */
+int vringh_complete_multi_user(struct vringh *vrh,
+ const struct vring_used_elem used[],
+ unsigned num_used)
+{
+ return __vringh_complete(vrh, used, num_used,
+ putu16_user, putused_user);
+}
+EXPORT_SYMBOL(vringh_complete_multi_user);
+
+/**
+ * vringh_notify_enable_user - we want to know if something changes.
+ * @vrh: the vring.
+ *
+ * This always enables notifications, but returns true if there are
+ * now more buffers available in the vring.
+ */
+bool vringh_notify_enable_user(struct vringh *vrh)
+{
+ return __vringh_notify_enable(vrh, getu16_user, putu16_user);
+}
+EXPORT_SYMBOL(vringh_notify_enable_user);
+
+/**
+ * vringh_notify_disable_user - don't tell us if something changes.
+ * @vrh: the vring.
+ *
+ * This is our normal running state: we disable and then only enable when
+ * we're going to sleep.
+ */
+void vringh_notify_disable_user(struct vringh *vrh)
+{
+ __vringh_notify_disable(vrh, putu16_user);
+}
+EXPORT_SYMBOL(vringh_notify_disable_user);
+
+/**
+ * vringh_need_notify_user - must we tell the other side about used buffers?
+ * @vrh: the vring we've called vringh_complete_user() on.
+ *
+ * Returns -errno or 0 if we don't need to tell the other side, 1 if we do.
+ */
+int vringh_need_notify_user(struct vringh *vrh)
+{
+ return __vringh_need_notify(vrh, getu16_user);
+}
+EXPORT_SYMBOL(vringh_need_notify_user);
+
+/* Kernelspace access helpers. */
+static inline int getu16_kern(u16 *val, const u16 *p)
+{
+ *val = ACCESS_ONCE(*p);
+ return 0;
+}
+
+static inline int putu16_kern(u16 *p, u16 val)
+{
+ ACCESS_ONCE(*p) = val;
+ return 0;
+}
+
+static inline int copydesc_kern(void *dst, const void *src, size_t len)
+{
+ memcpy(dst, src, len);
+ return 0;
+}
+
+static inline int putused_kern(struct vring_used_elem *dst,
+ const struct vring_used_elem *src,
+ unsigned int num)
+{
+ memcpy(dst, src, num * sizeof(*dst));
+ return 0;
+}
+
+static inline int xfer_kern(void *src, void *dst, size_t len)
+{
+ memcpy(dst, src, len);
+ return 0;
+}
+
+/**
+ * vringh_init_kern - initialize a vringh for a kernelspace vring.
+ * @vrh: the vringh to initialize.
+ * @features: the feature bits for this ring.
+ * @num: the number of elements.
+ * @weak_barriers: true if we only need memory barriers, not I/O.
+ * @desc: the kernelspace descriptor pointer.
+ * @avail: the kernelspace avail pointer.
+ * @used: the kernelspace used pointer.
+ *
+ * Returns an error if num is invalid.
+ */
+int vringh_init_kern(struct vringh *vrh, u32 features,
+ unsigned int num, bool weak_barriers,
+ struct vring_desc *desc,
+ struct vring_avail *avail,
+ struct vring_used *used)
+{
+ /* Sane power of 2 please! */
+ if (!num || num > 0xffff || (num & (num - 1))) {
+ vringh_bad("Bad ring size %u", num);
+ return -EINVAL;
+ }
+
+ vrh->event_indices = (features & (1 << VIRTIO_RING_F_EVENT_IDX));
+ vrh->weak_barriers = weak_barriers;
+ vrh->listening = false;
+ vrh->completed = 0;
+ vrh->last_avail_idx = 0;
+ vrh->last_used_idx = 0;
+ vrh->vring.num = num;
+ vrh->vring.desc = desc;
+ vrh->vring.avail = avail;
+ vrh->vring.used = used;
+ return 0;
+}
+EXPORT_SYMBOL(vringh_init_kern);
+
+/**
+ * vringh_getdesc_kern - get next available descriptor from kernelspace ring.
+ * @vrh: the kernelspace vring.
+ * @riov: where to put the readable descriptors (or NULL)
+ * @wiov: where to put the writable descriptors (or NULL)
+ * @head: head index we received, for passing to vringh_complete_kern().
+ * @gfp: flags for allocating larger riov/wiov.
+ *
+ * Returns 0 if there was no descriptor, 1 if there was, or -errno.
+ *
+ * Note that on error return, you can tell the difference between an
+ * invalid ring and a single invalid descriptor: in the former case,
+ * *head will be vrh->vring.num. You may be able to ignore an invalid
+ * descriptor, but there's not much you can do with an invalid ring.
+ *
+ * Note that you may need to clean up riov and wiov, even on error!
+ */
+int vringh_getdesc_kern(struct vringh *vrh,
+ struct vringh_kiov *riov,
+ struct vringh_kiov *wiov,
+ u16 *head,
+ gfp_t gfp)
+{
+ int err;
+
+ err = __vringh_get_head(vrh, getu16_kern, &vrh->last_avail_idx);
+ if (err < 0)
+ return err;
+
+ /* Empty... */
+ if (err == vrh->vring.num)
+ return 0;
+
+ *head = err;
+ err = __vringh_iov(vrh, *head, riov, wiov, no_range_check, NULL,
+ gfp, copydesc_kern);
+ if (err)
+ return err;
+
+ return 1;
+}
+EXPORT_SYMBOL(vringh_getdesc_kern);
+
+/**
+ * vringh_iov_pull_kern - copy bytes from vring_iov.
+ * @riov: the riov as passed to vringh_getdesc_kern() (updated as we consume)
+ * @dst: the place to copy.
+ * @len: the maximum length to copy.
+ *
+ * Returns the bytes copied <= len or a negative errno.
+ */
+ssize_t vringh_iov_pull_kern(struct vringh_kiov *riov, void *dst, size_t len)
+{
+ return vringh_iov_xfer(riov, dst, len, xfer_kern);
+}
+EXPORT_SYMBOL(vringh_iov_pull_kern);
+
+/**
+ * vringh_iov_push_kern - copy bytes into vringh_kiov.
+ * @wiov: the wiov as passed to vringh_getdesc_kern() (updated as we consume)
+ * @src: the place to copy from.
+ * @len: the maximum length to copy.
+ *
+ * Returns the bytes copied <= len or a negative errno.
+ */
+ssize_t vringh_iov_push_kern(struct vringh_kiov *wiov,
+ const void *src, size_t len)
+{
+ return vringh_iov_xfer(wiov, (void *)src, len, xfer_kern);
+}
+EXPORT_SYMBOL(vringh_iov_push_kern);
+
+/**
+ * vringh_abandon_kern - we've decided not to handle the descriptor(s).
+ * @vrh: the vring.
+ * @num: the number of descriptors to put back (ie. num
+ * vringh_getdesc_kern() calls to undo).
+ *
+ * The next vringh_getdesc_kern() will return the old descriptor(s) again.
+ */
+void vringh_abandon_kern(struct vringh *vrh, unsigned int num)
+{
+ /* We only update vring_avail_event(vr) when we want to be notified,
+ * so we haven't changed that yet. */
+ vrh->last_avail_idx -= num;
+}
+EXPORT_SYMBOL(vringh_abandon_kern);
+
+/**
+ * vringh_complete_kern - we've finished with descriptor, publish it.
+ * @vrh: the vring.
+ * @head: the head as filled in by vringh_getdesc_kern.
+ * @len: the length of data we have written.
+ *
+ * You should check vringh_need_notify_kern() after one or more calls
+ * to this function.
+ */
+int vringh_complete_kern(struct vringh *vrh, u16 head, u32 len)
+{
+ struct vring_used_elem used;
+
+ used.id = head;
+ used.len = len;
+
+ return __vringh_complete(vrh, &used, 1, putu16_kern, putused_kern);
+}
+EXPORT_SYMBOL(vringh_complete_kern);
+
+/**
+ * vringh_notify_enable_kern - we want to know if something changes.
+ * @vrh: the vring.
+ *
+ * This always enables notifications, but returns true if there are
+ * now more buffers available in the vring.
+ */
+bool vringh_notify_enable_kern(struct vringh *vrh)
+{
+ return __vringh_notify_enable(vrh, getu16_kern, putu16_kern);
+}
+EXPORT_SYMBOL(vringh_notify_enable_kern);
+
+/**
+ * vringh_notify_disable_kern - don't tell us if something changes.
+ * @vrh: the vring.
+ *
+ * This is our normal running state: we disable and then only enable when
+ * we're going to sleep.
+ */
+void vringh_notify_disable_kern(struct vringh *vrh)
+{
+ __vringh_notify_disable(vrh, putu16_kern);
+}
+EXPORT_SYMBOL(vringh_notify_disable_kern);
+
+/**
+ * vringh_need_notify_kern - must we tell the other side about used buffers?
+ * @vrh: the vring we've called vringh_complete_kern() on.
+ *
+ * Returns -errno or 0 if we don't need to tell the other side, 1 if we do.
+ */
+int vringh_need_notify_kern(struct vringh *vrh)
+{
+ return __vringh_need_notify(vrh, getu16_kern);
+}
+EXPORT_SYMBOL(vringh_need_notify_kern);
diff --git a/include/linux/vringh.h b/include/linux/vringh.h
new file mode 100644
index 0000000..ab41185
--- /dev/null
+++ b/include/linux/vringh.h
@@ -0,0 +1,185 @@
+/*
+ * Linux host-side vring helpers; for when the kernel needs to access
+ * someone else's vring.
+ *
+ * Copyright IBM Corporation, 2013.
+ * Parts taken from drivers/vhost/vhost.c Copyright 2009 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Written by: Rusty Russell <rusty@rustcorp.com.au>
+ */
+#ifndef _LINUX_VRINGH_H
+#define _LINUX_VRINGH_H
+#include <uapi/linux/virtio_ring.h>
+#include <linux/uio.h>
+#include <linux/slab.h>
+#include <asm/barrier.h>
+
+/* virtio_ring with information needed for host access. */
+struct vringh {
+ /* Guest publishes used event idx (note: we always do). */
+ bool event_indices;
+
+ /* Have we told the other end we want to be notified? */
+ bool listening;
+
+ /* Can we get away with weak barriers? */
+ bool weak_barriers;
+
+ /* Last available index we saw (ie. where we're up to). */
+ u16 last_avail_idx;
+
+ /* Last index we used. */
+ u16 last_used_idx;
+
+ /* How many descriptors we've completed since last need_notify(). */
+ u32 completed;
+
+ /* The vring (note: it may contain user pointers!) */
+ struct vring vring;
+};
+
+/* The memory the vring can access, and what offset to apply. */
+struct vringh_range {
+ u64 start, end_incl;
+ u64 offset;
+};
+
+/* All the information about an iovec. */
+struct vringh_iov {
+ struct iovec *iov;
+ size_t off; /* Within iov[i] */
+ unsigned i, used, max_num;
+};
+
+/* All the information about a kvec. */
+struct vringh_kiov {
+ struct kvec *iov;
+ size_t off; /* Within iov[i] */
+ unsigned i, used, max_num;
+};
+
+/* Flag on max_num to indicate we're kmalloced. */
+#define VRINGH_IOV_ALLOCATED 0x8000000
+
+/* Helpers for userspace vrings. */
+int vringh_init_user(struct vringh *vrh, u32 features,
+ unsigned int num, bool weak_barriers,
+ struct vring_desc __user *desc,
+ struct vring_avail __user *avail,
+ struct vring_used __user *used);
+
+static inline void vringh_iov_init(struct vringh_iov *iov,
+ struct iovec *iovec, unsigned num)
+{
+ iov->used = iov->i = 0;
+ iov->off = 0;
+ iov->max_num = num;
+ iov->iov = iovec;
+}
+
+static inline void vringh_iov_reset(struct vringh_iov *iov)
+{
+ iov->off = 0;
+ iov->i = 0;
+}
+
+static inline void vringh_iov_cleanup(struct vringh_iov *iov)
+{
+ if (iov->max_num & VRINGH_IOV_ALLOCATED)
+ kfree(iov->iov);
+ iov->max_num = iov->used = iov->i = iov->off = 0;
+ iov->iov = NULL;
+}
+
+/* Convert a descriptor into iovecs. */
+int vringh_getdesc_user(struct vringh *vrh,
+ struct vringh_iov *riov,
+ struct vringh_iov *wiov,
+ bool (*getrange)(struct vringh *vrh,
+ u64 addr, struct vringh_range *r),
+ u16 *head);
+
+/* Copy bytes from readable vsg, consuming it (and incrementing riov->i). */
+ssize_t vringh_iov_pull_user(struct vringh_iov *riov, void *dst, size_t len);
+
+/* Copy bytes into writable vsg, consuming it (and incrementing wiov->i). */
+ssize_t vringh_iov_push_user(struct vringh_iov *wiov,
+ const void *src, size_t len);
+
+/* Mark a descriptor as used. */
+int vringh_complete_user(struct vringh *vrh, u16 head, u32 len);
+int vringh_complete_multi_user(struct vringh *vrh,
+ const struct vring_used_elem used[],
+ unsigned num_used);
+
+/* Pretend we've never seen descriptor (for easy error handling). */
+void vringh_abandon_user(struct vringh *vrh, unsigned int num);
+
+/* Do we need to fire the eventfd to notify the other side? */
+int vringh_need_notify_user(struct vringh *vrh);
+
+bool vringh_notify_enable_user(struct vringh *vrh);
+void vringh_notify_disable_user(struct vringh *vrh);
+
+/* Helpers for kernelspace vrings. */
+int vringh_init_kern(struct vringh *vrh, u32 features,
+ unsigned int num, bool weak_barriers,
+ struct vring_desc *desc,
+ struct vring_avail *avail,
+ struct vring_used *used);
+
+static inline void vringh_kiov_init(struct vringh_kiov *kiov,
+ struct kvec *kvec, unsigned num)
+{
+ kiov->used = kiov->i = 0;
+ kiov->off = 0;
+ kiov->max_num = num;
+ kiov->iov = kvec;
+}
+
+static inline void vringh_kiov_reset(struct vringh_kiov *kiov)
+{
+ kiov->off = 0;
+ kiov->i = 0;
+}
+
+static inline void vringh_kiov_cleanup(struct vringh_kiov *kiov)
+{
+ if (kiov->max_num & VRINGH_IOV_ALLOCATED)
+ kfree(kiov->iov);
+ kiov->max_num = kiov->used = kiov->i = kiov->off = 0;
+ kiov->iov = NULL;
+}
+
+int vringh_getdesc_kern(struct vringh *vrh,
+ struct vringh_kiov *riov,
+ struct vringh_kiov *wiov,
+ u16 *head,
+ gfp_t gfp);
+
+ssize_t vringh_iov_pull_kern(struct vringh_kiov *riov, void *dst, size_t len);
+ssize_t vringh_iov_push_kern(struct vringh_kiov *wiov,
+ const void *src, size_t len);
+void vringh_abandon_kern(struct vringh *vrh, unsigned int num);
+int vringh_complete_kern(struct vringh *vrh, u16 head, u32 len);
+
+bool vringh_notify_enable_kern(struct vringh *vrh);
+void vringh_notify_disable_kern(struct vringh *vrh);
+
+int vringh_need_notify_kern(struct vringh *vrh);
+
+#endif /* _LINUX_VRINGH_H */
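To show how the kernelspace helpers declared above fit together, here is a
minimal sketch of a host-side service loop. It is not part of the patch:
the kick() callback, the buffer sizes and the error handling are invented
for illustration only.

/* Sketch: a host-side service loop over the kernelspace vringh helpers.
 * The kick() callback and buffer handling are placeholders. */
static void service_ring(struct vringh *vrh, void (*kick)(struct vringh *))
{
	struct vringh_kiov riov, wiov;
	struct kvec rkvec[4], wkvec[4];
	char buf[64];
	ssize_t len;
	u16 head;
	int err;

	for (;;) {
		vringh_kiov_init(&riov, rkvec, ARRAY_SIZE(rkvec));
		vringh_kiov_init(&wiov, wkvec, ARRAY_SIZE(wkvec));

		/* 0 means the ring is empty, < 0 means a bad descriptor. */
		err = vringh_getdesc_kern(vrh, &riov, &wiov, &head, GFP_KERNEL);
		if (err <= 0)
			break;

		/* Consume the readable bytes and echo them back. */
		len = vringh_iov_pull_kern(&riov, buf, sizeof(buf));
		if (len > 0)
			len = vringh_iov_push_kern(&wiov, buf, len);

		vringh_kiov_cleanup(&riov);
		vringh_kiov_cleanup(&wiov);

		if (vringh_complete_kern(vrh, head, len > 0 ? len : 0))
			break;
	}

	/* Kick the guest only if flags/event-idx say it wants to know. */
	if (vringh_need_notify_kern(vrh) > 0)
		kick(vrh);
}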
* [PATCH 2/3] vringh: don't flag already listening.
2013-03-06 4:57 ` [PATCH 1/3] vringh: host-side implementation of virtio rings (v2) Rusty Russell
@ 2013-03-06 4:59 ` Rusty Russell
2013-03-06 5:02 ` [PATCH 3/3] tools/virtio: add vring_test (v2) Rusty Russell
1 sibling, 0 replies; 16+ messages in thread
From: Rusty Russell @ 2013-03-06 4:59 UTC (permalink / raw)
To: sjur.brandeland, Ohad Ben-Cohen, mst
Cc: sjur, Linus Walleij, Erwan Yvin, virtualization
This doesn't work for eventidx:
1) vringh_notify_enable_user() gets called because the ring is empty
(vrh->last_avail_idx == vrh->vring.avail->idx).
2) It puts vrh->last_avail_idx into vring_avail_event().
3) It sets ->listening to true.
4) Meanwhile, the other side has done more work, updating vrh->vring.avail->idx.
5) It notices vrh->vring.avail->idx has changed, so returns true to
make us do more work.
But since the value we wrote into vring_avail_event() is in the past, we
won't get notified. This happens rarely: only after I'd optimized
add_buf could I trigger it, and then only in about 1 in 10 runs.
Since it's so rare, just remove the listening flag: it's fine to do
this twice anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
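To make the race concrete: the check the guest applies before kicking is
vring_need_event() from include/uapi/linux/virtio_ring.h. With made-up
index values (an illustration, not part of the patch) a stale avail event
suppresses the kick like this:

/* The guest-side test: kick the host iff the new avail idx has just
 * stepped past the host's published event index. */
static inline int vring_need_event(__u16 event_idx, __u16 new_idx, __u16 old)
{
	return (__u16)(new_idx - event_idx - 1) < (__u16)(new_idx - old);
}

/*
 * With made-up index values, the sequence above plays out like this:
 *
 *   host:  last_avail_idx = 10, ring looks empty
 *   host:  writes 10 into vring_avail_event()          (step 2)
 *   guest: adds 3 buffers, avail->idx 10 -> 13         (step 4)
 *   host:  sees avail->idx changed, returns true       (step 5)
 *
 * If the host later waits again without republishing the event index,
 * the guest's next add (old = 13, new = 14) checks
 * vring_need_event(10, 14, 13): (u16)(14 - 10 - 1) = 3 is not < 1,
 * so no kick is sent even though the host is asleep.
 */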
diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index 8234f2b..d4413d9 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -476,12 +476,6 @@ static inline bool __vringh_notify_enable(struct vringh *vrh,
{
u16 avail;
- /* Already enabled? */
- if (vrh->listening)
- return false;
-
- vrh->listening = true;
-
if (!vrh->event_indices) {
/* Old-school; update flags. */
if (putu16(&vrh->vring.used->flags, 0) != 0) {
@@ -515,11 +509,6 @@ static inline bool __vringh_notify_enable(struct vringh *vrh,
static inline void __vringh_notify_disable(struct vringh *vrh,
int (*putu16)(u16 *p, u16 val))
{
- /* Already disabled? */
- if (!vrh->listening)
- return;
-
- vrh->listening = false;
if (!vrh->event_indices) {
/* Old-school; update flags. */
if (putu16(&vrh->vring.used->flags, VRING_USED_F_NO_NOTIFY)) {
@@ -593,7 +582,6 @@ int vringh_init_user(struct vringh *vrh, u32 features,
vrh->event_indices = (features & (1 << VIRTIO_RING_F_EVENT_IDX));
vrh->weak_barriers = weak_barriers;
- vrh->listening = false;
vrh->completed = 0;
vrh->last_avail_idx = 0;
vrh->last_used_idx = 0;
@@ -853,7 +841,6 @@ int vringh_init_kern(struct vringh *vrh, u32 features,
vrh->event_indices = (features & (1 << VIRTIO_RING_F_EVENT_IDX));
vrh->weak_barriers = weak_barriers;
- vrh->listening = false;
vrh->completed = 0;
vrh->last_avail_idx = 0;
vrh->last_used_idx = 0;
diff --git a/include/linux/vringh.h b/include/linux/vringh.h
index ab41185..44b94cd 100644
--- a/include/linux/vringh.h
+++ b/include/linux/vringh.h
@@ -33,9 +33,6 @@ struct vringh {
/* Guest publishes used event idx (note: we always do). */
bool event_indices;
- /* Have we told the other end we want to be notified? */
- bool listening;
-
/* Can we get away with weak barriers? */
bool weak_barriers;
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 3/3] tools/virtio: add vring_test (v2)
2013-03-06 4:57 ` [PATCH 1/3] vringh: host-side implementation of virtio rings (v2) Rusty Russell
2013-03-06 4:59 ` [PATCH 2/3] vringh: don't flag already listening Rusty Russell
@ 2013-03-06 5:02 ` Rusty Russell
1 sibling, 0 replies; 16+ messages in thread
From: Rusty Russell @ 2013-03-06 5:02 UTC (permalink / raw)
To: sjur.brandeland, Ohad Ben-Cohen, mst
Cc: sjur, Linus Walleij, Erwan Yvin, virtualization
This is mainly to test the drivers/vhost/vringh.c code, but it also
uses the drivers/virtio/virtio_ring.c code for the guest side.
Usage for testing the basic implementation:
./vringh_test
# Test with indirect descriptors
./vringh_test --indirect
# Test with indirect descriptors and event indexes
./vringh_test --indirect --eventidx
You can run a parallel stress test by adding --parallel to any of the
above options.
eg ./vringh_test --parallel:
Using CPUS 0 and 3
Guest: notified 10107974, pinged 107970
Host: notified 108158, pinged 3172148
./vringh_test --indirect --eventidx --parallel:
Using CPUS 0 and 3
Guest: notified 156357, pinged 156251
Host: notified 156251, pinged 78179
Average of 50 times doing ./vringh_test --indirect --eventidx --parallel:
2.840000-3.040000(2.927292)user
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changes:
- build objects separately, just like kernel does.
- slightly better cmdline handling, arbitrary ordering of args
- --fast-vringh for benchmarking virtio_ring improvements.
- -U_FORTIFY_SOURCE to remove bogus Ubuntu fortify warnings.
diff --git a/tools/virtio/Makefile b/tools/virtio/Makefile
index b48c432..3187c62 100644
--- a/tools/virtio/Makefile
+++ b/tools/virtio/Makefile
@@ -1,12 +1,14 @@
all: test mod
-test: virtio_test
+test: virtio_test vringh_test
virtio_test: virtio_ring.o virtio_test.o
-CFLAGS += -g -O2 -Wall -I. -I ../../usr/include/ -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD
-vpath %.c ../../drivers/virtio
+vringh_test: vringh_test.o vringh.o virtio_ring.o
+
+CFLAGS += -g -O2 -Wall -I. -I ../../usr/include/ -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE
+vpath %.c ../../drivers/virtio ../../drivers/vhost
mod:
${MAKE} -C `pwd`/../.. M=`pwd`/vhost_test
.PHONY: all test mod clean
clean:
- ${RM} *.o vhost_test/*.o vhost_test/.*.cmd \
+ ${RM} *.o vringh_test virtio_test vhost_test/*.o vhost_test/.*.cmd \
vhost_test/Module.symvers vhost_test/modules.order *.d
-include *.d
diff --git a/tools/virtio/linux/uio.h b/tools/virtio/linux/uio.h
index ecbdf92..cd20f0b 100644
--- a/tools/virtio/linux/uio.h
+++ b/tools/virtio/linux/uio.h
@@ -1 +1,3 @@
+#include <linux/kernel.h>
+
#include "../../../include/linux/uio.h"
diff --git a/tools/virtio/vringh_test.c b/tools/virtio/vringh_test.c
new file mode 100644
index 0000000..8816dd4
--- /dev/null
+++ b/tools/virtio/vringh_test.c
@@ -0,0 +1,735 @@
+/* Simple test of virtio code, entirely in userspace. */
+#define _GNU_SOURCE
+#include <sched.h>
+#include <err.h>
+#include <linux/kernel.h>
+#include <linux/err.h>
+#include <linux/virtio.h>
+#include <linux/vringh.h>
+#include <linux/virtio_ring.h>
+#include <linux/uaccess.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/mman.h>
+#include <sys/wait.h>
+#include <fcntl.h>
+
+#define USER_MEM (1024*1024)
+void *__user_addr_min, *__user_addr_max;
+void *__kmalloc_fake, *__kfree_ignore_start, *__kfree_ignore_end;
+static u64 user_addr_offset;
+
+#define RINGSIZE 256
+#define ALIGN 4096
+
+static void never_notify_host(struct virtqueue *vq)
+{
+ abort();
+}
+
+static void never_callback_guest(struct virtqueue *vq)
+{
+ abort();
+}
+
+static bool getrange_iov(struct vringh *vrh, u64 addr, struct vringh_range *r)
+{
+ if (addr < (u64)(unsigned long)__user_addr_min - user_addr_offset)
+ return false;
+ if (addr >= (u64)(unsigned long)__user_addr_max - user_addr_offset)
+ return false;
+
+ r->start = (u64)(unsigned long)__user_addr_min - user_addr_offset;
+ r->end_incl = (u64)(unsigned long)__user_addr_max - 1 - user_addr_offset;
+ r->offset = user_addr_offset;
+ return true;
+}
+
+/* We return single byte ranges. */
+static bool getrange_slow(struct vringh *vrh, u64 addr, struct vringh_range *r)
+{
+ if (addr < (u64)(unsigned long)__user_addr_min - user_addr_offset)
+ return false;
+ if (addr >= (u64)(unsigned long)__user_addr_max - user_addr_offset)
+ return false;
+
+ r->start = addr;
+ r->end_incl = r->start;
+ r->offset = user_addr_offset;
+ return true;
+}
+
+struct guest_virtio_device {
+ struct virtio_device vdev;
+ int to_host_fd;
+ unsigned long notifies;
+};
+
+static void parallel_notify_host(struct virtqueue *vq)
+{
+ struct guest_virtio_device *gvdev;
+
+ gvdev = container_of(vq->vdev, struct guest_virtio_device, vdev);
+ write(gvdev->to_host_fd, "", 1);
+ gvdev->notifies++;
+}
+
+static void no_notify_host(struct virtqueue *vq)
+{
+}
+
+#define NUM_XFERS (10000000)
+
+/* We aim for two "distant" cpus. */
+static void find_cpus(unsigned int *first, unsigned int *last)
+{
+ unsigned int i;
+
+ *first = -1U;
+ *last = 0;
+ for (i = 0; i < 4096; i++) {
+ cpu_set_t set;
+ CPU_ZERO(&set);
+ CPU_SET(i, &set);
+ if (sched_setaffinity(getpid(), sizeof(set), &set) == 0) {
+ if (i < *first)
+ *first = i;
+ if (i > *last)
+ *last = i;
+ }
+ }
+}
+
+/* Opencoded version for fast mode */
+static inline int vringh_get_head(struct vringh *vrh, u16 *head)
+{
+ u16 avail_idx, i;
+ int err;
+
+ err = get_user(avail_idx, &vrh->vring.avail->idx);
+ if (err)
+ return err;
+
+ if (vrh->last_avail_idx == avail_idx)
+ return 0;
+
+ /* Only get avail ring entries after they have been exposed by guest. */
+ virtio_rmb(vrh->weak_barriers);
+
+ i = vrh->last_avail_idx & (vrh->vring.num - 1);
+
+ err = get_user(*head, &vrh->vring.avail->ring[i]);
+ if (err)
+ return err;
+
+ vrh->last_avail_idx++;
+ return 1;
+}
+
+static int parallel_test(unsigned long features,
+ bool (*getrange)(struct vringh *vrh,
+ u64 addr, struct vringh_range *r),
+ bool fast_vringh)
+{
+ void *host_map, *guest_map;
+ int fd, mapsize, to_guest[2], to_host[2];
+ unsigned long xfers = 0, notifies = 0, receives = 0;
+ unsigned int first_cpu, last_cpu;
+ cpu_set_t cpu_set;
+ char buf[128];
+
+ /* Create real file to mmap. */
+ fd = open("/tmp/vringh_test-file", O_RDWR|O_CREAT|O_TRUNC, 0600);
+ if (fd < 0)
+ err(1, "Opening /tmp/vringh_test-file");
+
+ /* Extra room at the end for some data, and indirects */
+ mapsize = vring_size(RINGSIZE, ALIGN)
+ + RINGSIZE * 2 * sizeof(int)
+ + RINGSIZE * 6 * sizeof(struct vring_desc);
+ mapsize = (mapsize + getpagesize() - 1) & ~(getpagesize() - 1);
+ ftruncate(fd, mapsize);
+
+ /* Parent and child use separate addresses, to check our mapping logic! */
+ host_map = mmap(NULL, mapsize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
+ guest_map = mmap(NULL, mapsize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
+
+ pipe(to_guest);
+ pipe(to_host);
+
+ CPU_ZERO(&cpu_set);
+ find_cpus(&first_cpu, &last_cpu);
+ printf("Using CPUS %u and %u\n", first_cpu, last_cpu);
+ fflush(stdout);
+
+ if (fork() != 0) {
+ struct vringh vrh;
+ int status, err, rlen = 0;
+ char rbuf[5];
+
+ /* We are the host: never access guest addresses! */
+ munmap(guest_map, mapsize);
+
+ __user_addr_min = host_map;
+ __user_addr_max = __user_addr_min + mapsize;
+ user_addr_offset = host_map - guest_map;
+ assert(user_addr_offset);
+
+ close(to_guest[0]);
+ close(to_host[1]);
+
+ vring_init(&vrh.vring, RINGSIZE, host_map, ALIGN);
+ vringh_init_user(&vrh, features, RINGSIZE, true,
+ vrh.vring.desc, vrh.vring.avail, vrh.vring.used);
+ CPU_SET(first_cpu, &cpu_set);
+ if (sched_setaffinity(getpid(), sizeof(cpu_set), &cpu_set))
+ errx(1, "Could not set affinity to cpu %u", first_cpu);
+
+ while (xfers < NUM_XFERS) {
+ struct iovec host_riov[2], host_wiov[2];
+ struct vringh_iov riov, wiov;
+ u16 head;
+
+ if (fast_vringh) {
+ for (;;) {
+ err = vringh_get_head(&vrh, &head);
+ if (err != 0)
+ break;
+ err = vringh_need_notify_user(&vrh);
+ if (err < 0)
+ errx(1, "vringh_need_notify_user: %i",
+ err);
+ if (err) {
+ write(to_guest[1], "", 1);
+ notifies++;
+ }
+ }
+ if (err != 1)
+ errx(1, "vringh_get_head");
+ goto complete;
+ } else {
+ vringh_iov_init(&riov,
+ host_riov,
+ ARRAY_SIZE(host_riov));
+ vringh_iov_init(&wiov,
+ host_wiov,
+ ARRAY_SIZE(host_wiov));
+
+ err = vringh_getdesc_user(&vrh, &riov, &wiov,
+ getrange, &head);
+ }
+ if (err == 0) {
+ err = vringh_need_notify_user(&vrh);
+ if (err < 0)
+ errx(1, "vringh_need_notify_user: %i",
+ err);
+ if (err) {
+ write(to_guest[1], "", 1);
+ notifies++;
+ }
+
+ if (vringh_notify_enable_user(&vrh))
+ continue;
+
+ /* Swallow all notifies at once. */
+ if (read(to_host[0], buf, sizeof(buf)) < 1)
+ break;
+
+ vringh_notify_disable_user(&vrh);
+ receives++;
+ continue;
+ }
+ if (err != 1)
+ errx(1, "vringh_getdesc_user: %i", err);
+
+ /* We simply copy bytes. */
+ if (riov.used) {
+ rlen = vringh_iov_pull_user(&riov, rbuf,
+ sizeof(rbuf));
+ if (rlen != 4)
+ errx(1, "vringh_iov_pull_user: %i",
+ rlen);
+ assert(riov.i == riov.used);
+ } else {
+ err = vringh_iov_push_user(&wiov, rbuf, rlen);
+ if (err != rlen)
+ errx(1, "vringh_iov_push_user: %i",
+ err);
+ assert(wiov.i == wiov.used);
+ }
+ complete:
+ xfers++;
+
+ err = vringh_complete_user(&vrh, head,
+ riov.used ? 0 : rlen);
+ if (err != 0)
+ errx(1, "vringh_complete_user: %i", err);
+ }
+
+ err = vringh_need_notify_user(&vrh);
+ if (err < 0)
+ errx(1, "vringh_need_notify_user: %i", err);
+ if (err) {
+ write(to_guest[1], "", 1);
+ notifies++;
+ }
+ wait(&status);
+ if (!WIFEXITED(status))
+ errx(1, "Child died with signal %i?", WTERMSIG(status));
+ if (WEXITSTATUS(status) != 0)
+ errx(1, "Child exited %i?", WEXITSTATUS(status));
+ printf("Host: notified %lu, pinged %lu\n", notifies, receives);
+ return 0;
+ } else {
+ struct guest_virtio_device gvdev;
+ struct virtqueue *vq;
+ unsigned int *data;
+ struct vring_desc *indirects;
+ unsigned int finished = 0;
+
+ /* We pass sg[]s pointing into here, but we need RINGSIZE+1 */
+ data = guest_map + vring_size(RINGSIZE, ALIGN);
+ indirects = (void *)data + (RINGSIZE + 1) * 2 * sizeof(int);
+
+ /* We are the guest. */
+ munmap(host_map, mapsize);
+
+ close(to_guest[1]);
+ close(to_host[0]);
+
+ gvdev.vdev.features[0] = features;
+ gvdev.to_host_fd = to_host[1];
+
+ CPU_SET(last_cpu, &cpu_set);
+ if (sched_setaffinity(getpid(), sizeof(cpu_set), &cpu_set))
+ err(1, "Could not set affinity to cpu %u", last_cpu);
+
+ vq = vring_new_virtqueue(0, RINGSIZE, ALIGN, &gvdev.vdev, true,
+ guest_map, fast_vringh ? no_notify_host
+ : parallel_notify_host,
+ never_callback_guest, "guest vq");
+
+ /* Don't kfree indirects. */
+ __kfree_ignore_start = indirects;
+ __kfree_ignore_end = indirects + RINGSIZE * 6;
+
+ while (xfers < NUM_XFERS) {
+ struct scatterlist sg[4];
+ unsigned int num_sg, len;
+ int *dbuf, err;
+ bool output = !(xfers % 2);
+
+ /* Consume bufs. */
+ while ((dbuf = virtqueue_get_buf(vq, &len)) != NULL) {
+ if (len == 4)
+ assert(*dbuf == finished - 1);
+ else
+ assert(*dbuf == finished);
+ finished++;
+ }
+
+ /* Produce a buffer. */
+ dbuf = data + (xfers % (RINGSIZE + 1));
+
+ if (output)
+ *dbuf = xfers;
+ else
+ *dbuf = -1;
+
+ switch ((xfers / sizeof(*dbuf)) % 4) {
+ case 0:
+ /* Nasty three-element sg list. */
+ sg_init_table(sg, num_sg = 3);
+ sg_set_buf(&sg[0], (void *)dbuf, 1);
+ sg_set_buf(&sg[1], (void *)dbuf + 1, 2);
+ sg_set_buf(&sg[2], (void *)dbuf + 3, 1);
+ break;
+ case 1:
+ sg_init_table(sg, num_sg = 2);
+ sg_set_buf(&sg[0], (void *)dbuf, 1);
+ sg_set_buf(&sg[1], (void *)dbuf + 1, 3);
+ break;
+ case 2:
+ sg_init_table(sg, num_sg = 1);
+ sg_set_buf(&sg[0], (void *)dbuf, 4);
+ break;
+ case 3:
+ sg_init_table(sg, num_sg = 4);
+ sg_set_buf(&sg[0], (void *)dbuf, 1);
+ sg_set_buf(&sg[1], (void *)dbuf + 1, 1);
+ sg_set_buf(&sg[2], (void *)dbuf + 2, 1);
+ sg_set_buf(&sg[3], (void *)dbuf + 3, 1);
+ break;
+ }
+
+ /* May allocate an indirect, so force it to allocate
+ * user addr */
+ __kmalloc_fake = indirects + (xfers % RINGSIZE) * 4;
+ if (output)
+ err = virtqueue_add_buf(vq, sg, num_sg, 0, dbuf,
+ GFP_KERNEL);
+ else
+ err = virtqueue_add_buf(vq, sg, 0, num_sg, dbuf,
+ GFP_KERNEL);
+
+ if (err == -ENOSPC) {
+ if (!virtqueue_enable_cb_delayed(vq))
+ continue;
+ /* Swallow all notifies at once. */
+ if (read(to_guest[0], buf, sizeof(buf)) < 1)
+ break;
+
+ receives++;
+ virtqueue_disable_cb(vq);
+ continue;
+ }
+
+ if (err)
+ errx(1, "virtqueue_add_buf: %i", err);
+
+ xfers++;
+ virtqueue_kick(vq);
+ }
+
+ /* Any extra? */
+ while (finished != xfers) {
+ int *dbuf;
+ unsigned int len;
+
+ /* Consume bufs. */
+ dbuf = virtqueue_get_buf(vq, &len);
+ if (dbuf) {
+ if (len == 4)
+ assert(*dbuf == finished - 1);
+ else
+ assert(len == 0);
+ finished++;
+ continue;
+ }
+
+ if (!virtqueue_enable_cb_delayed(vq))
+ continue;
+ if (read(to_guest[0], buf, sizeof(buf)) < 1)
+ break;
+
+ receives++;
+ virtqueue_disable_cb(vq);
+ }
+
+ printf("Guest: notified %lu, pinged %lu\n",
+ gvdev.notifies, receives);
+ return 0;
+ }
+}
+
+int main(int argc, char *argv[])
+{
+ struct virtio_device vdev;
+ struct virtqueue *vq;
+ struct vringh vrh;
+ struct scatterlist guest_sg[RINGSIZE];
+ struct iovec host_riov[2], host_wiov[2];
+ struct vringh_iov riov, wiov;
+ struct vring_used_elem used[RINGSIZE];
+ char buf[28];
+ u16 head;
+ int err;
+ unsigned i;
+ void *ret;
+ bool (*getrange)(struct vringh *vrh, u64 addr, struct vringh_range *r);
+ bool fast_vringh = false, parallel = false;
+
+ getrange = getrange_iov;
+ vdev.features[0] = 0;
+
+ while (argv[1]) {
+ if (strcmp(argv[1], "--indirect") == 0)
+ vdev.features[0] |= (1 << VIRTIO_RING_F_INDIRECT_DESC);
+ else if (strcmp(argv[1], "--eventidx") == 0)
+ vdev.features[0] |= (1 << VIRTIO_RING_F_EVENT_IDX);
+ else if (strcmp(argv[1], "--slow-range") == 0)
+ getrange = getrange_slow;
+ else if (strcmp(argv[1], "--fast-vringh") == 0)
+ fast_vringh = true;
+ else if (strcmp(argv[1], "--parallel") == 0)
+ parallel = true;
+ else
+ errx(1, "Unknown arg %s", argv[1]);
+ argv++;
+ }
+
+ if (parallel)
+ return parallel_test(vdev.features[0], getrange, fast_vringh);
+
+ if (posix_memalign(&__user_addr_min, PAGE_SIZE, USER_MEM) != 0)
+ abort();
+ __user_addr_max = __user_addr_min + USER_MEM;
+ memset(__user_addr_min, 0, vring_size(RINGSIZE, ALIGN));
+
+ /* Set up guest side. */
+ vq = vring_new_virtqueue(0, RINGSIZE, ALIGN, &vdev, true,
+ __user_addr_min,
+ never_notify_host, never_callback_guest,
+ "guest vq");
+
+ /* Set up host side. */
+ vring_init(&vrh.vring, RINGSIZE, __user_addr_min, ALIGN);
+ vringh_init_user(&vrh, vdev.features[0], RINGSIZE, true,
+ vrh.vring.desc, vrh.vring.avail, vrh.vring.used);
+
+ /* No descriptor to get yet... */
+ err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange, &head);
+ if (err != 0)
+ errx(1, "vringh_getdesc_user: %i", err);
+
+ /* Guest puts in a descriptor. */
+ memcpy(__user_addr_max - 1, "a", 1);
+ sg_init_table(guest_sg, 1);
+ sg_set_buf(&guest_sg[0], __user_addr_max - 1, 1);
+ sg_init_table(guest_sg+1, 1);
+ sg_set_buf(&guest_sg[1], __user_addr_max - 3, 2);
+
+ /* May allocate an indirect, so force it to allocate user addr */
+ __kmalloc_fake = __user_addr_min + vring_size(RINGSIZE, ALIGN);
+ err = virtqueue_add_buf(vq, guest_sg, 1, 1, &err, GFP_KERNEL);
+ if (err)
+ errx(1, "virtqueue_add_buf: %i", err);
+ __kmalloc_fake = NULL;
+
+ /* Host retrieves it. */
+ vringh_iov_init(&riov, host_riov, ARRAY_SIZE(host_riov));
+ vringh_iov_init(&wiov, host_wiov, ARRAY_SIZE(host_wiov));
+
+ err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange, &head);
+ if (err != 1)
+ errx(1, "vringh_getdesc_user: %i", err);
+
+ assert(riov.used == 1);
+ assert(riov.iov[0].iov_base == __user_addr_max - 1);
+ assert(riov.iov[0].iov_len == 1);
+ if (getrange != getrange_slow) {
+ assert(wiov.used == 1);
+ assert(wiov.iov[0].iov_base == __user_addr_max - 3);
+ assert(wiov.iov[0].iov_len == 2);
+ } else {
+ assert(wiov.used == 2);
+ assert(wiov.iov[0].iov_base == __user_addr_max - 3);
+ assert(wiov.iov[0].iov_len == 1);
+ assert(wiov.iov[1].iov_base == __user_addr_max - 2);
+ assert(wiov.iov[1].iov_len == 1);
+ }
+
+ err = vringh_iov_pull_user(&riov, buf, 5);
+ if (err != 1)
+ errx(1, "vringh_iov_pull_user: %i", err);
+ assert(buf[0] == 'a');
+ assert(riov.i == 1);
+ assert(vringh_iov_pull_user(&riov, buf, 5) == 0);
+
+ memcpy(buf, "bcdef", 5);
+ err = vringh_iov_push_user(&wiov, buf, 5);
+ if (err != 2)
+ errx(1, "vringh_iov_push_user: %i", err);
+ assert(memcmp(__user_addr_max - 3, "bc", 2) == 0);
+ assert(wiov.i == wiov.used);
+ assert(vringh_iov_push_user(&wiov, buf, 5) == 0);
+
+ /* Host is done. */
+ err = vringh_complete_user(&vrh, head, err);
+ if (err != 0)
+ errx(1, "vringh_complete_user: %i", err);
+
+ /* Guest should see used token now. */
+ __kfree_ignore_start = __user_addr_min + vring_size(RINGSIZE, ALIGN);
+ __kfree_ignore_end = __kfree_ignore_start + 1;
+ ret = virtqueue_get_buf(vq, &i);
+ if (ret != &err)
+ errx(1, "virtqueue_get_buf: %p", ret);
+ assert(i == 2);
+
+ /* Guest puts in a huge descriptor. */
+ sg_init_table(guest_sg, RINGSIZE);
+ for (i = 0; i < RINGSIZE; i++) {
+ sg_set_buf(&guest_sg[i],
+ __user_addr_max - USER_MEM/4, USER_MEM/4);
+ }
+
+ /* Fill contents with recognisable garbage. */
+ for (i = 0; i < USER_MEM/4; i++)
+ ((char *)__user_addr_max - USER_MEM/4)[i] = i;
+
+ /* This will allocate an indirect, so force it to allocate user addr */
+ __kmalloc_fake = __user_addr_min + vring_size(RINGSIZE, ALIGN);
+ err = virtqueue_add_buf(vq, guest_sg, RINGSIZE, 0, &err, GFP_KERNEL);
+ if (err)
+ errx(1, "virtqueue_add_buf (large): %i", err);
+ __kmalloc_fake = NULL;
+
+ /* Host picks it up (allocates new iov). */
+ vringh_iov_init(&riov, host_riov, ARRAY_SIZE(host_riov));
+ vringh_iov_init(&wiov, host_wiov, ARRAY_SIZE(host_wiov));
+
+ err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange, &head);
+ if (err != 1)
+ errx(1, "vringh_getdesc_user: %i", err);
+
+ assert(riov.max_num & VRINGH_IOV_ALLOCATED);
+ assert(riov.iov != host_riov);
+ if (getrange != getrange_slow)
+ assert(riov.used == RINGSIZE);
+ else
+ assert(riov.used == RINGSIZE * USER_MEM/4);
+
+ assert(!(wiov.max_num & VRINGH_IOV_ALLOCATED));
+ assert(wiov.used == 0);
+
+ /* Pull data back out (in odd chunks), should be as expected. */
+ for (i = 0; i < RINGSIZE * USER_MEM/4; i += 3) {
+ err = vringh_iov_pull_user(&riov, buf, 3);
+ if (err != 3 && i + err != RINGSIZE * USER_MEM/4)
+ errx(1, "vringh_iov_pull_user large: %i", err);
+ assert(buf[0] == (char)i);
+ assert(err < 2 || buf[1] == (char)(i + 1));
+ assert(err < 3 || buf[2] == (char)(i + 2));
+ }
+ assert(riov.i == riov.used);
+ vringh_iov_cleanup(&riov);
+ vringh_iov_cleanup(&wiov);
+
+ /* Complete using multi interface, just because we can. */
+ used[0].id = head;
+ used[0].len = 0;
+ err = vringh_complete_multi_user(&vrh, used, 1);
+ if (err)
+ errx(1, "vringh_complete_multi_user(1): %i", err);
+
+ /* Free up those descriptors. */
+ ret = virtqueue_get_buf(vq, &i);
+ if (ret != &err)
+ errx(1, "virtqueue_get_buf: %p", ret);
+
+ /* Add lots of descriptors. */
+ sg_init_table(guest_sg, 1);
+ sg_set_buf(&guest_sg[0], __user_addr_max - 1, 1);
+ for (i = 0; i < RINGSIZE; i++) {
+ err = virtqueue_add_buf(vq, guest_sg, 1, 0, &err, GFP_KERNEL);
+ if (err)
+ errx(1, "virtqueue_add_buf (multiple): %i", err);
+ }
+
+ /* Now get many, and consume them all at once. */
+ vringh_iov_init(&riov, host_riov, ARRAY_SIZE(host_riov));
+ vringh_iov_init(&wiov, host_wiov, ARRAY_SIZE(host_wiov));
+
+ for (i = 0; i < RINGSIZE; i++) {
+ err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange, &head);
+ if (err != 1)
+ errx(1, "vringh_getdesc_user: %i", err);
+ used[i].id = head;
+ used[i].len = 0;
+ }
+ /* Make sure it wraps around ring, to test! */
+ assert(vrh.vring.used->idx % RINGSIZE != 0);
+ err = vringh_complete_multi_user(&vrh, used, RINGSIZE);
+ if (err)
+ errx(1, "vringh_complete_multi_user: %i", err);
+
+ /* Free those buffers. */
+ for (i = 0; i < RINGSIZE; i++) {
+ unsigned len;
+ assert(virtqueue_get_buf(vq, &len) != NULL);
+ }
+
+ /* Test weird (but legal!) indirect. */
+ if (vdev.features[0] & (1 << VIRTIO_RING_F_INDIRECT_DESC)) {
+ char *data = __user_addr_max - USER_MEM/4;
+ struct vring_desc *d = __user_addr_max - USER_MEM/2;
+ struct vring vring;
+
+ /* Force creation of direct, which we modify. */
+ vdev.features[0] &= ~(1 << VIRTIO_RING_F_INDIRECT_DESC);
+ vq = vring_new_virtqueue(0, RINGSIZE, ALIGN, &vdev, true,
+ __user_addr_min,
+ never_notify_host,
+ never_callback_guest,
+ "guest vq");
+
+ sg_init_table(guest_sg, 4);
+ sg_set_buf(&guest_sg[0], d, sizeof(*d)*2);
+ sg_set_buf(&guest_sg[1], d + 2, sizeof(*d)*1);
+ sg_set_buf(&guest_sg[2], data + 6, 4);
+ sg_set_buf(&guest_sg[3], d + 3, sizeof(*d)*3);
+
+ err = virtqueue_add_buf(vq, guest_sg, 4, 0, &err, GFP_KERNEL);
+ if (err)
+ errx(1, "virtqueue_add_buf (indirect): %i", err);
+
+ vring_init(&vring, RINGSIZE, __user_addr_min, ALIGN);
+
+ /* They're used in order, but double-check... */
+ assert(vring.desc[0].addr == (unsigned long)d);
+ assert(vring.desc[1].addr == (unsigned long)(d+2));
+ assert(vring.desc[2].addr == (unsigned long)data + 6);
+ assert(vring.desc[3].addr == (unsigned long)(d+3));
+ vring.desc[0].flags |= VRING_DESC_F_INDIRECT;
+ vring.desc[1].flags |= VRING_DESC_F_INDIRECT;
+ vring.desc[3].flags |= VRING_DESC_F_INDIRECT;
+
+ /* First indirect */
+ d[0].addr = (unsigned long)data;
+ d[0].len = 1;
+ d[0].flags = VRING_DESC_F_NEXT;
+ d[0].next = 1;
+ d[1].addr = (unsigned long)data + 1;
+ d[1].len = 2;
+ d[1].flags = 0;
+
+ /* Second indirect */
+ d[2].addr = (unsigned long)data + 3;
+ d[2].len = 3;
+ d[2].flags = 0;
+
+ /* Third indirect */
+ d[3].addr = (unsigned long)data + 10;
+ d[3].len = 5;
+ d[3].flags = VRING_DESC_F_NEXT;
+ d[3].next = 1;
+ d[4].addr = (unsigned long)data + 15;
+ d[4].len = 6;
+ d[4].flags = VRING_DESC_F_NEXT;
+ d[4].next = 2;
+ d[5].addr = (unsigned long)data + 21;
+ d[5].len = 7;
+ d[5].flags = 0;
+
+ /* Host picks it up (allocates new iov). */
+ vringh_iov_init(&riov, host_riov, ARRAY_SIZE(host_riov));
+ vringh_iov_init(&wiov, host_wiov, ARRAY_SIZE(host_wiov));
+
+ err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange, &head);
+ if (err != 1)
+ errx(1, "vringh_getdesc_user: %i", err);
+
+ if (head != 0)
+ errx(1, "vringh_getdesc_user: head %i not 0", head);
+
+ assert(riov.max_num & VRINGH_IOV_ALLOCATED);
+ if (getrange != getrange_slow)
+ assert(riov.used == 7);
+ else
+ assert(riov.used == 28);
+ err = vringh_iov_pull_user(&riov, buf, 29);
+ assert(err == 28);
+
+ /* Data should be linear. */
+ for (i = 0; i < err; i++)
+ assert(buf[i] == i);
+ vringh_iov_cleanup(&riov);
+ }
+
+ /* Don't leak memory... */
+ vring_del_virtqueue(vq);
+ free(__user_addr_min);
+
+ return 0;
+}
* Re: [PATCH 2/2] tools/virtio: make barriers stronger.
2013-03-06 4:54 ` [PATCH 2/2] tools/virtio: make barriers stronger Rusty Russell
@ 2013-03-06 10:20 ` Michael S. Tsirkin
2013-03-07 3:48 ` Rusty Russell
0 siblings, 1 reply; 16+ messages in thread
From: Michael S. Tsirkin @ 2013-03-06 10:20 UTC (permalink / raw)
To: Rusty Russell
Cc: sjur, Linus Walleij, Erwan Yvin, virtualization, sjur.brandeland
On Wed, Mar 06, 2013 at 03:54:42PM +1100, Rusty Russell wrote:
> In the coming vringh_test, we share an mmap with another userspace process
> for testing. This requires real barriers.
>
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
>
> diff --git a/tools/virtio/asm/barrier.h b/tools/virtio/asm/barrier.h
> index aff61e1..7a63693 100644
> --- a/tools/virtio/asm/barrier.h
> +++ b/tools/virtio/asm/barrier.h
> @@ -3,8 +3,8 @@
> #define mb() __sync_synchronize()
>
> #define smp_mb() mb()
> -# define smp_rmb() barrier()
> -# define smp_wmb() barrier()
> +# define smp_rmb() mb()
> +# define smp_wmb() mb()
> /* Weak barriers should be used. If not - it's a bug */
> # define rmb() abort()
> # define wmb() abort()
Hmm, this seems wrong on x86, which has strong ordering in hardware.
It should not matter whether the other side is a userspace
process or a kernel thread.
--
MST
* Re: [PATCH vringh] virtio: Introduce vringh wrappers in virtio_config
2013-03-06 4:42 ` Rusty Russell
@ 2013-03-06 10:50 ` Sjur Brændeland
2013-03-06 12:16 ` Ohad Ben-Cohen
0 siblings, 1 reply; 16+ messages in thread
From: Sjur Brændeland @ 2013-03-06 10:50 UTC (permalink / raw)
To: Rusty Russell; +Cc: Linus Walleij, Erwan Yvin, virtualization
On Wed, Mar 6, 2013 at 5:42 AM, Rusty Russell <rusty@rustcorp.com.au> wrote:
...
>> @@ -70,6 +80,9 @@ struct virtio_config_ops {
>> void (*finalize_features)(struct virtio_device *vdev);
>> const char *(*bus_name)(struct virtio_device *vdev);
>> int (*set_vq_affinity)(struct virtqueue *vq, int cpu);
>> + int (*find_vrhs)(struct virtio_device *vdev, unsigned nhvrs,
>> + struct vringh *vrhs[], vrh_callback_t *callbacks[]);
>> + void (*del_vrhs)(struct virtio_device *vdev);
>> };
>>
>> /* If driver didn't advertise the feature, it will never appear. */
>
> It's weird that you conflate the host and guest ring sides in rpmsg, but
> that might make sense if they're really bound together.
The caif_virtio driver uses both host and guest ring sides: host side
rings in the RX direction and guest side rings in the TX direction. The reason
behind this is to enable zero-copy on the producer side.
(For details see recent discussion with Ohad "Wrappers for vringh"
or the initial blurb when first introducing caif virtio:
https://lkml.org/lkml/2012/10/31/677)
> ... However, in
> general they are not: it's normal to be a guest or host, not both.
>
> This implies that you need a struct vringh_config, to put this in.
OK, should we move the definition of struct vringh_config into vringh.h
then, and add a vringh_config field in struct virtio_device?
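For concreteness, one possible shape of such a split is sketched below.
The type and field names are assumptions that follow the suggestion above,
not a statement of what was (or will be) merged:

/* Sketch: host-vring ops split out of virtio_config_ops, in vringh.h. */
struct vringh;
typedef void vrh_callback_t(struct virtio_device *, struct vringh *);

struct vringh_config_ops {
	int (*find_vrhs)(struct virtio_device *vdev, unsigned nhvrs,
			 struct vringh *vrhs[], vrh_callback_t *callbacks[]);
	void (*del_vrhs)(struct virtio_device *vdev);
};

/* struct virtio_device would then grow a pointer to these ops, e.g.:
 *	const struct vringh_config_ops *vringh_config;
 * and a driver wanting host-side rings would call
 *	vdev->vringh_config->find_vrhs(vdev, nhvrs, vrhs, callbacks);
 */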
>> diff --git a/include/linux/vringh.h b/include/linux/vringh.h
>> index ab41185..8156f51 100644
>> --- a/include/linux/vringh.h
>> +++ b/include/linux/vringh.h
>> @@ -50,6 +50,12 @@ struct vringh {
>>
>> /* The vring (note: it may contain user pointers!) */
>> struct vring vring;
>> +
>> + /* The function to call when buffers are available */
>> + void (*notify)(struct vringh *);
>> +
>> + /* A pointer for the vringh clients to use. */
>> + void *priv;
>> };
>
> Since the caller allocates the vringh, can it not use container_of()
> instead of a priv pointer?
Sure, no problem. I just need to add a new struct in remoteproc containing
both the vringh and a pointer to the internal ring data.
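A minimal sketch of that container_of() pattern, with invented struct and
field names:

/* Embed the vringh in a wrapper struct; callbacks can then recover the
 * wrapper with container_of() instead of going through a priv pointer. */
struct rproc_vringh {
	struct vringh vrh;		/* embedded, not a pointer */
	struct rproc_vring *rvring;	/* remoteproc's internal ring data */
};

static void rproc_vringh_notify(struct vringh *vrh)
{
	struct rproc_vringh *rvrh = container_of(vrh, struct rproc_vringh, vrh);

	/* ... kick the remote side using rvrh->rvring ... */
}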
Regards,
Sjur
* Re: [PATCH vringh] virtio: Introduce vringh wrappers in virtio_config
2013-03-06 10:50 ` Sjur Brændeland
@ 2013-03-06 12:16 ` Ohad Ben-Cohen
2013-03-06 12:37 ` Sjur Brændeland
0 siblings, 1 reply; 16+ messages in thread
From: Ohad Ben-Cohen @ 2013-03-06 12:16 UTC (permalink / raw)
To: Sjur Brændeland; +Cc: Linus Walleij, Erwan Yvin, virtualization
On Wed, Mar 6, 2013 at 12:50 PM, Sjur Brændeland <sjur@brendeland.net> wrote:
> The caif_virtio driver is using both host and guest ring sides, host side
> rings in RX direction and guest side rings in TX direction. The reason
> behind this is to enable zero-copy on the producer sides.
> (For details see recent discussion with Ohad "Wrappers for vringh"
FWIW, I'm not convinced that host side rings are necessary for
zero-copy - you could always just pass pointers to your huge data
inside buffers posted by guest side rings.
That said, I do believe that mixing guest side and host side rings may
simplify the overall solution in some cases, especially in more
complex multi core scenarios (where several cores all communicate with
each other).
* Re: [PATCH vringh] virtio: Introduce vringh wrappers in virtio_config
2013-03-06 12:16 ` Ohad Ben-Cohen
@ 2013-03-06 12:37 ` Sjur Brændeland
2013-03-06 23:28 ` Sjur Brændeland
0 siblings, 1 reply; 16+ messages in thread
From: Sjur Brændeland @ 2013-03-06 12:37 UTC (permalink / raw)
To: Ohad Ben-Cohen; +Cc: Linus Walleij, Erwan Yvin, virtualization
On Wed, Mar 6, 2013 at 1:16 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
> On Wed, Mar 6, 2013 at 12:50 PM, Sjur Brændeland <sjur@brendeland.net> wrote:
>> The caif_virtio driver is using both host and guest ring sides, host side
>> rings in RX direction and guest side rings in TX direction. The reason
>> behind this is to enable zero-copy on the producer sides.
>> (For details see recent discussion with Ohad "Wrappers for vringh"
>
> FWIW, I'm not convinced that host side rings are necessary for
> zero-copy - you could always just pass pointers to your huge data
> inside buffers posted by guest side rings.
Well, to do that for CAIF you would have to integrate virtio buffer handling
deep into the lower layer of the modem's 3GPP stack so that the 3GPP stack
could take buffers from the available ring.
From the outside point of view, perhaps this sounds doable, but in reality you
wouldn't get any Network Signalling engineers implementing the 3GPP stack
to buy into that.
Another issue is that our modem also supports other interface technologies
such as USB and HSI. It simply doesn't make sense to enforce virtio rings
internally if virtio isn't otherwise used on the modem.
Without host side rings we simply wouldn't be able to implement zero-copy
in the RX direction for the STE-modem.
Regards,
Sjur
>
> That said, I do believe that mixing guest side and host side rings may
> simplify the overall solution in some cases, especially in more
> complex multi core scenarios (where several cores all communicate with
> each other).
* Re: [PATCH vringh] virtio: Introduce vringh wrappers in virtio_config
2013-03-06 12:37 ` Sjur Brændeland
@ 2013-03-06 23:28 ` Sjur Brændeland
0 siblings, 0 replies; 16+ messages in thread
From: Sjur Brændeland @ 2013-03-06 23:28 UTC (permalink / raw)
To: Ohad Ben-Cohen; +Cc: Linus Walleij, Erwan Yvin, virtualization
On Wed, Mar 6, 2013 at 1:37 PM, Sjur Brændeland <sjurbr@gmail.com> wrote:
> On Wed, Mar 6, 2013 at 1:16 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
>> On Wed, Mar 6, 2013 at 12:50 PM, Sjur Brændeland <sjur@brendeland.net> wrote:
>>> The caif_virtio driver is using both host and guest ring sides, host side
>>> rings in RX direction and guest side rings in TX direction. The reason
>>> behind this is to enable zero-copy on the producer sides.
>>> (For details see recent discussion with Ohad "Wrappers for vringh"
>>
>> FWIW, I'm not convinced that host side rings are necessary for
>> zero-copy - you could always just pass pointers to your huge data
>> inside buffers posted by guest side rings.
When I read this again I realized that I had completely missed your point
about passing pointers inside buffers. My rambling below
argued against using the buffers provided in the available ring as a
replacement for the modem's internal allocator. Sorry for the confusion.
I haven't really spent time thinking through the implications of the alternative
of passing buffer pointers inside buffers posted by guest side rings.
But it seems to me that there would be some issues around returning
consumed buffers if you use the rings in this way.
Thanks,
Sjur
> Well, to do that for CAIF you would have to integrate virtio buffer handling
> deep into the lower layer of the modems 3GPP stack so that the 3GPP stack
> could take buffers from the available ring.
>
> From the outside point of view, perhaps this sounds doable, but in reality you
> wouldn't get any Network Signalling engineers implementing the 3GPP stack
> to buy into that.
>
> Another issue is that our modem also support other interface technologies
> such as USB and HSI. It simply doesn't make sense to enforce virtio rings
> internally if virtio isn't otherwise used on the modem.
>
> Without host side rings we simply wouldn't be able to implement zero-copy
> in the RX direction for the STE-modem.
>
> Regards,
> Sjur
>
>>
>> That said, I do believe that mixing guest side and host side rings may
>> simplify the overall solution in some cases, especially in more
>> complex multi core scenarios (where several cores all communicate with
>> each other).
* Re: [PATCH 2/2] tools/virtio: make barriers stronger.
2013-03-06 10:20 ` Michael S. Tsirkin
@ 2013-03-07 3:48 ` Rusty Russell
2013-03-07 9:29 ` Michael S. Tsirkin
0 siblings, 1 reply; 16+ messages in thread
From: Rusty Russell @ 2013-03-07 3:48 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: sjur, Linus Walleij, Erwan Yvin, virtualization, sjur.brandeland
"Michael S. Tsirkin" <mst@redhat.com> writes:
> On Wed, Mar 06, 2013 at 03:54:42PM +1100, Rusty Russell wrote:
>> In the coming vringh_test, we share an mmap with another userspace process
>> for testing. This requires real barriers.
>>
>> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
>>
>> diff --git a/tools/virtio/asm/barrier.h b/tools/virtio/asm/barrier.h
>> index aff61e1..7a63693 100644
>> --- a/tools/virtio/asm/barrier.h
>> +++ b/tools/virtio/asm/barrier.h
>> @@ -3,8 +3,8 @@
>> #define mb() __sync_synchronize()
>>
>> #define smp_mb() mb()
>> -# define smp_rmb() barrier()
>> -# define smp_wmb() barrier()
>> +# define smp_rmb() mb()
>> +# define smp_wmb() mb()
>> /* Weak barriers should be used. If not - it's a bug */
>> # define rmb() abort()
>> # define wmb() abort()
>
> Hmm this seems wrong on x86 which has strong order in hardware.
> It should not matter whether the other side is a userspace
> process or a kernel thread.
Actually, this code is completely generic now, though overkill for x86 smp_wmb().
Interestingly, when I try defining them, 32-bit x86 slows down (it seems
that gcc is using "lock orl $0x0,(%esp)" for __sync_synchronize()):
On my 32-bit laptop: Intel(R) Core(TM) i5 CPU M 560 @ 2.67GHz
Before:
Wall time:1.660000-1.790000(1.682500)
After:
Wall time:1.930000-3.620000(1.960625)
64 bit it's a win:
On 2.6.32-358.el6.x86_64, gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), Intel(R) Xeon(R) CPU E5620 @ 2.40GHz:
Before:
real 0m2.937000-8.339000(3.123979)s
user 0m2.811000-8.233000(2.954813)s
sys 0m0.052000-0.154000(0.089396)s
After:
real 0m2.559000-2.936000(2.726729)s
user 0m2.397000-2.651000(2.506396)s
sys 0m0.055000-0.152000(0.090667)s
Raw performance doesn't really matter, of course, but it's tempting to
use these asm barriers for __x86_64__, and use __sync_synchronize()
everywhere for everyone else.
Thoughts?
Rusty.
diff --git a/tools/virtio/asm/barrier.h b/tools/virtio/asm/barrier.h
index 7a63693..8de720a 100644
--- a/tools/virtio/asm/barrier.h
+++ b/tools/virtio/asm/barrier.h
@@ -1,11 +1,12 @@
#if defined(__i386__) || defined(__x86_64__)
#define barrier() asm volatile("" ::: "memory")
-#define mb() __sync_synchronize()
-#define smp_mb() mb()
-# define smp_rmb() mb()
-# define smp_wmb() mb()
+#define smp_mb() asm volatile("mfence":::"memory")
+#define smp_rmb() asm volatile("lfence":::"memory")
+#define smp_wmb() asm volatile("sfence" ::: "memory")
+
/* Weak barriers should be used. If not - it's a bug */
+# define mb() abort()
# define rmb() abort()
# define wmb() abort()
#else
* Re: [PATCH 2/2] tools/virtio: make barriers stronger.
2013-03-07 3:48 ` Rusty Russell
@ 2013-03-07 9:29 ` Michael S. Tsirkin
2013-03-07 23:56 ` Rusty Russell
0 siblings, 1 reply; 16+ messages in thread
From: Michael S. Tsirkin @ 2013-03-07 9:29 UTC (permalink / raw)
To: Rusty Russell
Cc: sjur, Linus Walleij, Erwan Yvin, virtualization, sjur.brandeland
On Thu, Mar 07, 2013 at 02:48:24PM +1100, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> > On Wed, Mar 06, 2013 at 03:54:42PM +1100, Rusty Russell wrote:
> >> In the coming vringh_test, we share an mmap with another userspace process
> >> for testing. This requires real barriers.
> >>
> >> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> >>
> >> diff --git a/tools/virtio/asm/barrier.h b/tools/virtio/asm/barrier.h
> >> index aff61e1..7a63693 100644
> >> --- a/tools/virtio/asm/barrier.h
> >> +++ b/tools/virtio/asm/barrier.h
> >> @@ -3,8 +3,8 @@
> >> #define mb() __sync_synchronize()
> >>
> >> #define smp_mb() mb()
> >> -# define smp_rmb() barrier()
> >> -# define smp_wmb() barrier()
> >> +# define smp_rmb() mb()
> >> +# define smp_wmb() mb()
> >> /* Weak barriers should be used. If not - it's a bug */
> >> # define rmb() abort()
> >> # define wmb() abort()
> >
> > Hmm this seems wrong on x86 which has strong order in hardware.
> > It should not matter whether the other side is a userspace
> > process or a kernel thread.
>
> Actually, this code is completely generic now, though overkill for x86 smp_wmb():
>
> Interestingly, when I try defining them, 32-bit x86 slows down (it seems
> that gcc is using "lock orl $0x0,(%esp)" for __sync_synchronize()).:
Well, this depends on which arch you are building for.
We saw this in qemu too, see e.g. include/qemu/atomic.h in qemu.
> On my 32-bit laptop: Intel(R) Core(TM) i5 CPU M 560 @ 2.67GHz
>
> Before:
> Wall time:1.660000-1.790000(1.682500)
> After:
> Wall time:1.930000-3.620000(1.960625)
>
> 64 bit it's a win:
> On 2.6.32-358.el6.x86_64, gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), Intel(R) Xeon(R) CPU E5620 @ 2.40GHz:
>
> Before:
> real 0m2.937000-8.339000(3.123979)s
> user 0m2.811000-8.233000(2.954813)s
> sys 0m0.052000-0.154000(0.089396)s
> After:
> real 0m2.559000-2.936000(2.726729)s
> user 0m2.397000-2.651000(2.506396)s
> sys 0m0.055000-0.152000(0.090667)s
>
> Raw performance doesn't really matter, of course, but it's tempting to
> use these asm barriers for __x86_64__, and use __sync_synchronize()
> everywhere for everyone else.
>
> Thoughts?
> Rusty.
For smp_mb, I agree.
> diff --git a/tools/virtio/asm/barrier.h b/tools/virtio/asm/barrier.h
> index 7a63693..8de720a 100644
> --- a/tools/virtio/asm/barrier.h
> +++ b/tools/virtio/asm/barrier.h
> @@ -1,11 +1,12 @@
> #if defined(__i386__) || defined(__x86_64__)
> #define barrier() asm volatile("" ::: "memory")
> -#define mb() __sync_synchronize()
>
> -#define smp_mb() mb()
> -# define smp_rmb() mb()
> -# define smp_wmb() mb()
> +#define smp_mb() asm volatile("mfence":::"memory")
> +#define smp_rmb() asm volatile("lfence":::"memory")
> +#define smp_wmb() asm volatile("sfence" ::: "memory")
> +
Confused. On x86_64, as long as you are not synchronizing with a device,
these are not necessary; a compiler barrier will do, unless
there are non-temporal loads/stores, which we don't use.
> /* Weak barriers should be used. If not - it's a bug */
> +# define mb() abort()
> # define rmb() abort()
> # define wmb() abort()
> #else
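For illustration, the direction suggested above for the x86 case would look
roughly like this (a sketch only, not necessarily what was eventually
applied):

#if defined(__i386__) || defined(__x86_64__)
#define barrier()	asm volatile("" ::: "memory")
/* x86 does not reorder loads with loads or stores with stores, so for
 * CPU-to-CPU ordering of ordinary accesses only the full barrier needs
 * a real fence; rmb/wmb can stay compiler barriers. */
#define smp_mb()	__sync_synchronize()
#define smp_rmb()	barrier()
#define smp_wmb()	barrier()
/* Weak barriers should be used.  If not - it's a bug */
#define rmb()		abort()
#define wmb()		abort()
#endif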
* Re: [PATCH 2/2] tools/virtio: make barriers stronger.
2013-03-07 9:29 ` Michael S. Tsirkin
@ 2013-03-07 23:56 ` Rusty Russell
0 siblings, 0 replies; 16+ messages in thread
From: Rusty Russell @ 2013-03-07 23:56 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: sjur, Linus Walleij, Erwan Yvin, virtualization, sjur.brandeland
"Michael S. Tsirkin" <mst@redhat.com> writes:
> On Thu, Mar 07, 2013 at 02:48:24PM +1100, Rusty Russell wrote:
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> > On Wed, Mar 06, 2013 at 03:54:42PM +1100, Rusty Russell wrote:
>> >> In the coming vringh_test, we share an mmap with another userspace process
>> >> for testing. This requires real barriers.
>> >>
>> >> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
>> >>
>> >> diff --git a/tools/virtio/asm/barrier.h b/tools/virtio/asm/barrier.h
>> >> index aff61e1..7a63693 100644
>> >> --- a/tools/virtio/asm/barrier.h
>> >> +++ b/tools/virtio/asm/barrier.h
>> >> @@ -3,8 +3,8 @@
>> >> #define mb() __sync_synchronize()
>> >>
>> >> #define smp_mb() mb()
>> >> -# define smp_rmb() barrier()
>> >> -# define smp_wmb() barrier()
>> >> +# define smp_rmb() mb()
>> >> +# define smp_wmb() mb()
>> >> /* Weak barriers should be used. If not - it's a bug */
>> >> # define rmb() abort()
>> >> # define wmb() abort()
>> >
>> > Hmm this seems wrong on x86 which has strong order in hardware.
>> > It should not matter whether the other side is a userspace
>> > process or a kernel thread.
>>
>> Actually, this code is completely generic now, though overkill for x86 smp_wmb():
>>
>> Interestingly, when I try defining them, 32-bit x86 slows down (it seems
>> that gcc is using "lock orl $0x0,(%esp)" for __sync_synchronize()).:
>
> Well this depends on which arch you are building for.
> We saw this in qemu too, see e.g. include/qemu/atomic.h in qemu.
Hmm, I thought x86 had load reordering, but it seems that was only some
older chips, which we can consider obsolete.
I learned something... I've dropped the patch.
Thanks,
Rusty.
Thread overview: 16+ messages:
2013-03-05 13:51 [PATCH vringh] virtio: Introduce vringh wrappers in virtio_config sjur.brandeland
2013-03-06 4:42 ` Rusty Russell
2013-03-06 10:50 ` Sjur Brændeland
2013-03-06 12:16 ` Ohad Ben-Cohen
2013-03-06 12:37 ` Sjur Brændeland
2013-03-06 23:28 ` Sjur Brændeland
2013-03-06 4:46 ` [FYI] vringh fixes Rusty Russell
2013-03-06 4:53 ` [PATCH 1/2] Rusty Russell
2013-03-06 4:54 ` [PATCH 2/2] tools/virtio: make barriers stronger Rusty Russell
2013-03-06 10:20 ` Michael S. Tsirkin
2013-03-07 3:48 ` Rusty Russell
2013-03-07 9:29 ` Michael S. Tsirkin
2013-03-07 23:56 ` Rusty Russell
2013-03-06 4:57 ` [PATCH 1/3] vringh: host-side implementation of virtio rings (v2) Rusty Russell
2013-03-06 4:59 ` [PATCH 2/3] vringh: don't flag already listening Rusty Russell
2013-03-06 5:02 ` [PATCH 3/3] tools/virtio: add vring_test (v2) Rusty Russell