[RFC virtio-next 0/4] Introduce CAIF Virtio and reversed Vrings

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [RFC virtio-next 0/4] Introduce CAIF Virtio and reversed Vrings
@ 2012-10-31 22:46 Sjur Brændeland
  2012-10-31 22:46 ` [RFC virtio-next 1/4] virtio: Move definitions to header file vring.h Sjur Brændeland
                   ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: Sjur Brændeland @ 2012-10-31 22:46 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Michael S. Tsirkin, Linus Walleij, Ohad Ben-Cohen, linux-kernel,
	netdev, virtualization, sjur, dmitry.tarnyagin

This patch-set introduces the CAIF Virtio Link layer. The purpose is to
communicate with a remote processor (a modem) over shared memory. Virtio
is used as the transport mechanism, and the Remoteproc framework provides
configuration and management of the Virtio rings and devices. The modem
and Linux host may be on the same SoC, or connected over a shared memory
interface such as LLI.

Zero-Copy data transport on the modem is primary goal for CAIF Virtio.
In order to achieve Zero-Copy the direction of the Virtio rings are
flipped in the RX direction. So we have implemented the Virtio
access-function similar to what is found in vhost.c.

The connected LTE-modem is a multi-processor system with an advanced
memory allocator on board. In order to provide zero-copy from the modem
to the connected Linux host, the direction of the Virtio rings are
reversed. This allows the modem to allocate data-buffers in RX
direction and pass them to the Linux host, and recycled buffers to be
sent back to the modem.

The option of providing pre-allocated buffers in RX direction has been
considered, but rejected. The allocation of data buffers happens deep
down in the network signaling stack on the LTE modem before it is known
what type of data is received. It may be data that is handled within the
modem and never sent to the Linux host, or IP traffic going to the host.
Pre-allocated Virtio buffers does not fit for this usage. Another issue
is that the network traffic pattern may vary, resulting in variation of
number and size of buffers allocated from the memory allocator. Dynamic
allocation is needed in order to utilize memory properly. Due to this,
we decided we had to implement "reversed" vrings. Reversed vrings allows
us to minimize the impact on the current memory allocator and buffer
handling on the modem. 

In order to implement reversed rings we have added functions for reading
descriptors from the available-ring and adding descriptors to the used-ring.
The internal data-structures in virtio_ring.c are moved into a new header
file so the data-structures can be accessed by caif_virtio.

The data buffers in TX direction are allocated using dma_alloc_coherent().
This allows memory to be allocated from the memory region shared between
the Host and modem.

In TX direction single linearized TX buffers are added to the vring. In
RX direction linearized frames are also used, but multiple descriptors may
be linked. This is done to allow maximum efficiency for the LTE modem.

This patch set is not yet fully tested and does not handle all negative
scenarios correctly. So at this stage we're primarily looking for review
comments related to the structure of the Virtio code. There are several
options on how to structure this, and feedback is welcomed.

Thanks,
Sjur

Sjur Brændeland (4):
  virtio: Move definitions to header file vring.h
  include/vring.h: Add support for reversed vritio rings.
  virtio_ring: Call callback function even when used ring is empty
  caif_virtio: Add CAIF over virtio

 drivers/net/caif/Kconfig               |    9 +
 drivers/net/caif/Makefile              |    3 +
 drivers/net/caif/caif_virtio.c         |  627 ++++++++++++++++++++++++++++++++
 drivers/remoteproc/remoteproc_virtio.c |    2 +-
 drivers/virtio/virtio_ring.c           |  102 +-----
 drivers/virtio/vring.h                 |  124 +++++++
 include/linux/virtio_ring.h            |    8 +-
 include/uapi/linux/virtio_ids.h        |    1 +
 8 files changed, 776 insertions(+), 100 deletions(-)
 create mode 100644 drivers/net/caif/caif_virtio.c
 create mode 100644 drivers/virtio/vring.h

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC virtio-next 1/4] virtio: Move definitions to header file vring.h
  2012-10-31 22:46 [RFC virtio-next 0/4] Introduce CAIF Virtio and reversed Vrings Sjur Brændeland
@ 2012-10-31 22:46 ` Sjur Brændeland
  2012-10-31 22:46 ` [RFC virtio-next 2/4] include/vring.h: Add support for reversed vritio rings Sjur Brændeland
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: Sjur Brændeland @ 2012-10-31 22:46 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Michael S. Tsirkin, Linus Walleij, Ohad Ben-Cohen, linux-kernel,
	netdev, virtualization, sjur, dmitry.tarnyagin,
	Sjur Brændeland

From: Sjur Brændeland <sjur.brandeland@stericsson.com>

Move the vring_virtqueue structure, memory barrier and debug
macros out from virtio_ring.c to the new header file vring.h.
This is done in order to allow other kernel modules to access the
virtio internal data-structures.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
---
Tis patch triggers a couple of checkpatch warnings, but I've
chosen to do a clean copy and not do any corrections.

 drivers/virtio/virtio_ring.c |   96 +--------------------------------
 drivers/virtio/vring.h       |  121 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 122 insertions(+), 95 deletions(-)
 create mode 100644 drivers/virtio/vring.h

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index ffd7e7d..9027af6 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -23,101 +23,7 @@
 #include <linux/slab.h>
 #include <linux/module.h>
 #include <linux/hrtimer.h>
-
-/* virtio guest is communicating with a virtual "device" that actually runs on
- * a host processor.  Memory barriers are used to control SMP effects. */
-#ifdef CONFIG_SMP
-/* Where possible, use SMP barriers which are more lightweight than mandatory
- * barriers, because mandatory barriers control MMIO effects on accesses
- * through relaxed memory I/O windows (which virtio-pci does not use). */
-#define virtio_mb(vq) \
-	do { if ((vq)->weak_barriers) smp_mb(); else mb(); } while(0)
-#define virtio_rmb(vq) \
-	do { if ((vq)->weak_barriers) smp_rmb(); else rmb(); } while(0)
-#define virtio_wmb(vq) \
-	do { if ((vq)->weak_barriers) smp_wmb(); else wmb(); } while(0)
-#else
-/* We must force memory ordering even if guest is UP since host could be
- * running on another CPU, but SMP barriers are defined to barrier() in that
- * configuration. So fall back to mandatory barriers instead. */
-#define virtio_mb(vq) mb()
-#define virtio_rmb(vq) rmb()
-#define virtio_wmb(vq) wmb()
-#endif
-
-#ifdef DEBUG
-/* For development, we want to crash whenever the ring is screwed. */
-#define BAD_RING(_vq, fmt, args...)				\
-	do {							\
-		dev_err(&(_vq)->vq.vdev->dev,			\
-			"%s:"fmt, (_vq)->vq.name, ##args);	\
-		BUG();						\
-	} while (0)
-/* Caller is supposed to guarantee no reentry. */
-#define START_USE(_vq)						\
-	do {							\
-		if ((_vq)->in_use)				\
-			panic("%s:in_use = %i\n",		\
-			      (_vq)->vq.name, (_vq)->in_use);	\
-		(_vq)->in_use = __LINE__;			\
-	} while (0)
-#define END_USE(_vq) \
-	do { BUG_ON(!(_vq)->in_use); (_vq)->in_use = 0; } while(0)
-#else
-#define BAD_RING(_vq, fmt, args...)				\
-	do {							\
-		dev_err(&_vq->vq.vdev->dev,			\
-			"%s:"fmt, (_vq)->vq.name, ##args);	\
-		(_vq)->broken = true;				\
-	} while (0)
-#define START_USE(vq)
-#define END_USE(vq)
-#endif
-
-struct vring_virtqueue
-{
-	struct virtqueue vq;
-
-	/* Actual memory layout for this queue */
-	struct vring vring;
-
-	/* Can we use weak barriers? */
-	bool weak_barriers;
-
-	/* Other side has made a mess, don't try any more. */
-	bool broken;
-
-	/* Host supports indirect buffers */
-	bool indirect;
-
-	/* Host publishes avail event idx */
-	bool event;
-
-	/* Head of free buffer list. */
-	unsigned int free_head;
-	/* Number we've added since last sync. */
-	unsigned int num_added;
-
-	/* Last used index we've seen. */
-	u16 last_used_idx;
-
-	/* How to notify other side. FIXME: commonalize hcalls! */
-	void (*notify)(struct virtqueue *vq);
-
-#ifdef DEBUG
-	/* They're supposed to lock for us. */
-	unsigned int in_use;
-
-	/* Figure out if their kicks are too delayed. */
-	bool last_add_time_valid;
-	ktime_t last_add_time;
-#endif
-
-	/* Tokens for callbacks. */
-	void *data[];
-};
-
-#define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
+#include "vring.h"
 
 /* Set up an indirect table of descriptors and add it to the queue. */
 static int vring_add_indirect(struct vring_virtqueue *vq,
diff --git a/drivers/virtio/vring.h b/drivers/virtio/vring.h
new file mode 100644
index 0000000..b997fc3
--- /dev/null
+++ b/drivers/virtio/vring.h
@@ -0,0 +1,121 @@
+/* Virtio ring implementation.
+ *
+ *  Copyright 2007 Rusty Russell IBM Corporation
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, write to the Free Software
+ *  Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+#ifndef _LINUX_VIRTIO_RING_H_
+#define _LINUX_VIRTIO_RING_H_
+
+#include <linux/virtio_ring.h>
+#include <linux/virtio.h>
+
+struct vring_virtqueue
+{
+	struct virtqueue vq;
+
+	/* Actual memory layout for this queue */
+	struct vring vring;
+
+	/* Can we use weak barriers? */
+	bool weak_barriers;
+
+	/* Other side has made a mess, don't try any more. */
+	bool broken;
+
+	/* Host supports indirect buffers */
+	bool indirect;
+
+	/* Host publishes avail event idx */
+	bool event;
+
+	/* Number of free buffers */
+	unsigned int num_free;
+	/* Head of free buffer list. */
+	unsigned int free_head;
+	/* Number we've added since last sync. */
+	unsigned int num_added;
+
+	/* Last used index we've seen. */
+	u16 last_used_idx;
+
+	/* How to notify other side. FIXME: commonalize hcalls! */
+	void (*notify)(struct virtqueue *vq);
+
+#ifdef DEBUG
+	/* They're supposed to lock for us. */
+	unsigned int in_use;
+
+	/* Figure out if their kicks are too delayed. */
+	bool last_add_time_valid;
+	ktime_t last_add_time;
+#endif
+
+	/* Tokens for callbacks. */
+	void *data[];
+};
+#define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
+
+/* virtio guest is communicating with a virtual "device" that actually runs on
+ * a host processor.  Memory barriers are used to control SMP effects. */
+#ifdef CONFIG_SMP
+/* Where possible, use SMP barriers which are more lightweight than mandatory
+ * barriers, because mandatory barriers control MMIO effects on accesses
+ * through relaxed memory I/O windows (which virtio-pci does not use). */
+#define virtio_mb(vq) \
+	do { if ((vq)->weak_barriers) smp_mb(); else mb(); } while(0)
+#define virtio_rmb(vq) \
+	do { if ((vq)->weak_barriers) smp_rmb(); else rmb(); } while(0)
+#define virtio_wmb(vq) \
+	do { if ((vq)->weak_barriers) smp_wmb(); else wmb(); } while(0)
+#else
+/* We must force memory ordering even if guest is UP since host could be
+ * running on another CPU, but SMP barriers are defined to barrier() in that
+ * configuration. So fall back to mandatory barriers instead. */
+#define virtio_mb(vq) mb()
+#define virtio_rmb(vq) rmb()
+#define virtio_wmb(vq) wmb()
+#endif
+
+#ifdef DEBUG
+/* For development, we want to crash whenever the ring is screwed. */
+#define BAD_RING(_vq, fmt, args...)				\
+	do {							\
+		dev_err(&(_vq)->vq.vdev->dev,			\
+			"%s:"fmt, (_vq)->vq.name, ##args);	\
+		BUG();						\
+	} while (0)
+/* Caller is supposed to guarantee no reentry. */
+#define START_USE(_vq)						\
+	do {							\
+		if ((_vq)->in_use)				\
+			panic("%s:in_use = %i\n",		\
+			      (_vq)->vq.name, (_vq)->in_use);	\
+		(_vq)->in_use = __LINE__;			\
+	} while (0)
+#define END_USE(_vq) \
+	do { BUG_ON(!(_vq)->in_use); (_vq)->in_use = 0; } while(0)
+#else
+#define BAD_RING(_vq, fmt, args...)				\
+	do {							\
+		dev_err(&_vq->vq.vdev->dev,			\
+			"%s:"fmt, (_vq)->vq.name, ##args);	\
+		(_vq)->broken = true;				\
+	} while (0)
+#define START_USE(vq)
+#define END_USE(vq)
+#endif
+
+#endif
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC virtio-next 2/4] include/vring.h: Add support for reversed vritio rings.
  2012-10-31 22:46 [RFC virtio-next 0/4] Introduce CAIF Virtio and reversed Vrings Sjur Brændeland
  2012-10-31 22:46 ` [RFC virtio-next 1/4] virtio: Move definitions to header file vring.h Sjur Brændeland
@ 2012-10-31 22:46 ` Sjur Brændeland
  2012-10-31 22:46 ` [RFC virtio-next 3/4] virtio_ring: Call callback function even when used ring is empty Sjur Brændeland
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: Sjur Brændeland @ 2012-10-31 22:46 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Michael S. Tsirkin, Linus Walleij, Ohad Ben-Cohen, linux-kernel,
	netdev, virtualization, sjur, dmitry.tarnyagin,
	Sjur Brændeland

From: Sjur Brændeland <sjur.brandeland@stericsson.com>

Add last avilable index to the vring_virtqueue structure,
this is done to prepare for implementation of the reversed vring.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
---
 drivers/virtio/vring.h |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/virtio/vring.h b/drivers/virtio/vring.h
index b997fc3..3b53961 100644
--- a/drivers/virtio/vring.h
+++ b/drivers/virtio/vring.h
@@ -51,6 +51,9 @@ struct vring_virtqueue
 	/* Last used index we've seen. */
 	u16 last_used_idx;
 
+	/* Last avail index seen. NOTE: Only used for reversed rings.*/
+	u16 last_avail_idx;
+
 	/* How to notify other side. FIXME: commonalize hcalls! */
 	void (*notify)(struct virtqueue *vq);
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC virtio-next 3/4] virtio_ring: Call callback function even when used ring is empty
  2012-10-31 22:46 [RFC virtio-next 0/4] Introduce CAIF Virtio and reversed Vrings Sjur Brændeland
  2012-10-31 22:46 ` [RFC virtio-next 1/4] virtio: Move definitions to header file vring.h Sjur Brændeland
  2012-10-31 22:46 ` [RFC virtio-next 2/4] include/vring.h: Add support for reversed vritio rings Sjur Brændeland
@ 2012-10-31 22:46 ` Sjur Brændeland
  2012-10-31 22:46 ` [RFC virtio-next 4/4] caif_virtio: Add CAIF over virtio Sjur Brændeland
  2012-11-01  7:41 ` [RFC virtio-next 0/4] Introduce CAIF Virtio and reversed Vrings Rusty Russell
  4 siblings, 0 replies; 23+ messages in thread
From: Sjur Brændeland @ 2012-10-31 22:46 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Michael S. Tsirkin, Linus Walleij, Ohad Ben-Cohen, linux-kernel,
	netdev, virtualization, sjur, dmitry.tarnyagin,
	Sjur Brændeland

From: Sjur Brændeland <sjur.brandeland@stericsson.com>

Enable option to force call of callback function even if
used ring is empty. This is needed for reversed vring.
Add a helper function  __vring_interrupt and add extra
boolean argument for forcing callback when interrupt is called.
The original vring_interrupt semantic and signature is
perserved.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
---
 drivers/remoteproc/remoteproc_virtio.c |    2 +-
 drivers/virtio/virtio_ring.c           |    6 +++---
 include/linux/virtio_ring.h            |    8 +++++++-
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/remoteproc/remoteproc_virtio.c b/drivers/remoteproc/remoteproc_virtio.c
index e7a4780..ddde863 100644
--- a/drivers/remoteproc/remoteproc_virtio.c
+++ b/drivers/remoteproc/remoteproc_virtio.c
@@ -63,7 +63,7 @@ irqreturn_t rproc_vq_interrupt(struct rproc *rproc, int notifyid)
 	if (!rvring || !rvring->vq)
 		return IRQ_NONE;
 
-	return vring_interrupt(0, rvring->vq);
+	return __vring_interrupt(0, rvring->vq, true);
 }
 EXPORT_SYMBOL(rproc_vq_interrupt);
 
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 9027af6..af85034 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -504,11 +504,11 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
 }
 EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf);
 
-irqreturn_t vring_interrupt(int irq, void *_vq)
+irqreturn_t __vring_interrupt(int irq, void *_vq, bool force)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
-	if (!more_used(vq)) {
+	if (!force && !more_used(vq)) {
 		pr_debug("virtqueue interrupt with no work for %p\n", vq);
 		return IRQ_NONE;
 	}
@@ -522,7 +522,7 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
 
 	return IRQ_HANDLED;
 }
-EXPORT_SYMBOL_GPL(vring_interrupt);
+EXPORT_SYMBOL_GPL(__vring_interrupt);
 
 struct virtqueue *vring_new_virtqueue(unsigned int index,
 				      unsigned int num,
diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
index 63c6ea1..ccb7915 100644
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -20,5 +20,11 @@ void vring_del_virtqueue(struct virtqueue *vq);
 /* Filter out transport-specific feature bits. */
 void vring_transport_features(struct virtio_device *vdev);
 
-irqreturn_t vring_interrupt(int irq, void *_vq);
+irqreturn_t __vring_interrupt(int irq, void *_vq, bool force);
+
+static inline irqreturn_t vring_interrupt(int irq, void *_vq)
+{
+	return __vring_interrupt(irq, _vq, false);
+}
+
 #endif /* _LINUX_VIRTIO_RING_H */
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC virtio-next 4/4] caif_virtio: Add CAIF over virtio
  2012-10-31 22:46 [RFC virtio-next 0/4] Introduce CAIF Virtio and reversed Vrings Sjur Brændeland
                   ` (2 preceding siblings ...)
  2012-10-31 22:46 ` [RFC virtio-next 3/4] virtio_ring: Call callback function even when used ring is empty Sjur Brændeland
@ 2012-10-31 22:46 ` Sjur Brændeland
  2012-11-01  7:41 ` [RFC virtio-next 0/4] Introduce CAIF Virtio and reversed Vrings Rusty Russell
  4 siblings, 0 replies; 23+ messages in thread
From: Sjur Brændeland @ 2012-10-31 22:46 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Michael S. Tsirkin, Linus Walleij, Ohad Ben-Cohen, linux-kernel,
	netdev, virtualization, sjur, dmitry.tarnyagin,
	Sjur Brændeland, Vikram ARV

From: Sjur Brændeland <sjur.brandeland@stericsson.com>

Add the CAIF Virtio Link layer, used for communicating with a
modem over shared memory. Virtio is used as the transport mechanism.

In the TX direction the virtio rings are used in the normal fashion,
sending data in the available ring. But in the rx direction the
the we have flipped the direction of the virtio ring, and
implemented the virtio access-function similar to what is found
in vhost.c.

CAIF also uses the virtio configuration space for getting
configuration parameters such as headroom, tailroom etc.

Signed-off-by: Vikram ARV <vikram.arv@stericsson.com>
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
---
 drivers/net/caif/Kconfig        |    9 +
 drivers/net/caif/Makefile       |    3 +
 drivers/net/caif/caif_virtio.c  |  627 +++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/virtio_ids.h |    1 +
 4 files changed, 640 insertions(+)
 create mode 100644 drivers/net/caif/caif_virtio.c

diff --git a/drivers/net/caif/Kconfig b/drivers/net/caif/Kconfig
index abf4d7a..a01f617 100644
--- a/drivers/net/caif/Kconfig
+++ b/drivers/net/caif/Kconfig
@@ -47,3 +47,12 @@ config CAIF_HSI
        The caif low level driver for CAIF over HSI.
        Be aware that if you enable this then you also need to
        enable a low-level HSI driver.
+
+config CAIF_VIRTIO
+       tristate "CAIF virtio transport driver"
+       default n
+       depends on CAIF
+       depends on REMOTEPROC
+       select VIRTIO
+       ---help---
+       The caif driver for CAIF over Virtio.
diff --git a/drivers/net/caif/Makefile b/drivers/net/caif/Makefile
index 91dff86..d9ee26a 100644
--- a/drivers/net/caif/Makefile
+++ b/drivers/net/caif/Makefile
@@ -13,3 +13,6 @@ obj-$(CONFIG_CAIF_SHM) += caif_shm.o
 
 # HSI interface
 obj-$(CONFIG_CAIF_HSI) += caif_hsi.o
+
+# Virtio interface
+obj-$(CONFIG_CAIF_VIRTIO) += caif_virtio.o
diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c
new file mode 100644
index 0000000..e50940f
--- /dev/null
+++ b/drivers/net/caif/caif_virtio.c
@@ -0,0 +1,627 @@
+/*
+ * Copyright (C) ST-Ericsson AB 2012
+ * Contact: Sjur Brendeland / sjur.brandeland@stericsson.com
+ * Authors: Vicram Arv / vikram.arv@stericsson.com,
+ *	    Dmitry Tarnyagin / dmitry.tarnyagin@stericsson.com
+ *	    Sjur Brendeland / sjur.brandeland@stericsson.com
+ * License terms: GNU General Public License (GPL) version 2
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ":" fmt
+#include <linux/module.h>
+#include <linux/virtio.h>
+#include <linux/virtio_ids.h>
+#include <linux/virtio_config.h>
+#include <linux/dma-mapping.h>
+#include <linux/netdevice.h>
+#include <linux/if_arp.h>
+#include <linux/spinlock.h>
+#include <net/caif/caif_dev.h>
+#include <linux/virtio_caif.h>
+#include "../drivers/virtio/vring.h"
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Vicram Arv <vikram.arv@stericsson.com>");
+MODULE_DESCRIPTION("Virtio CAIF Driver");
+
+/*
+ * struct cfv_info - Caif Virtio control structure
+ * @cfdev:	caif common header
+ * @vdev:	Associated virtio device
+ * @vq_rx:	rx/downlink virtqueue
+ * @vq_tx:	tx/uplink virtqueue
+ * @ndev:	associated netdevice
+ * @queued_tx:	number of buffers queued in the tx virtqueue
+ * @watermark_tx: indicates number of buffers the tx queue
+ *		should shrink to to unblock datapath
+ * @tx_lock:	protects vq_tx to allow concurrent senders
+ * @tx_hr:	transmit headroom
+ * @rx_hr:	receive headroom
+ * @tx_tr:	transmit tailroom
+ * @rx_tr:	receive tailroom
+ * @mtu:	transmit max size
+ * @mru:	receive max size
+ */
+struct cfv_info {
+	struct caif_dev_common cfdev;
+	struct virtio_device *vdev;
+	struct virtqueue *vq_rx;
+	struct virtqueue *vq_tx;
+	struct net_device *ndev;
+	unsigned int queued_tx;
+	unsigned int watermark_tx;
+	/* Protect access to vq_tx */
+	spinlock_t tx_lock;
+	/* Copied from Virtio config space */
+	u16 tx_hr;
+	u16 rx_hr;
+	u16 tx_tr;
+	u16 rx_tr;
+	u32 mtu;
+	u32 mru;
+};
+
+/*
+ * struct token_info - maintains Transmit buffer data handle
+ * @size:	size of transmit buffer
+ * @dma_handle: handle to allocated dma device memory area
+ * @vaddr:	virtual address mapping to allocated memory area
+ */
+struct token_info {
+	size_t size;
+	u8 *vaddr;
+	dma_addr_t dma_handle;
+};
+
+/* Default if virtio config space is unavailable */
+#define CFV_DEF_MTU_SIZE 4096
+#define CFV_DEF_HEADROOM 16
+#define CFV_DEF_TAILROOM 16
+
+/* Require IP header to be 4-byte aligned. */
+#define IP_HDR_ALIGN 4
+
+/*
+ * virtqueue_next_avail_desc - get the next available descriptor
+ * @_vq: the struct virtqueue we're talking about
+ * @head: index of the descriptor in the ring
+ *
+ * Look for the next available descriptor in the available ring.
+ * Return NULL if nothing new in the available.
+ */
+static struct vring_desc *virtqueue_next_avail_desc(struct virtqueue *_vq,
+						    int *head)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	u16 avail_idx, hd, last_avail_idx = vq->last_avail_idx;
+
+	START_USE(vq);
+
+	if (unlikely(vq->broken))
+		goto err;
+
+	/* The barier secures observability of avail->idx after interrupt */
+	virtio_rmb(vq);
+
+	if (vq->last_avail_idx == vq->vring.avail->idx)
+		goto err;
+
+	avail_idx = vq->vring.avail->idx;
+	if (unlikely((u16)(avail_idx - last_avail_idx) > vq->vring.num)) {
+		BAD_RING(vq, "Avail index moved from %u to %u",
+			 last_avail_idx, avail_idx);
+		goto err;
+	}
+
+	/*
+	 * The barier secures observability of the ring content
+	 * after avail->idx update
+	 */
+	virtio_rmb(vq);
+
+	hd =  vq->vring.avail->ring[last_avail_idx & (vq->vring.num - 1)];
+	/* If their number is silly, that's an error. */
+	if (unlikely(hd >= vq->vring.num)) {
+		BAD_RING(vq, "Remote says index %d > %u is available",
+			 *head, vq->vring.num);
+		goto err;
+	}
+
+	END_USE(vq);
+	*head = hd;
+	return &vq->vring.desc[hd];
+err:
+	END_USE(vq);
+	*head = -1;
+	return NULL;
+}
+
+/*
+ * virtqueue_next_linked_desc - get next linked descriptor from the ring
+ * @_vq: the struct virtqueue we're talking about
+ * @desc: "current" descriptor
+ *
+ * Each buffer in the virtqueues is a chain of descriptors. This
+ * function returns the next descriptor in the chain,* or NULL if we're at
+ * the end.
+ *
+ * Side effect: the function increments vq->last_avail_idx if a non-linked
+ * descriptor is passed as &desc argument.
+ */
+static struct vring_desc *virtqueue_next_linked_desc(struct virtqueue *_vq,
+						     struct vring_desc *desc)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	unsigned int next;
+
+	START_USE(vq);
+
+	/* If this descriptor says it doesn't chain, we're done. */
+	if (!(desc->flags & VRING_DESC_F_NEXT))
+		goto no_next;
+
+	next = desc->next;
+	/* Make sure compiler knows to grab that: we don't want it changing! */
+	/* We will use the result as an index in an array, so most
+	 * architectures only need a compiler barrier here.
+	 */
+	read_barrier_depends();
+
+	if (unlikely(next >= vq->vring.num)) {
+		BAD_RING(vq, "Desc index is %u > %u\n", next, vq->vring.num);
+		goto err;
+	}
+
+	desc = &vq->vring.desc[next];
+
+	if (desc->flags & VRING_DESC_F_INDIRECT) {
+		pr_err("Indirect descriptor not supported\n");
+		goto err;
+	}
+
+	END_USE(vq);
+	return desc;
+no_next:
+	vq->last_avail_idx++;
+err:
+	END_USE(vq);
+	return NULL;
+}
+
+/*
+ * virtqueue_add_buf_to_used - release a used descriptor
+ * @_vq: the struct virtqueue we're talking about
+ * @head: index of the descriptor to be released
+ * @len: number of linked descriptors in a chain
+ *
+ * The function releases a used descriptor in a reversed ring
+ */
+static int virtqueue_add_buf_to_used(struct virtqueue *_vq,
+				     unsigned int head, int len)
+{
+	struct vring_virtqueue *vr_vq = to_vvq(_vq);
+	struct vring_used_elem	*used;
+	int used_idx, err = -EINVAL;
+
+	START_USE(vr_vq);
+
+	if (unlikely(vr_vq->broken))
+		goto err;
+
+	if (unlikely(head >= vr_vq->vring.num)) {
+		BAD_RING(vr_vq, "Invalid head index (%u) > max desc idx (%u) ",
+			 head, vr_vq->vring.num - 1);
+		goto err;
+	}
+
+	/*
+	 * The virtqueue contains a ring of used buffers.  Get a pointer to the
+	 * next entry in that used ring.
+	 */
+	used_idx = (vr_vq->vring.used->idx & (vr_vq->vring.num - 1));
+	used = &vr_vq->vring.used->ring[used_idx];
+	used->id = head;
+	used->len = len;
+
+	/* Make sure buffer is written before we update index. */
+	virtio_wmb(vr_vq);
+	++vr_vq->vring.used->idx;
+	err = 0;
+err:
+	END_USE(vr_vq);
+	return err;
+
+}
+
+/*
+ * virtqueue_next_desc - get next available or linked descriptor
+ * @_vq: the struct virtqueue we're talking about
+ * @desc: "current" descriptor.
+ * @head: on return it is filled by the descriptor index in case of
+ *	available descriptor was returned, or -1 in case of linked
+ *	descriptor.
+ *
+ * The function is to be used as an iterator through received descriptors.
+ */
+static struct vring_desc *virtqueue_next_desc(struct virtqueue *_vq,
+					      struct vring_desc *desc,
+					      int *head)
+{
+	struct vring_desc *next = virtqueue_next_linked_desc(_vq, desc);
+
+	if (next == NULL) {
+		virtqueue_add_buf_to_used(_vq, *head, 0);
+		/* tell the remote processor to recycle buffer */
+		virtqueue_kick(_vq);
+		next = virtqueue_next_avail_desc(_vq, head);
+	} 
+	return next;
+}
+
+/*
+ * This is invoked whenever the remote processor completed processing
+ * a TX msg we just sent it, and the buffer is put back to the used ring.
+ */
+static void cfv_release_used_buf(struct virtqueue *vq_tx)
+{
+	struct cfv_info *cfv = vq_tx->vdev->priv;
+
+	BUG_ON(vq_tx != cfv->vq_tx);
+
+	for (;;) {
+		unsigned int len;
+		struct token_info *buf_info;
+
+		/* Get used buffer from used ring to recycle used descriptors */
+		spin_lock_bh(&cfv->tx_lock);
+		buf_info = virtqueue_get_buf(vq_tx, &len);
+
+		if (!buf_info)
+			goto out;
+
+		BUG_ON(!cfv->queued_tx);
+		if (--cfv->queued_tx < cfv->watermark_tx) {
+			cfv->watermark_tx = 0;
+			netif_tx_wake_all_queues(cfv->ndev);
+		}
+		spin_unlock_bh(&cfv->tx_lock);
+
+		dma_free_coherent(vq_tx->vdev->dev.parent->parent,
+				  buf_info->size, buf_info->vaddr,
+				  buf_info->dma_handle);
+		kfree(buf_info);
+	}
+	return;
+out:
+	spin_unlock_bh(&cfv->tx_lock);
+}
+
+static int cfv_read_desc(struct vring_desc *d,
+			 void **buf, size_t *size)
+{
+	if (d->flags & VRING_DESC_F_INDIRECT) {
+		pr_warn("Indirect descriptor not supported by CAIF\n");
+		return -EINVAL;
+	}
+
+	if (!(d->flags & VRING_DESC_F_WRITE)) {
+		pr_warn("Write descriptor not supported by CAIF\n");
+		/* CAIF expects a input descriptor here */
+		return -EINVAL;
+	}
+	*buf = phys_to_virt(d->addr);
+	*size = d->len;
+	return 0;
+}
+
+static struct sk_buff *cfv_alloc_and_copy_skb(struct cfv_info *cfv,
+					      u8 *frm, u32 frm_len)
+{
+	struct sk_buff *skb;
+	u32 cfpkt_len, pad_len;
+
+	/* Verify that packet size with down-link header and mtu size */
+	if (frm_len > cfv->mru || frm_len <= cfv->rx_hr + cfv->rx_tr) {
+		netdev_err(cfv->ndev,
+			   "Invalid frmlen:%u  mtu:%u hr:%d tr:%d\n",
+			   frm_len, cfv->mru,  cfv->rx_hr,
+			   cfv->rx_tr);
+		return NULL;
+	}
+
+	cfpkt_len = frm_len - (cfv->rx_hr + cfv->rx_tr);
+
+	pad_len = (unsigned long)(frm + cfv->rx_hr) & (IP_HDR_ALIGN - 1);
+
+	skb = netdev_alloc_skb(cfv->ndev, frm_len + pad_len);
+	if (!skb)
+		return NULL;
+
+	/* Reserve space for headers. */
+	skb_reserve(skb, cfv->rx_hr + pad_len);
+
+	memcpy(skb_put(skb, cfpkt_len), frm + cfv->rx_hr, cfpkt_len);
+	return skb;
+}
+
+/*
+ * This is invoked whenever the remote processor has sent down-link data
+ * on the Rx VQ avail ring and it's time to digest a message.
+ *
+ * CAIF virtio passes a complete CAIF frame including head/tail room
+ * in each linked descriptor. So iterate over all available buffers
+ * in available-ring and the associated linked descriptors.
+ */
+static void cfv_recv(struct virtqueue *vq_rx)
+{
+	struct cfv_info *cfv = vq_rx->vdev->priv;
+	struct vring_desc *desc;
+	struct sk_buff *skb;
+	int head = -1;
+	void *buf;
+	size_t len;
+
+	for (desc = virtqueue_next_avail_desc(vq_rx, &head);
+	     desc != NULL && !cfv_read_desc(desc, &buf, &len);
+	     desc = virtqueue_next_desc(vq_rx, desc, &head)) {
+
+		skb = cfv_alloc_and_copy_skb(cfv, buf, len);
+
+		if (!skb)
+			goto err;
+
+		skb->protocol = htons(ETH_P_CAIF);
+		skb_reset_mac_header(skb);
+		skb->dev = cfv->ndev;
+
+		/* Push received packet up the stack. */
+		if (netif_receive_skb(skb))
+			goto err;
+
+		++cfv->ndev->stats.rx_packets;
+		cfv->ndev->stats.rx_bytes += skb->len;
+	}
+	return;
+err:
+	++cfv->ndev->stats.rx_dropped;
+	return;
+}
+
+static int cfv_netdev_open(struct net_device *netdev)
+{
+	netif_carrier_on(netdev);
+	return 0;
+}
+
+static int cfv_netdev_close(struct net_device *netdev)
+{
+	netif_carrier_off(netdev);
+	return 0;
+}
+
+static struct token_info *cfv_alloc_and_copy_to_dmabuf(struct cfv_info *cfv,
+						       struct sk_buff *skb,
+						       struct scatterlist *sg)
+{
+	struct caif_payload_info *info = (void *)&skb->cb;
+	struct token_info *buf_info = NULL;
+	u8 pad_len, hdr_ofs;
+
+	if (unlikely(cfv->tx_hr + skb->len + cfv->tx_tr > cfv->mtu)) {
+		netdev_warn(cfv->ndev, "Invalid packet len (%d > %d)\n",
+			    cfv->tx_hr + skb->len + cfv->tx_tr, cfv->mtu);
+		goto err;
+	}
+
+	buf_info = kmalloc(sizeof(struct token_info), GFP_ATOMIC);
+	if (unlikely(!buf_info))
+		goto err;
+
+	/* Make the IP header aligned in tbe buffer */
+	hdr_ofs = cfv->tx_hr + info->hdr_len;
+	pad_len = hdr_ofs & (IP_HDR_ALIGN - 1);
+	buf_info->size = cfv->tx_hr + skb->len + cfv->tx_tr + pad_len;
+
+	if (WARN_ON_ONCE(cfv->vdev->dev.parent))
+		goto err;
+
+	/* allocate coherent memory for the buffers */
+	buf_info->vaddr =
+		dma_alloc_coherent(cfv->vdev->dev.parent->parent,
+				   buf_info->size, &buf_info->dma_handle,
+				   GFP_ATOMIC);
+	if (unlikely(!buf_info->vaddr)) {
+		netdev_warn(cfv->ndev,
+			    "Out of DMA memory (alloc %zu bytes)\n",
+			    buf_info->size);
+		goto err;
+	}
+
+	/* copy skbuf contents to send buffer */
+	skb_copy_bits(skb, 0, buf_info->vaddr + cfv->tx_hr + pad_len, skb->len);
+	sg_init_one(sg, buf_info->vaddr + pad_len,
+		    skb->len + cfv->tx_hr + cfv->rx_hr);
+	return buf_info;
+err:
+	kfree(buf_info);
+	return NULL;
+}
+
+/*
+ * This is invoked whenever the host processor application has sent up-link data.
+ * Send it in the TX VQ avail ring.
+ *
+ * CAIF Virtio sends does not use linked descriptors in the tx direction.
+ */
+static int cfv_netdev_tx(struct sk_buff *skb, struct net_device *netdev)
+{
+	struct cfv_info *cfv = netdev_priv(netdev);
+	struct token_info *buf_info;
+	struct scatterlist sg;
+	bool flow_off = false;
+
+	buf_info = cfv_alloc_and_copy_to_dmabuf(cfv, skb, &sg);
+	spin_lock_bh(&cfv->tx_lock);
+
+	/*
+	 * Add buffer to avail ring.
+	 * Note: in spite of a space check at beginning of the function,
+	 * the add_buff call might fail in case of concurrent access on smp
+	 * systems.
+	 */
+	if (WARN_ON(virtqueue_add_buf(cfv->vq_tx, &sg, 0, 1,
+				      buf_info, GFP_ATOMIC) < 0)) {
+		/* It should not happen */
+		++cfv->ndev->stats.tx_dropped;
+		flow_off = true;
+	} else {
+		/* update netdev statistics */
+		cfv->queued_tx++;
+		cfv->ndev->stats.tx_packets++;
+		cfv->ndev->stats.tx_bytes += skb->len;
+	}
+
+	/*
+	 * Flow-off check takes into account number of cpus to make sure
+	 * virtqueue will not be overfilled in any possible smp conditions.
+	 */
+	flow_off = cfv->queued_tx + num_present_cpus() >=
+		virtqueue_get_vring_size(cfv->vq_tx);
+
+	/* tell the remote processor it has a pending message to read */
+	virtqueue_kick(cfv->vq_tx);
+
+	if (flow_off) {
+		cfv->watermark_tx = cfv->queued_tx >> 1;
+		netif_tx_stop_all_queues(netdev);
+	}
+
+	spin_unlock_bh(&cfv->tx_lock);
+
+	dev_kfree_skb(skb);
+
+	/* Try to speculatively free used buffers */
+	if (flow_off)
+		cfv_release_used_buf(cfv->vq_tx);
+
+	return NETDEV_TX_OK;
+}
+
+static const struct net_device_ops cfv_netdev_ops = {
+	.ndo_open = cfv_netdev_open,
+	.ndo_stop = cfv_netdev_close,
+	.ndo_start_xmit = cfv_netdev_tx,
+};
+
+static void cfv_netdev_setup(struct net_device *netdev)
+{
+	netdev->netdev_ops = &cfv_netdev_ops;
+	netdev->type = ARPHRD_CAIF;
+	netdev->tx_queue_len = 100;
+	netdev->flags = IFF_POINTOPOINT | IFF_NOARP;
+	netdev->mtu = CFV_DEF_MTU_SIZE;
+	netdev->destructor = free_netdev;
+}
+
+#define GET_VIRTIO_CONFIG_OPS(_v, _var, _f) \
+	((_v)->config->get(_v, offsetof(struct virtio_caif_transf_config, _f), \
+			   &_var, \
+			   FIELD_SIZEOF(struct virtio_caif_transf_config, _f)))
+
+static int __devinit cfv_probe(struct virtio_device *vdev)
+{
+	vq_callback_t *vq_cbs[] = { cfv_recv, cfv_release_used_buf };
+	const char *names[] = { "input", "output" };
+	const char *cfv_netdev_name = "cfvrt";
+	struct net_device *netdev;
+	struct virtqueue *vqs[2];
+	struct cfv_info *cfv;
+	int err = 0;
+
+	netdev = alloc_netdev(sizeof(struct cfv_info), cfv_netdev_name,
+			      cfv_netdev_setup);
+	if (!netdev)
+		return -ENOMEM;
+
+	cfv = netdev_priv(netdev);
+	cfv->vdev = vdev;
+	cfv->ndev = netdev;
+
+	spin_lock_init(&cfv->tx_lock);
+
+	/* Get two virtqueues, for tx/ul and rx/dl */
+	err = vdev->config->find_vqs(vdev, 2, vqs, vq_cbs, names);
+	if (err)
+		goto free_cfv;
+
+	cfv->vq_rx = vqs[0];
+	cfv->vq_tx = vqs[1];
+
+	if (vdev->config->get) {
+		GET_VIRTIO_CONFIG_OPS(vdev, cfv->tx_hr, headroom);
+		GET_VIRTIO_CONFIG_OPS(vdev, cfv->rx_hr, headroom);
+		GET_VIRTIO_CONFIG_OPS(vdev, cfv->tx_tr, tailroom);
+		GET_VIRTIO_CONFIG_OPS(vdev, cfv->rx_tr, tailroom);
+		GET_VIRTIO_CONFIG_OPS(vdev, cfv->mtu, mtu);
+		GET_VIRTIO_CONFIG_OPS(vdev, cfv->mru, mtu);
+	} else {
+		cfv->tx_hr = CFV_DEF_HEADROOM;
+		cfv->rx_hr = CFV_DEF_HEADROOM;
+		cfv->tx_tr = CFV_DEF_TAILROOM;
+		cfv->rx_tr = CFV_DEF_TAILROOM;
+		cfv->mtu = CFV_DEF_MTU_SIZE;
+		cfv->mru = CFV_DEF_MTU_SIZE;
+
+	}
+
+	vdev->priv = cfv;
+
+	netif_carrier_off(netdev);
+
+	/* register Netdev */
+	err = register_netdev(netdev);
+	if (err) {
+		dev_err(&vdev->dev, "Unable to register netdev (%d)\n", err);
+		goto vqs_del;
+	}
+
+	/* tell the remote processor it can start sending messages */
+	virtqueue_kick(cfv->vq_rx);
+	return 0;
+
+vqs_del:
+	vdev->config->del_vqs(cfv->vdev);
+free_cfv:
+	free_netdev(netdev);
+	return err;
+}
+
+static void __devexit cfv_remove(struct virtio_device *vdev)
+{
+	struct cfv_info *cfv = vdev->priv;
+	vdev->config->reset(vdev);
+	vdev->config->del_vqs(cfv->vdev);
+	unregister_netdev(cfv->ndev);
+}
+
+static struct virtio_device_id id_table[] = {
+	{ VIRTIO_ID_CAIF, VIRTIO_DEV_ANY_ID },
+	{ 0 },
+};
+
+static unsigned int features[] = {
+};
+
+static struct virtio_driver caif_virtio_driver = {
+	.feature_table		= features,
+	.feature_table_size	= ARRAY_SIZE(features),
+	.driver.name		= KBUILD_MODNAME,
+	.driver.owner		= THIS_MODULE,
+	.id_table		= id_table,
+	.probe			= cfv_probe,
+	.remove			= cfv_remove,
+};
+
+module_driver(caif_virtio_driver, register_virtio_driver,
+	      unregister_virtio_driver);
+MODULE_DEVICE_TABLE(virtio, id_table);
diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
index 270fb22..8ddad5a 100644
--- a/include/uapi/linux/virtio_ids.h
+++ b/include/uapi/linux/virtio_ids.h
@@ -37,5 +37,6 @@
 #define VIRTIO_ID_RPMSG		7 /* virtio remote processor messaging */
 #define VIRTIO_ID_SCSI		8 /* virtio scsi */
 #define VIRTIO_ID_9P		9 /* 9p virtio console */
+#define VIRTIO_ID_CAIF		12 /* virtio caif */
 
 #endif /* _LINUX_VIRTIO_IDS_H */
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [RFC virtio-next 0/4] Introduce CAIF Virtio and reversed Vrings
  2012-10-31 22:46 [RFC virtio-next 0/4] Introduce CAIF Virtio and reversed Vrings Sjur Brændeland
                   ` (3 preceding siblings ...)
  2012-10-31 22:46 ` [RFC virtio-next 4/4] caif_virtio: Add CAIF over virtio Sjur Brændeland
@ 2012-11-01  7:41 ` Rusty Russell
  2012-11-05 12:12   ` Sjur Brændeland
  4 siblings, 1 reply; 23+ messages in thread
From: Rusty Russell @ 2012-11-01  7:41 UTC (permalink / raw)
  To: Sjur Brændeland
  Cc: Michael S. Tsirkin, Linus Walleij, Ohad Ben-Cohen, linux-kernel,
	netdev, virtualization, sjur, dmitry.tarnyagin

Sjur Brændeland <sjur@brendeland.net> writes:
> Zero-Copy data transport on the modem is primary goal for CAIF Virtio.
> In order to achieve Zero-Copy the direction of the Virtio rings are
> flipped in the RX direction. So we have implemented the Virtio
> access-function similar to what is found in vhost.c.

So, this adds another host-side virtqueue implementation.

Can we combine them together conveniently?  You pulled out more stuff
into vring.h which is a start, but it's a bit overloaded.

Perhaps we should separate the common fields into struct vring, and use
it to build:

        struct vring_guest {
                struct vring vr;
                u16 last_used_idx;
        };

        struct vring_host {
                struct vring vr;
                u16 last_avail_idx;
        };

I haven't looked closely at vhost to see what it wants, but I would
think we could share more code.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC virtio-next 0/4] Introduce CAIF Virtio and reversed Vrings
  2012-11-01  7:41 ` [RFC virtio-next 0/4] Introduce CAIF Virtio and reversed Vrings Rusty Russell
@ 2012-11-05 12:12   ` Sjur Brændeland
  2012-11-06  2:09     ` Rusty Russell
  0 siblings, 1 reply; 23+ messages in thread
From: Sjur Brændeland @ 2012-11-05 12:12 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Michael S. Tsirkin, Linus Walleij, Ohad Ben-Cohen, linux-kernel,
	netdev, virtualization, dmitry.tarnyagin

Hi Rusty,

> So, this adds another host-side virtqueue implementation.
>
> Can we combine them together conveniently?  You pulled out more stuff
> into vring.h which is a start, but it's a bit overloaded.
> Perhaps we should separate the common fields into struct vring, and use
> it to build:
>
>         struct vring_guest {
>                 struct vring vr;
>                 u16 last_used_idx;
>         };
>
>         struct vring_host {
>                 struct vring vr;
>                 u16 last_avail_idx;
>         };
> I haven't looked closely at vhost to see what it wants, but I would
> think we could share more code.

I have played around with the code in vhost.c to explore your idea.
The main issue I run into is that vhost.c is accessing user data while my new
code does not. So I end up with some quirky code testing if the ring lives in
user memory or not.  Another issue is sparse warnings when
accessing user memory.

With your suggested changes I end up sharing about 100 lines of code.
So in sum, I feel this add more complexity than what we gain by sharing.

Below is an initial draft of the re-usable code. I added "is_uaccess" to struct
virtio_ring in order to know if the ring lives in user memory.

Let me know what you think.

[snip]
int virtqueue_add_used(struct vring_host *vr, unsigned int head, int len,
		    struct vring_used_elem  **used)
{
	/* The virtqueue contains a ring of used buffers.  Get a pointer to the
	 * next entry in that used ring. */
	*used = &vr->vring.used->ring[vr->last_used_idx % vr->vring.num];
	if (vr->is_uaccess) {
		if(unlikely(__put_user(head, &(*used)->id))) {
			pr_debug("Failed to write used id");
			return -EFAULT;
		}
		if (unlikely(__put_user(len, &(*used)->len))) {
			pr_debug("Failed to write used len");
			return -EFAULT;
		}
		smp_wmb();
		if (__put_user(vr->last_used_idx + 1,
			       &vr->vring.used->idx)) {
			pr_debug("Failed to increment used idx");
			return -EFAULT;
		}
	} else {
		(*used)->id = head;
		(*used)->len = len;
		smp_wmb();
		vr->vring.used->idx = vr->last_used_idx + 1;
	}
	vr->last_used_idx++;
	return 0;
}

/* Each buffer in the virtqueues is actually a chain of descriptors.  This
 * function returns the next descriptor in the chain,
 * or -1U if we're at the end. */
unsigned virtqueue_next_desc(struct vring_desc *desc)
{
	unsigned int next;

	/* If this descriptor says it doesn't chain, we're done. */
	if (!(desc->flags & VRING_DESC_F_NEXT))
		return -1U;

	/* Check they're not leading us off end of descriptors. */
	next = desc->next;
	/* Make sure compiler knows to grab that: we don't want it changing! */
	/* We will use the result as an index in an array, so most
	 * architectures only need a compiler barrier here. */
	read_barrier_depends();

	return next;
}

static int virtqueue_next_avail_desc(struct vring_host *vr)
{
	int head;
	u16 last_avail_idx;

	/* Check it isn't doing very strange things with descriptor numbers. */
	last_avail_idx = vr->last_avail_idx;
	if (vr->is_uaccess) {
		if (__get_user(vr->avail_idx, &vr->vring.avail->idx)) {
			pr_debug("Failed to access avail idx at %p\n",
				 &vr->vring.avail->idx);
			return -EFAULT;
		}
	} else
		vr->avail_idx = vr->vring.avail->idx;

	if (unlikely((u16)(vr->avail_idx - last_avail_idx) > vr->vring.num)) {
		pr_debug("Guest moved used index from %u to %u",
		       last_avail_idx, vr->avail_idx);
		return -EFAULT;
	}

	/* If there's nothing new since last we looked, return invalid. */
	if (vr->avail_idx == last_avail_idx)
		return vr->vring.num;

	/* Only get avail ring entries after they have been exposed by guest. */
	smp_rmb();

	/* Grab the next descriptor number they're advertising, and increment
	 * the index we've seen. */
	if (vr->is_uaccess) {
		if (unlikely(__get_user(head,
				&vr->vring.avail->ring[last_avail_idx
						       % vr->vring.num]))) {
			pr_debug("Failed to read head: idx %d address %p\n",
				 last_avail_idx,
				 &vr->vring.avail->ring[last_avail_idx %
							vr->vring.num]);
			return -EFAULT;
		}
	} else
		head = vr->vring.avail->ring[last_avail_idx % vr->vring.num];

	/* If their number is silly, that's an error. */
	if (unlikely(head >= vr->vring.num)) {
		pr_debug("Guest says index %u > %u is available",
		       head, vr->vring.num);
		return -EINVAL;
	}

	return head;
}

Thanks,
Sjur

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC virtio-next 0/4] Introduce CAIF Virtio and reversed Vrings
  2012-11-05 12:12   ` Sjur Brændeland
@ 2012-11-06  2:09     ` Rusty Russell
       [not found]       ` <1354718230-4486-1-git-send-email-sjur@brendeland.net>
  0 siblings, 1 reply; 23+ messages in thread
From: Rusty Russell @ 2012-11-06  2:09 UTC (permalink / raw)
  To: Sjur Brændeland
  Cc: Michael S. Tsirkin, Linus Walleij, Ohad Ben-Cohen, linux-kernel,
	netdev, virtualization, dmitry.tarnyagin

Sjur Brændeland <sjurbr@gmail.com> writes:
> Hi Rusty,
>
>> So, this adds another host-side virtqueue implementation.
>>
>> Can we combine them together conveniently?  You pulled out more stuff
>> into vring.h which is a start, but it's a bit overloaded.
>> Perhaps we should separate the common fields into struct vring, and use
>> it to build:
>>
>>         struct vring_guest {
>>                 struct vring vr;
>>                 u16 last_used_idx;
>>         };
>>
>>         struct vring_host {
>>                 struct vring vr;
>>                 u16 last_avail_idx;
>>         };
>> I haven't looked closely at vhost to see what it wants, but I would
>> think we could share more code.
>
> I have played around with the code in vhost.c to explore your idea.
> The main issue I run into is that vhost.c is accessing user data while my new
> code does not. So I end up with some quirky code testing if the ring lives in
> user memory or not.  Another issue is sparse warnings when
> accessing user memory.

Sparse is a servant, not a master.  If that's the only thing stopping
us, we can ignore it (or hack around it).

> With your suggested changes I end up sharing about 100 lines of code.
> So in sum, I feel this add more complexity than what we gain by sharing.
>
> Below is an initial draft of the re-usable code. I added "is_uaccess" to struct
> virtio_ring in order to know if the ring lives in user memory.
>
> Let me know what you think.

Agreed, that's horrible...

Fortunately, recent GCCs will inline function pointers, so inlining this
and handing an accessor function gets optimized away.

I would really like this, because I'd love to have a config option to do
strict checking on the format of these things (similar to my recently
posted CONFIG_VIRTIO_DEVICE_TORTURE patch).

See below.

> int virtqueue_add_used(struct vring_host *vr, unsigned int head, int len,
> 		    struct vring_used_elem  **used)
> {
> 	/* The virtqueue contains a ring of used buffers.  Get a pointer to the
> 	 * next entry in that used ring. */
> 	*used = &vr->vring.used->ring[vr->last_used_idx % vr->vring.num];
> 	if (vr->is_uaccess) {
> 		if(unlikely(__put_user(head, &(*used)->id))) {
> 			pr_debug("Failed to write used id");
> 			return -EFAULT;
> 		}
> 		if (unlikely(__put_user(len, &(*used)->len))) {
> 			pr_debug("Failed to write used len");
> 			return -EFAULT;
> 		}
> 		smp_wmb();
> 		if (__put_user(vr->last_used_idx + 1,
> 			       &vr->vring.used->idx)) {
> 			pr_debug("Failed to increment used idx");
> 			return -EFAULT;
> 		}
> 	} else {
> 		(*used)->id = head;
> 		(*used)->len = len;
> 		smp_wmb();
> 		vr->vring.used->idx = vr->last_used_idx + 1;
> 	}
> 	vr->last_used_idx++;
> 	return 0;
> }

/* Untested! */
static inline bool in_kernel_put(u32 *dst, u32 v)
{
        *dst = v;
        return true;
}

static inline bool userspace_put(u32 *dst, u32 v)
{
	return __put_user(dst, v) == 0;
}

static inline struct vring_used_elem *vrh_add_used(struct vring_host *vr,
                                                   unsigned int head, u32 len,
                                                   bool (*put)(u32 *dst, u32 v))
{
        struct vring_used_elem *used;

	/* The virtqueue contains a ring of used buffers.  Get a pointer to the
	 * next entry in that used ring. */
	used = &vr->vring.used->ring[vr->last_used_idx % vr->vring.num];
        
	if (!put(&used->id, head) || !put(&used->len = len))
                return NULL;
	smp_wmb();
        if (!put(&vr->vring.used->idx, vr->last_used_idx + 1))
                return NULL;
	vr->last_used_idx++;
	return used;
}

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
       [not found]               ` <87pq1f2rj0.fsf@rustcorp.com.au>
@ 2013-01-10 10:30                 ` Rusty Russell
  2013-01-10 11:11                   ` Michael S. Tsirkin
  2013-01-10 18:39                   ` Sjur Brændeland
  0 siblings, 2 replies; 23+ messages in thread
From: Rusty Russell @ 2013-01-10 10:30 UTC (permalink / raw)
  To: Sjur Brændeland, Michael S. Tsirkin
  Cc: Linus Walleij, virtualization, LKML

Rusty Russell <rusty@rustcorp.com.au> writes:
> It basically involves moving much of vring.c into a virtio_host.c: the
> parts which actually touch the ring.  Then it provides accessors for
> vring.c to use which are __user-safe (all casts are inside
> virtio_host.c).
>
> I should have something to post by end of today, my time...

Well, that was optimistic.

I now have some lightly-tested code (via a userspace harness).  The
interface will probably change again as I try to adapt vhost to use it.

The emphasis is getting a sglist out of the vring as efficiently as
possible.  This involves some hacks: I'm still wondering if we should
move the address mapping logic into the virtio_host core, with a callout
if an address we want is outside a single range.

Not sure why vring/net doesn't built a packet and feed it in
netif_rx_ni().  This is what tun seems to do, and with this code it
should be fairly optimal.

Cheers,
Rusty.

virtio_host: host-side implementation of virtio rings.

Getting use of virtio rings correct is tricky, and a recent patch saw
an implementation of in-kernel rings (as separate from userspace).

This patch attempts to abstract the business of dealing with the
virtio ring layout from the access (userspace or direct); to do this,
we use function pointers, which gcc inlines correctly.

The new API should be more efficient than the existing vhost code,
too, since we convert directly to chained sg lists, which can be
modified in place to map the pages.

Disadvantages:
1) The spec allows chained indirect entries, we don't.  Noone does this,
   but it's not as crazy as it sounds, so perhaps we should support it.  If
   we did, we'd almost certainly invoke an function ptr call to check the
   validity of the indirect mem.

2) Getting an accessor is ugly; if it's indirect, the caller has to check
   that it's valid.  Efficient, but it's a horrible API.

No doubt this will change as I try to adapt existing vhost drivers.

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 202bba6..38ec470 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -1,6 +1,7 @@
 config VHOST_NET
 	tristate "Host kernel accelerator for virtio net (EXPERIMENTAL)"
 	depends on NET && EVENTFD && (TUN || !TUN) && (MACVTAP || !MACVTAP) && EXPERIMENTAL
+	select VHOST
 	---help---
 	  This kernel module can be loaded in host kernel to accelerate
 	  guest networking with virtio_net. Not to be confused with virtio_net
diff --git a/drivers/vhost/Kconfig.tcm b/drivers/vhost/Kconfig.tcm
index a9c6f76..f4c3704 100644
--- a/drivers/vhost/Kconfig.tcm
+++ b/drivers/vhost/Kconfig.tcm
@@ -1,6 +1,7 @@
 config TCM_VHOST
 	tristate "TCM_VHOST fabric module (EXPERIMENTAL)"
 	depends on TARGET_CORE && EVENTFD && EXPERIMENTAL && m
+	select VHOST
 	default n
 	---help---
 	Say M here to enable the TCM_VHOST fabric module for use with virtio-scsi guests
diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 8d5bddb..fd95d3e 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -5,6 +5,12 @@ config VIRTIO
 	  bus, such as CONFIG_VIRTIO_PCI, CONFIG_VIRTIO_MMIO, CONFIG_LGUEST,
 	  CONFIG_RPMSG or CONFIG_S390_GUEST.
 
+config VHOST
+	tristate
+	---help---
+	  This option is selected by any driver which needs to access
+	  the host side of a virtio ring.
+
 menu "Virtio drivers"
 
 config VIRTIO_PCI
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 9076635..9833cd5 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -2,3 +2,4 @@ obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o
 obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
 obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
 obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
+obj-$(CONFIG_VHOST) += virtio_host.o
diff --git a/drivers/virtio/virtio_host.c b/drivers/virtio/virtio_host.c
new file mode 100644
index 0000000..169c8e2
--- /dev/null
+++ b/drivers/virtio/virtio_host.c
@@ -0,0 +1,668 @@
+/*
+ * Helpers for the host side of a virtio ring.
+ *
+ * Since these may be in userspace, we use (inline) accessors.
+ */
+#include <linux/virtio_host.h>
+#include <linux/kernel.h>
+#include <linux/ratelimit.h>
+#include <linux/uaccess.h>
+
+/* An inline function, for easy cold marking. */
+static __cold bool __vringh_bad(void)
+{
+	static DEFINE_RATELIMIT_STATE(vringh_rs,
+				      DEFAULT_RATELIMIT_INTERVAL,
+				      DEFAULT_RATELIMIT_BURST);
+	return __ratelimit(&vringh_rs);
+}
+
+#define vringh_bad(fmt, ...)						\
+	do { if (__vringh_bad())					\
+			printk(KERN_NOTICE "vringh: " fmt "\n", __VA_ARGS__); \
+	} while(0)
+
+/* Returns vring->num if empty, -ve on error. */
+static inline int __vringh_get_head(const struct vringh *vrh,
+				    int (*getu16)(u16 *val, const u16 *p),
+				    u16 *last_avail_idx)
+{
+	u16 avail_idx, i, head;
+	int err;
+
+	err = getu16(&avail_idx, &vrh->vring.avail->idx);
+	if (err) {
+		vringh_bad("Failed to access avail idx at %p",
+			   &vrh->vring.avail->idx);
+		return err;
+	}
+
+	err = getu16(last_avail_idx, &vring_avail_event(&vrh->vring));
+	if (err) {
+		vringh_bad("Failed to access last avail idx at %p",
+			   &vring_avail_event(&vrh->vring));
+		return err;
+	}
+
+	if (*last_avail_idx == avail_idx)
+		return vrh->vring.num;
+
+	/* Only get avail ring entries after they have been exposed by guest. */
+	smp_rmb();
+
+	i = *last_avail_idx & (vrh->vring.num - 1);
+
+	err = getu16(&head, &vrh->vring.avail->ring[i]);
+	if (err) {
+		vringh_bad("Failed to read head: idx %d address %p",
+			   *last_avail_idx, &vrh->vring.avail->ring[i]);
+		return err;
+	}
+
+	if (head >= vrh->vring.num) {
+		vringh_bad("Guest says index %u > %u is available",
+			   head, vrh->vring.num);
+		return -EINVAL;
+	}
+	return head;
+}
+
+/*
+ * Initialize the vringh_access structure for this head.
+ *
+ * For direct buffers, the range is simply the desc[] array in the vring.
+ *
+ * For indirect buffers, the range is the indirect entry; check() is called
+ * to validate this range.
+ *
+ * -error otherwise.
+ */
+static inline int __vringh_get_access(const struct vringh *vrh, u16 head,
+				      int (*getdesc)(struct vring_desc *dst,
+						     const struct vring_desc *),
+				      struct vringh_acc *acc)
+{
+	int err;
+
+	acc->head = head;
+
+	err = getdesc(&acc->desc, &vrh->vring.desc[head]);
+	if (unlikely(err))
+		return err;
+
+	if (acc->desc.flags & VRING_DESC_F_INDIRECT) {
+		/* We don't support chained indirects. */
+		if (acc->desc.flags & VRING_DESC_F_NEXT)
+			return -EINVAL;
+		if (unlikely(acc->desc.len % sizeof(acc->desc)))
+			return -EINVAL;
+
+		acc->start = (void *)(long)acc->desc.addr;
+		acc->max = acc->desc.len / sizeof(acc->desc);
+
+		if (acc->max > vrh->vring.num)
+			return -EINVAL;
+
+		/* Force us to read first desc next time. */
+		acc->desc.len = 0;
+		acc->desc.next = 0;
+		acc->desc.flags = VRING_DESC_F_NEXT;
+	} else {
+		acc->start = vrh->vring.desc;
+		acc->max = vrh->vring.num;
+		acc->idx = head;
+	}
+	return 0;
+}
+
+/* Copy some bytes to/from the vring descriptor.  Returns num copied. */
+static inline int vsg_xfer(struct vringh_sg **vsg,
+			   unsigned int *num,
+			   void *ptr, size_t len,
+			   int (*xfer)(void *sgaddr, void *ptr, size_t len))
+{
+	int err, done = 0;
+
+	while (len && *num) {
+		size_t partlen;
+		struct scatterlist *sg = &(*vsg)->sg;
+
+		partlen = min(sg->length, len);
+		err = xfer(vringh_sg_addr(*vsg), ptr, partlen);
+		if (err)
+			return err;
+		sg->offset += partlen;
+		sg->length -= partlen;
+		len -= partlen;
+		done += partlen;
+		ptr += partlen;
+
+		if (sg->length == 0) {
+			*vsg = (struct vringh_sg *)sg_next(sg);
+			(*num)--;
+		}
+	}
+	return done;
+}
+
+static unsigned int rest_of_page(void *data)
+{
+	return PAGE_SIZE - ((unsigned long)data % PAGE_SIZE);
+}
+
+static struct vringh_sg *add_sg_chain(struct vringh_sg *end, gfp_t gfp)
+{
+	struct vringh_sg *vsg = (void *)__get_free_page(gfp);
+
+	if (!vsg)
+		return NULL;
+
+	sg_init_table(&vsg->sg, PAGE_SIZE / sizeof(*vsg));
+	sg_chain(&end->sg, 1, &vsg->sg);
+	return vsg;
+}
+
+/* We add a chain to the sg if we hit end: we're putting addresses in sg_page,
+ * as caller needs to map them itself. */
+static inline int add_to_sg(struct vringh_sg **vsg,
+			    void *addr, u32 len, gfp_t gfp)
+{
+	int done = 0;
+
+	while (len) {
+		int partlen;
+		void *paddr;
+
+		paddr = (void *)((long)addr & PAGE_MASK);
+
+		if (unlikely(sg_is_last(&(*vsg)->sg))) {
+			*vsg = add_sg_chain(*vsg, gfp);
+			if (!*vsg)
+				return -ENOMEM;
+		}
+
+		partlen = rest_of_page(addr);
+		if (partlen > len)
+			partlen = len;
+		sg_set_page(&(*vsg)->sg, paddr, partlen, offset_in_page(addr));
+		(*vsg)++;
+		len -= partlen;
+		addr += partlen;
+		done++;
+	}
+	return done;
+}
+
+static inline int
+__vringh_sg(struct vringh_acc *acc,
+	    struct vringh_sg *vsg,
+	    unsigned max,
+	    u16 write_flag,
+	    gfp_t gfp,
+	    int (*getdesc)(struct vring_desc *dst, const struct vring_desc *s))
+{
+	unsigned count = 0, num_descs = 0;
+	struct vringh_sg *orig_vsg = vsg;
+	int err;
+
+	/* This sends end marker on sg[max-1], so we know when to chain. */
+	if (max)
+		sg_init_table(&vsg->sg, max);
+
+	for (;;) {
+		/* Exhausted this descriptor?  Read next. */
+		if (acc->desc.len == 0) {
+			if (!(acc->desc.flags & VRING_DESC_F_NEXT))
+				break;
+
+			if (num_descs++ == acc->max) {
+				err = -ELOOP;
+				goto fail;
+			}
+
+			if (acc->desc.next >= acc->max) {
+				vringh_bad("Chained index %u > %u",
+					   acc->desc.next, acc->max);
+				err = -EINVAL;
+				goto fail;
+			}
+
+			acc->idx = acc->desc.next;
+			err = getdesc(&acc->desc, acc->start + acc->idx);
+			if (unlikely(err))
+				goto fail;
+		}
+
+		if (unlikely(!max)) {
+			vringh_bad("Unexpected %s descriptor",
+				   write_flag ? "writable" : "readable");
+			return -EINVAL;
+		}
+
+		/* No more readable/writable descriptors? */
+		if ((acc->desc.flags & VRING_DESC_F_WRITE) != write_flag) {
+			/* We should not have readable after writable */
+			if (write_flag) {
+				vringh_bad("Readable desc %p after writable",
+					   acc->start + acc->idx);
+				err = -EINVAL;
+				goto fail;
+			}
+			break;
+		}
+
+		/* Append the pages into the sg. */
+		err = add_to_sg(&vsg, (void *)(long)acc->desc.addr,
+				acc->desc.len, gfp);
+		if (err < 0)
+			goto fail;
+		count += err;
+		acc->desc.len = 0;
+	}
+	if (count)
+		sg_mark_end(&vsg->sg);
+	return count;
+
+fail:
+	vringh_sg_free(orig_vsg);
+	return err;
+}
+
+static inline int __vringh_complete(struct vringh *vrh, u16 idx, u16 len,
+				    int (*getu16)(u16 *val, const u16 *p),
+				    int (*putu16)(u16 *p, u16 val),
+				    int (*putused)(struct vring_used_elem *dst,
+						   const struct vring_used_elem
+						   *s),
+				    bool *notify)
+{
+	struct vring_used_elem used;
+	struct vring_used *used_ring;
+	int err;
+	u16 used_idx, old, used_event;
+
+	used.id = idx;
+	used.len = len;
+
+	err = getu16(&used_idx, &vring_used_event(&vrh->vring));
+	if (err) {
+		vringh_bad("Failed to access used event %p",
+			   &vring_used_event(&vrh->vring));
+		return err;
+	}
+
+	used_ring = vrh->vring.used;
+
+	err = putused(&used_ring->ring[used_idx % vrh->vring.num], &used);
+	if (err) {
+		vringh_bad("Failed to write used entry %u at %p",
+			   used_idx % vrh->vring.num,
+			   &used_ring->ring[used_idx % vrh->vring.num]);
+		return err;
+	}
+
+	/* Make sure buffer is written before we update index. */
+	smp_wmb();
+
+	old = vrh->last_used_idx;
+	vrh->last_used_idx++;
+
+	err = putu16(&vrh->vring.used->idx, vrh->last_used_idx);
+	if (err) {
+		vringh_bad("Failed to update used index at %p",
+			   &vrh->vring.used->idx);
+		return err;
+	}
+
+	/* If we already know we need to notify, skip re-checking */
+	if (*notify)
+		return 0;
+
+	/* Flush out used index update. This is paired with the
+	 * barrier that the Guest executes when enabling
+	 * interrupts. */
+	smp_mb();
+
+	/* Old-style, without event indices. */
+	if (!vrh->event_indices) {
+		u16 flags;
+		err = getu16(&flags, &vrh->vring.avail->flags);
+		if (err) {
+			vringh_bad("Failed to get flags at %p",
+				   &vrh->vring.avail->flags);
+			return err;
+		}
+		if (!(flags & VRING_AVAIL_F_NO_INTERRUPT))
+			*notify = true;
+		return 0;
+	}
+
+	/* Modern: we know where other side is up to. */
+	err = getu16(&used_event, &vring_used_event(&vrh->vring));
+	if (err) {
+		vringh_bad("Failed to get used event idx at %p",
+			   &vring_used_event(&vrh->vring));
+		return err;
+	}
+	if (vring_need_event(used_event, vrh->last_used_idx, old))
+		*notify = true;
+	return 0;
+}
+
+static inline bool __vringh_notify_enable(struct vringh *vrh,
+					  int (*getu16)(u16 *val, const u16 *p),
+					  int (*putu16)(u16 *p, u16 val))
+{
+	u16 avail;
+
+	/* Already enabled? */
+	if (vrh->listening)
+		return false;
+
+	vrh->listening = true;
+
+	if (!vrh->event_indices) {
+		/* Old-school; update flags. */
+		if (putu16(&vrh->vring.used->flags, 0) != 0) {
+			vringh_bad("Clearing used flags %p",
+				   &vrh->vring.used->flags);
+			return false;
+		}
+	} else {
+		if (putu16(&vring_avail_event(&vrh->vring),
+			   vrh->last_avail_idx) != 0) {
+			vringh_bad("Updating avail event index %p",
+				   &vring_avail_event(&vrh->vring));
+			return false;
+		}
+	}
+
+	/* They could have slipped one in as we were doing that: make
+	 * sure it's written, then check again. */
+	smp_mb();
+
+	if (getu16(&avail, &vrh->vring.avail->idx) != 0) {
+		vringh_bad("Failed to check avail idx at %p",
+			   &vrh->vring.avail->idx);
+		return false;
+	}
+
+	/* This is so unlikely, we just leave notifications enabled. */
+	return avail != vrh->last_avail_idx;
+}
+
+static inline void __vringh_notify_disable(struct vringh *vrh,
+					   int (*putu16)(u16 *p, u16 val))
+{
+	/* Already disabled? */
+	if (!vrh->listening)
+		return;
+
+	vrh->listening = false;
+	if (!vrh->event_indices) {
+		/* Old-school; update flags. */
+		if (putu16(&vrh->vring.used->flags, VRING_USED_F_NO_NOTIFY)) {
+			vringh_bad("Setting used flags %p",
+				   &vrh->vring.used->flags);
+		}
+	}
+}
+
+/* Userspace access helpers. */
+static inline int getu16_user(u16 *val, const u16 *p)
+{
+	return get_user(*val, (__force u16 __user *)p);
+}
+
+static inline int putu16_user(u16 *p, u16 val)
+{
+	return put_user(val, (__force u16 __user *)p);
+}
+
+static inline int getdesc_user(struct vring_desc *dst,
+			       const struct vring_desc *src)
+{
+	return copy_from_user(dst, (__force void *)src, sizeof(*dst)) == 0 ? 0 :
+		-EFAULT;
+}
+
+static inline int putused_user(struct vring_used_elem *dst,
+			       const struct vring_used_elem *s)
+{
+	return copy_to_user((__force void __user *)dst, s, sizeof(*dst)) == 0
+		? 0 : -EFAULT;
+}
+
+static inline int xfer_from_user(void *src, void *dst, size_t len)
+{
+	return copy_from_user(dst, (__force void *)src, len) == 0 ? 0 :
+		-EFAULT;
+}
+
+static inline int xfer_to_user(void *dst, void *src, size_t len)
+{
+	return copy_to_user((__force void *)dst, src, len) == 0 ? 0 :
+		-EFAULT;
+}
+
+/**
+ * vringh_init_user - initialize a vringh for a userspace vring.
+ * @vrh: the vringh to initialize.
+ * @features: the feature bits for this ring.
+ * @num: the number of elements.
+ * @desc: the userpace descriptor pointer.
+ * @avail: the userpace avail pointer.
+ * @used: the userpace used pointer.
+ *
+ * Returns an error if num is invalid: you should check pointers
+ * yourself!
+ */
+int vringh_init_user(struct vringh *vrh, u32 features,
+		     unsigned int num,
+		     struct vring_desc __user *desc,
+		     struct vring_avail __user *avail,
+		     struct vring_used __user *used)
+{
+	/* Sane power of 2 please! */
+	if (!num || num > 0xffff || (num & (num - 1))) {
+		vringh_bad("Bad ring size %zu", num);
+		return -EINVAL;
+	}
+
+	vrh->event_indices = (features & VIRTIO_RING_F_EVENT_IDX);
+	vrh->listening = false;
+	vrh->last_avail_idx = 0;
+	vrh->last_used_idx = 0;
+	vrh->vring.num = num;
+	vrh->vring.desc = (__force struct vring_desc *)desc;
+	vrh->vring.avail = (__force struct vring_avail *)avail;
+	vrh->vring.used = (__force struct vring_used *)used;
+	return 0;
+}
+
+/**
+ * vringh_getdesc_user - get next available descriptor from userspace ring.
+ * @vrh: the userspace vring.
+ * @acc: the accessor structure to fill in.
+ *
+ * Returns 0 if it filled in @acc, or -errno.  @acc->max is 0 if the ring is
+ * empty.
+ *
+ * Make sure you check that acc->start to acc->start + acc->max is
+ * valid memory!
+ */
+int vringh_getdesc_user(struct vringh *vrh, struct vringh_acc *acc)
+{
+	int err;
+
+	err = __vringh_get_head(vrh, getu16_user, &vrh->last_avail_idx);
+	if (unlikely(err))
+		return err;
+
+	/* Empty... */
+	if (err == vrh->vring.num) {
+		acc->max = 0;
+		return 0;
+	}
+
+	return __vringh_get_access(vrh, err, getdesc_user, acc);
+}
+
+/**
+ * vringh_rsg_user - form an sg from the remaining readable bytes.
+ * @acc: the accessor from vringh_get_user.
+ * @sg: the scatterlist to populate
+ * @num: the number of elements in @sg
+ * @gfp: the allocation flags if we need to chain onto @sg.
+ *
+ * This puts the page addresses into @sg: not the struct pages!  You must
+ * map the pages.  It will allocate and chained sgs if required: in this
+ * case the return value will be >= num - 1, and vringh_sg_free()
+ * must be called to free the chained elements.
+ *
+ * You are expected to pull / rsg all readable bytes before accessing writable
+ * bytes.
+ *
+ * Returns -errno, or number of @sg elements created.
+ */
+int vringh_rsg_user(struct vringh_acc *acc,
+		    struct vringh_sg *vsg, unsigned num, gfp_t gfp)
+{
+	return __vringh_sg(acc, vsg, num, 0, gfp, getdesc_user);
+}
+
+/**
+ * vringh_rsg_pull_user - copy bytes from vsg.
+ * @vsg: the vsg from vringh_rsg_user() (updated as we consume)
+ * @num: the number of elements in @vsg (updated as we consume)
+ * @dst: the place to copy.
+ * @len: the maximum length to copy.
+ *
+ * Returns the bytes copied <= len or a negative errno.
+ */
+ssize_t vringh_rsg_pull_user(struct vringh_sg **vsg, unsigned *num,
+			     void *dst, size_t len)
+{
+	return vsg_xfer(vsg, num, dst, len, xfer_from_user);
+}
+
+/**
+ * vringh_wsg_user - form an sg from the remaining writable bytes.
+ * @acc: the accessor from vringh_get_user.
+ * @sg: the scatterlist to populate
+ * @num: the number of elements in @sg
+ * @gfp: the allocation flags if we need to chain onto @sg.
+ *
+ * This puts the page addresses into @sg: not the struct pages!  You must
+ * map the pages.  It will allocate and chained sgs if required: in this
+ * case the return value will be >= num - 1, and vringh_sg_free()
+ * must be called to free the chained elements.
+ *
+ * You are expected to pull / rsg all readable bytes before calling this!
+ *
+ * Returns -errno, or number of @sg elements created.
+ */
+int vringh_wsg_user(struct vringh_acc *acc,
+		    struct vringh_sg *vsg, unsigned num, gfp_t gfp)
+{
+	return __vringh_sg(acc, vsg, num, VRING_DESC_F_WRITE, gfp,
+			   getdesc_user);
+}
+
+/**
+ * vringh_wsg_push_user - copy bytes to vsg.
+ * @vsg: the vsg from vringh_wsg_user() (updated as we consume)
+ * @num: the number of elements in @vsg (updated as we consume)
+ * @dst: the place to copy.
+ * @len: the maximum length to copy.
+ *
+ * Returns the bytes copied <= len or a negative errno.
+ */
+ssize_t vringh_wsg_push_user(struct vringh_sg **vsg, unsigned *num,
+			     const void *src, size_t len)
+{
+	return vsg_xfer(vsg, num, (void *)src, len, xfer_to_user);
+}
+
+/**
+ * vringh_abandon_user - we've decided not to handle the descriptor(s).
+ * @vrh: the vring.
+ * @num: the number of descriptors to put back (ie. num
+ *	 vringh_get_user() to undo).
+ *
+ * The next vringh_get_user() will return the old descriptor(s) again.
+ */
+void vringh_abandon_user(struct vringh *vrh, unsigned int num)
+{
+	/* We only update vring_avail_event(vr) when we want to be notified,
+	 * so we haven't changed that yet. */
+	vrh->last_avail_idx -= num;
+}
+
+/**
+ * vringh_complete_user - we've finished with descriptor, publish it.
+ * @vrh: the vring.
+ * @acc: the accessor from vringh_get_user.
+ * @len: the length of data we have written.
+ * @notify: set if we should notify the other side, otherwise left alone.
+ */
+int vringh_complete_user(struct vringh *vrh,
+			 const struct vringh_acc *acc,
+			 u16 len,
+			 bool *notify)
+{
+	return __vringh_complete(vrh, acc->head, len,
+				 getu16_user, putu16_user, putused_user,
+				 notify);
+}
+
+/**
+ * vringh_sg_free - free a chained sg.
+ * @vsg: the vsg from vringh_wsg_user/vringh_rsg_user
+ *
+ * If vringh_wsg_user/vringh_rsg_user chains your sg, you should call
+ * this to free it.
+ */
+void __cold vringh_sg_free(struct vringh_sg *vsg)
+{
+	struct scatterlist *next, *curr_start, *orig, *sg;
+
+	sg = &vsg->sg;
+	curr_start = orig = sg;
+
+	while (sg) {
+		next = sg_next(sg);
+		if (sg_is_chain(sg+1)) {
+			if (curr_start != orig)
+				free_page((long)curr_start);
+			curr_start = next;
+		}
+		sg = next;
+	}
+	if (curr_start != orig)
+		free_page((long)curr_start);
+}
+
+/**
+ * vringh_notify_enable_user - we want to know if something changes.
+ * @vrh: the vring.
+ *
+ * This always enables notifications, but returns true if there are
+ * now more buffers available in the vring.
+ */
+bool vringh_notify_enable_user(struct vringh *vrh)
+{
+	return __vringh_notify_enable(vrh, getu16_user, putu16_user);
+}
+
+/**
+ * vringh_notify_disable_user - don't tell us if something changes.
+ * @vrh: the vring.
+ *
+ * This is our normal running state: we disable and then only enable when
+ * we're going to sleep.
+ */
+void vringh_notify_disable_user(struct vringh *vrh)
+{
+	__vringh_notify_disable(vrh, putu16_user);
+}
diff --git a/include/linux/virtio_host.h b/include/linux/virtio_host.h
new file mode 100644
index 0000000..cb4b693
--- /dev/null
+++ b/include/linux/virtio_host.h
@@ -0,0 +1,136 @@
+/*
+ * Linux host-side vring helpers; for when the kernel needs to access
+ * someone else's vring.
+ *
+ * Copyright IBM Corporation, 2013.
+ * Parts taken from drivers/vhost/vhost.c Copyright 2009 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Written by: Rusty Russell <rusty@rustcorp.com.au>
+ */
+#ifndef _LINUX_VIRTIO_HOST_H
+#define _LINUX_VIRTIO_HOST_H
+#include <uapi/linux/virtio_ring.h>
+#include <linux/scatterlist.h>
+
+/* virtio_ring with information needed for host access. */
+struct vringh {
+	/* Guest publishes used event idx (note: we always do). */
+	bool event_indices;
+
+	/* Have we told the other end we want to be notified? */
+	bool listening;
+
+	/* Last available index we saw (ie. where we're up to). */
+	u16 last_avail_idx;
+
+	/* Last index we used. */
+	u16 last_used_idx;
+
+	/* The vring (note: it may contain user pointers!) */
+	struct vring vring;
+};
+
+/**
+ * struct vringh_sg - a scatterlist containing addresses.
+ *
+ * This data structure is trivially mapped in-place to a real sg, but
+ * the method is best left to the users (they may have to map user
+ * pages and add offsets to addresses).
+ */
+struct vringh_sg {
+	struct scatterlist sg;
+} __packed;
+
+static inline void *vringh_sg_addr(const struct vringh_sg *vsg)
+{
+	return (void *)sg_page((struct scatterlist *)&vsg->sg) + vsg->sg.offset;
+}
+
+/* Accessor structure for a single descriptor. */
+struct vringh_acc {
+	/* Start address. */
+	struct vring_desc *start;
+
+	/* Maximum number of entries, <= ring size. */
+	u32 max;
+
+	/* Head index we got, for vringh_complete_user, and current index. */
+	u16 head, idx;
+
+	/* Cached descriptor. */
+	struct vring_desc desc;
+};
+
+/* Helpers for userspace vrings. */
+int vringh_init_user(struct vringh *vrh, u32 features,
+		     unsigned int num,
+		     struct vring_desc __user *desc,
+		     struct vring_avail __user *avail,
+		     struct vring_used __user *used);
+
+/* Get accessor to userspace vring: make sure start to start+max is valid! */
+int vringh_getdesc_user(struct vringh *vrh, struct vringh_acc *acc);
+
+/* Fetch readable descriptor in vsg (num == 0 gives error if any). */
+int vringh_rsg_user(struct vringh_acc *acc,
+		    struct vringh_sg *vsg, unsigned num, gfp_t gfp);
+
+/* Then fetch writable descriptor in sg (num == 0 gives error if any). */
+int vringh_wsg_user(struct vringh_acc *acc,
+		    struct vringh_sg *vsg, unsigned num, gfp_t gfp);
+
+/* Copy bytes from readable vsg, consuming it. */
+ssize_t vringh_rsg_pull_user(struct vringh_sg **vsg, unsigned *num,
+			     void *dst, size_t len);
+
+/* Copy bytes into writable vsg, consuming it. */
+ssize_t vringh_rsg_push_user(struct vringh_sg **vsg, unsigned *num,
+			     const void *src, size_t len);
+
+/* Unmap all the pages mapped in this sg. */
+void vringh_unmap_pages(struct scatterlist *sg, unsigned num);
+
+/* Map a vring_sg, turning it into a real sg. */
+static inline struct scatterlist *vringh_sg_map(struct vringh_sg *vsg,
+						unsigned num,
+						struct page *(*map)(void *addr))
+{
+	struct scatterlist *orig_sg = (struct scatterlist *)vsg, *sg;
+	int i;
+
+	for_each_sg(orig_sg, sg, num, i) {
+		struct page *p = map(sg_page(sg));
+		if (unlikely(IS_ERR(p))) {
+			vringh_unmap_pages(orig_sg, i);
+			return (struct scatterlist *)p;
+		}
+	}
+	return orig_sg;
+}
+
+/* If wsg or rsg returns > num - 1, call this to free sg chains. */
+void vringh_sg_free(struct vringh_sg *sg);
+
+/* Mark a descriptor as used.  Sets notify if you should fire eventfd. */
+int vringh_complete_user(struct vringh *vrh,
+			 const struct vringh_acc *acc,
+			 u16 len,
+			 bool *notify);
+
+/* Pretend we've never seen descriptor (for easy error handling). */
+void vringh_abandon_user(struct vringh *vrh, unsigned int num);
+#endif /* _LINUX_VIRTIO_HOST_H */

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
  2013-01-10 10:30                 ` [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio Rusty Russell
@ 2013-01-10 11:11                   ` Michael S. Tsirkin
  2013-01-10 18:39                   ` Sjur Brændeland
  1 sibling, 0 replies; 23+ messages in thread
From: Michael S. Tsirkin @ 2013-01-10 11:11 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Sjur Brændeland, Linus Walleij, virtualization, LKML

On Thu, Jan 10, 2013 at 09:00:55PM +1030, Rusty Russell wrote:
> Not sure why vring/net doesn't built a packet and feed it in
> netif_rx_ni().  This is what tun seems to do, and with this code it
> should be fairly optimal.

Because we want to use NAPI.

-- 
MST

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
  2013-01-10 10:30                 ` [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio Rusty Russell
  2013-01-10 11:11                   ` Michael S. Tsirkin
@ 2013-01-10 18:39                   ` Sjur Brændeland
  2013-01-10 23:35                     ` Rusty Russell
  1 sibling, 1 reply; 23+ messages in thread
From: Sjur Brændeland @ 2013-01-10 18:39 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Michael S. Tsirkin, Linus Walleij, virtualization, LKML,
	Sjur Brændeland, Ohad Ben-Cohen

Hi Rusty,

On Thu, Jan 10, 2013 at 11:30 AM, Rusty Russell <rusty@rustcorp.com.au> wrote:
...
>I now have some lightly-tested code (via a userspace harness).

Great - thank you for looking into this. I will start integrating this
with my patches
when you send out a proper patch.

...
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 8d5bddb..fd95d3e 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -5,6 +5,12 @@ config VIRTIO
>           bus, such as CONFIG_VIRTIO_PCI, CONFIG_VIRTIO_MMIO, CONFIG_LGUEST,
>           CONFIG_RPMSG or CONFIG_S390_GUEST.
>
> +config VHOST
> +       tristate

Inclusion of drivers/virtio from drivers/Makefile depends on VIRTIO.
So I guess VHOST should select VIRTIO to ensure that
drivers/virtio/virtio_host.c
is part of the build.

> +       ---help---
> +         This option is selected by any driver which needs to access
> +         the host side of a virtio ring.
> +
...
> +/* Returns vring->num if empty, -ve on error. */
> +static inline int __vringh_get_head(const struct vringh *vrh,
> +                                   int (*getu16)(u16 *val, const u16 *p),
> +                                   u16 *last_avail_idx)
> +{
> +       u16 avail_idx, i, head;
> +       int err;
> +
> +       err = getu16(&avail_idx, &vrh->vring.avail->idx);
> +       if (err) {
> +               vringh_bad("Failed to access avail idx at %p",
> +                          &vrh->vring.avail->idx);
> +               return err;
> +       }
> +
> +       err = getu16(last_avail_idx, &vring_avail_event(&vrh->vring));
> +       if (err) {
> +               vringh_bad("Failed to access last avail idx at %p",
> +                          &vring_avail_event(&vrh->vring));
> +               return err;
> +       }
> +
> +       if (*last_avail_idx == avail_idx)
> +               return vrh->vring.num;
> +
> +       /* Only get avail ring entries after they have been exposed by guest. */
> +       smp_rmb();

We are accessing memory shared with a remote device (modem), so we probably
need mandatory barriers here, e.g. something like the virtio_rmb
defined in virtio_ring.c.

> +
> +       i = *last_avail_idx & (vrh->vring.num - 1);
> +
> +       err = getu16(&head, &vrh->vring.avail->ring[i]);
> +       if (err) {
> +               vringh_bad("Failed to read head: idx %d address %p",
> +                          *last_avail_idx, &vrh->vring.avail->ring[i]);
> +               return err;
> +       }
> +
> +       if (head >= vrh->vring.num) {
> +               vringh_bad("Guest says index %u > %u is available",
> +                          head, vrh->vring.num);
> +               return -EINVAL;
> +       }
> +       return head;
> +}
...
> +static inline int
> +__vringh_sg(struct vringh_acc *acc,
> +           struct vringh_sg *vsg,
> +           unsigned max,
> +           u16 write_flag,
> +           gfp_t gfp,
> +           int (*getdesc)(struct vring_desc *dst, const struct vring_desc *s))
> +{
> +       unsigned count = 0, num_descs = 0;
> +       struct vringh_sg *orig_vsg = vsg;
> +       int err;
> +
> +       /* This sends end marker on sg[max-1], so we know when to chain. */
> +       if (max)
> +               sg_init_table(&vsg->sg, max);
> +
> +       for (;;) {
> +               /* Exhausted this descriptor?  Read next. */
> +               if (acc->desc.len == 0) {
> +                       if (!(acc->desc.flags & VRING_DESC_F_NEXT))
> +                               break;
> +
> +                       if (num_descs++ == acc->max) {
> +                               err = -ELOOP;
> +                               goto fail;
> +                       }
> +
> +                       if (acc->desc.next >= acc->max) {
> +                               vringh_bad("Chained index %u > %u",
> +                                          acc->desc.next, acc->max);
> +                               err = -EINVAL;
> +                               goto fail;
> +                       }
> +
> +                       acc->idx = acc->desc.next;
> +                       err = getdesc(&acc->desc, acc->start + acc->idx);
> +                       if (unlikely(err))
> +                               goto fail;
> +               }
> +
> +               if (unlikely(!max)) {
> +                       vringh_bad("Unexpected %s descriptor",
> +                                  write_flag ? "writable" : "readable");
> +                       return -EINVAL;
> +               }
> +
> +               /* No more readable/writable descriptors? */
> +               if ((acc->desc.flags & VRING_DESC_F_WRITE) != write_flag) {
> +                       /* We should not have readable after writable */
> +                       if (write_flag) {
> +                               vringh_bad("Readable desc %p after writable",
> +                                          acc->start + acc->idx);
> +                               err = -EINVAL;
> +                               goto fail;
> +                       }
> +                       break;
> +               }
> +
> +               /* Append the pages into the sg. */
> +               err = add_to_sg(&vsg, (void *)(long)acc->desc.addr,
> +                               acc->desc.len, gfp);

I would prefer not to split into pages at this point, but rather provide an
iterator or the original list found in the descriptor to the client.

In our case we use virtio rings to talk to a LTE-modem over shared memory.
The IP traffic is received over the air, interleaved and arrives in
the virtio driver in
large bursts. So virtio driver on the modem receives multiple datagrams
held in large contiguous buffers. Our current approach is to handle each
buffer as a chained descriptor list, where each datagram is kept in
separate chained descriptors. When the buffers are consumed on the linux
host, the modem will read the chained descriptors from the used-ring and
free the entire contiguous buffer in one operation.

So I would prefer if we could avoid this approach of splitting buffers
received in
the ring into multiple sg-list entries as this would break the current
CAIF virtio
implementation.

Regards,
Sjur

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
  2013-01-10 18:39                   ` Sjur Brændeland
@ 2013-01-10 23:35                     ` Rusty Russell
  2013-01-11  6:37                       ` Rusty Russell
  2013-01-11 14:52                       ` Sjur Brændeland
  0 siblings, 2 replies; 23+ messages in thread
From: Rusty Russell @ 2013-01-10 23:35 UTC (permalink / raw)
  To: Sjur Brændeland
  Cc: Michael S. Tsirkin, Linus Walleij, virtualization, LKML,
	Sjur Brændeland, Ohad Ben-Cohen

Sjur Brændeland <sjurbren@gmail.com> writes:
> Hi Rusty,
>
> On Thu, Jan 10, 2013 at 11:30 AM, Rusty Russell <rusty@rustcorp.com.au> wrote:
> ...
>>I now have some lightly-tested code (via a userspace harness).
>
> Great - thank you for looking into this. I will start integrating this
> with my patches
> when you send out a proper patch.

Hi Sjur!

        OK, the Internet was no help here, how do you pronounce Sjur?
I'm guessing "shoor" rhyming with tour until I know better.

>> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
>> index 8d5bddb..fd95d3e 100644
>> --- a/drivers/virtio/Kconfig
>> +++ b/drivers/virtio/Kconfig
>> @@ -5,6 +5,12 @@ config VIRTIO
>>           bus, such as CONFIG_VIRTIO_PCI, CONFIG_VIRTIO_MMIO, CONFIG_LGUEST,
>>           CONFIG_RPMSG or CONFIG_S390_GUEST.
>>
>> +config VHOST
>> +       tristate
>
> Inclusion of drivers/virtio from drivers/Makefile depends on VIRTIO.
> So I guess VHOST should select VIRTIO to ensure that
> drivers/virtio/virtio_host.c
> is part of the build.

Maybe I should move drivers/virtio/virtio_host.c to
drivers/vhost/vringh.c; I'll look at it.

It makes sense for vhost/ to contain the host-side stuff, since it
already exists.

>> +       if (*last_avail_idx == avail_idx)
>> +               return vrh->vring.num;
>> +
>> +       /* Only get avail ring entries after they have been exposed by guest. */
>> +       smp_rmb();
>
> We are accessing memory shared with a remote device (modem), so we probably
> need mandatory barriers here, e.g. something like the virtio_rmb
> defined in virtio_ring.c.

Fair enough, we can put those in a header.

>> +               /* Append the pages into the sg. */
>> +               err = add_to_sg(&vsg, (void *)(long)acc->desc.addr,
>> +                               acc->desc.len, gfp);
>
> I would prefer not to split into pages at this point, but rather provide an
> iterator or the original list found in the descriptor to the client.
>
> In our case we use virtio rings to talk to a LTE-modem over shared memory.
> The IP traffic is received over the air, interleaved and arrives in
> the virtio driver in
> large bursts. So virtio driver on the modem receives multiple datagrams
> held in large contiguous buffers. Our current approach is to handle each
> buffer as a chained descriptor list, where each datagram is kept in
> separate chained descriptors. When the buffers are consumed on the linux
> host, the modem will read the chained descriptors from the used-ring and
> free the entire contiguous buffer in one operation.

In other words, boundaries matter?

While the sg-in-place hack is close to optimal for TCM_VHOST, neither
net not you can use it directly.  I'll switch to an iovec (with a
similar use-caller-supplied-if-it-fits trick); they're smaller anyway.

More code coming...

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
  2013-01-10 23:35                     ` Rusty Russell
@ 2013-01-11  6:37                       ` Rusty Russell
  2013-01-11 15:02                         ` Sjur Brændeland
  2013-01-14 17:39                         ` Michael S. Tsirkin
  2013-01-11 14:52                       ` Sjur Brændeland
  1 sibling, 2 replies; 23+ messages in thread
From: Rusty Russell @ 2013-01-11  6:37 UTC (permalink / raw)
  To: Sjur Brændeland
  Cc: Michael S. Tsirkin, Linus Walleij, virtualization, LKML,
	Sjur Brændeland, Ohad Ben-Cohen

Untested, but I wanted to post before the weekend.

I think the implementation is a bit nicer, and though we have a callback
to get the guest-to-userspace offset, it might be faster since I think
most cases will re-use the same mapping.

Feedback on API welcome!
Rusty.

virtio_host: host-side implementation of virtio rings (untested!)

Getting use of virtio rings correct is tricky, and a recent patch saw
an implementation of in-kernel rings (as separate from userspace).

This patch attempts to abstract the business of dealing with the
virtio ring layout from the access (userspace or direct); to do this,
we use function pointers, which gcc inlines correctly.

FIXME: strong barriers a-la virtio weak_barrier flag.
FIXME: separate notify call with flag if we wrapped.
FIXME: move to vhost/vringh.c.
FIXME: test :)

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 202bba6..38ec470 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -1,6 +1,7 @@
 config VHOST_NET
 	tristate "Host kernel accelerator for virtio net (EXPERIMENTAL)"
 	depends on NET && EVENTFD && (TUN || !TUN) && (MACVTAP || !MACVTAP) && EXPERIMENTAL
+	select VHOST
 	---help---
 	  This kernel module can be loaded in host kernel to accelerate
 	  guest networking with virtio_net. Not to be confused with virtio_net
diff --git a/drivers/vhost/Kconfig.tcm b/drivers/vhost/Kconfig.tcm
index a9c6f76..f4c3704 100644
--- a/drivers/vhost/Kconfig.tcm
+++ b/drivers/vhost/Kconfig.tcm
@@ -1,6 +1,7 @@
 config TCM_VHOST
 	tristate "TCM_VHOST fabric module (EXPERIMENTAL)"
 	depends on TARGET_CORE && EVENTFD && EXPERIMENTAL && m
+	select VHOST
 	default n
 	---help---
 	Say M here to enable the TCM_VHOST fabric module for use with virtio-scsi guests
diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 8d5bddb..fd95d3e 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -5,6 +5,12 @@ config VIRTIO
 	  bus, such as CONFIG_VIRTIO_PCI, CONFIG_VIRTIO_MMIO, CONFIG_LGUEST,
 	  CONFIG_RPMSG or CONFIG_S390_GUEST.
 
+config VHOST
+	tristate
+	---help---
+	  This option is selected by any driver which needs to access
+	  the host side of a virtio ring.
+
 menu "Virtio drivers"
 
 config VIRTIO_PCI
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 9076635..9833cd5 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -2,3 +2,4 @@ obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o
 obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
 obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
 obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
+obj-$(CONFIG_VHOST) += virtio_host.o
diff --git a/drivers/virtio/virtio_host.c b/drivers/virtio/virtio_host.c
new file mode 100644
index 0000000..7416741
--- /dev/null
+++ b/drivers/virtio/virtio_host.c
@@ -0,0 +1,618 @@
+/*
+ * Helpers for the host side of a virtio ring.
+ *
+ * Since these may be in userspace, we use (inline) accessors.
+ */
+#include <linux/virtio_host.h>
+#include <linux/kernel.h>
+#include <linux/ratelimit.h>
+#include <linux/uaccess.h>
+#include <linux/slab.h>
+
+static __printf(1,2) __cold void vringh_bad(const char *fmt, ...)
+{
+	static DEFINE_RATELIMIT_STATE(vringh_rs,
+				      DEFAULT_RATELIMIT_INTERVAL,
+				      DEFAULT_RATELIMIT_BURST);
+	if (__ratelimit(&vringh_rs)) {
+		va_list ap;
+		va_start(ap, fmt);
+		printk(KERN_NOTICE "vringh:");
+		vprintk(fmt, ap);
+		va_end(ap);
+	}
+}
+
+/* Returns vring->num if empty, -ve on error. */
+static inline int __vringh_get_head(const struct vringh *vrh,
+				    int (*getu16)(u16 *val, const u16 *p),
+				    u16 *last_avail_idx)
+{
+	u16 avail_idx, i, head;
+	int err;
+
+	err = getu16(&avail_idx, &vrh->vring.avail->idx);
+	if (err) {
+		vringh_bad("Failed to access avail idx at %p",
+			   &vrh->vring.avail->idx);
+		return err;
+	}
+
+	err = getu16(last_avail_idx, &vring_avail_event(&vrh->vring));
+	if (err) {
+		vringh_bad("Failed to access last avail idx at %p",
+			   &vring_avail_event(&vrh->vring));
+		return err;
+	}
+
+	if (*last_avail_idx == avail_idx)
+		return vrh->vring.num;
+
+	/* Only get avail ring entries after they have been exposed by guest. */
+	smp_rmb();
+
+	i = *last_avail_idx & (vrh->vring.num - 1);
+
+	err = getu16(&head, &vrh->vring.avail->ring[i]);
+	if (err) {
+		vringh_bad("Failed to read head: idx %d address %p",
+			   *last_avail_idx, &vrh->vring.avail->ring[i]);
+		return err;
+	}
+
+	if (head >= vrh->vring.num) {
+		vringh_bad("Guest says index %u > %u is available",
+			   head, vrh->vring.num);
+		return -EINVAL;
+	}
+	return head;
+}
+
+/* Copy some bytes to/from the iovec.  Returns num copied. */
+static inline ssize_t vringh_iov_xfer(struct vringh_iov *iov,
+				      void *ptr, size_t len,
+				      int (*xfer)(void __user *addr, void *ptr,
+						  size_t len))
+{
+	int err, done = 0;
+
+	while (len && iov->i < iov->max) {
+		size_t partlen;
+
+		partlen = min(iov->iov[iov->i].iov_len, len);
+		err = xfer(iov->iov[iov->i].iov_base, ptr, partlen);
+		if (err)
+			return err;
+		done += partlen;
+		iov->iov[iov->i].iov_base += partlen;
+		iov->iov[iov->i].iov_len -= partlen;
+
+		if (iov->iov[iov->i].iov_len == 0)
+			iov->i++;
+	}
+	return done;
+}
+
+static inline bool check_range(u64 addr, u32 len,
+			       struct vringh_range *range,
+			       bool (*getrange)(u64, struct vringh_range *))
+{
+	if (addr < range->start || addr > range->end_incl) {
+		if (!getrange(addr, range))
+			goto bad;
+	}
+	BUG_ON(addr < range->start || addr > range->end_incl);
+
+	/* To end of memory? */
+	if (unlikely(addr + len == 0)) {
+		if (range->end_incl == -1ULL)
+			return true;
+		goto bad;
+	}
+
+	/* Otherwise, don't wrap. */
+	if (unlikely(addr + len < addr))
+		goto bad;
+	if (unlikely(addr + len > range->end_incl))
+		goto bad;
+	return true;
+
+bad:
+	vringh_bad("Malformed descriptor address %u@0x%llx", len, addr);
+	return false;
+}
+
+/* No reason for this code to be inline. */
+static int move_to_indirect(int *up_next, u16 *i, void *addr,
+			    const struct vring_desc *desc,
+			    struct vring_desc **descs, int *desc_max)
+{
+	/* Indirect tables can't have indirect. */
+	if (*up_next != -1) {
+		vringh_bad("Multilevel indirect %u->%u", *up_next, *i);
+		return -EINVAL;
+	}
+
+	if (unlikely(desc->len % sizeof(struct vring_desc))) {
+		vringh_bad("Strange indirect len %u", desc->len);
+		return -EINVAL;
+	}
+
+	/* We will check this when we follow it! */
+	if (desc->flags & VRING_DESC_F_NEXT)
+		*up_next = desc->next;
+	else
+		*up_next = -2;
+	*descs = addr;
+	*desc_max = desc->len / sizeof(struct vring_desc);
+
+	/* Now, start at the first indirect. */
+	*i = 0;
+	return 0;
+}
+
+static int resize_iovec(struct vringh_iov *iov, gfp_t gfp)
+{
+	struct iovec *new;
+	unsigned int new_num = iov->max * 2;
+
+	if (new_num < 8)
+		new_num = 8;
+
+	if (iov->allocated)
+		new = krealloc(iov->iov, new_num * sizeof(struct iovec), gfp);
+	else {
+		new = kmalloc(new_num * sizeof(struct iovec), gfp);
+		if (new) {
+			memcpy(new, iov->iov, iov->i * sizeof(struct iovec));
+			iov->allocated = true;
+		}
+	}
+	if (!new)
+		return -ENOMEM;
+	iov->iov = new;
+	iov->max = new_num;
+	return 0;
+}
+
+static u16 __cold return_from_indirect(const struct vringh *vrh, int *up_next,
+				       struct vring_desc **descs, int *desc_max)
+{
+	u16 i = *up_next;
+
+	*up_next = -1;
+	*descs = vrh->vring.desc;
+	*desc_max = vrh->vring.num;
+	return i;
+}
+
+static inline int
+__vringh_iov(struct vringh *vrh, u16 i,
+	     struct vringh_iov *riov,
+	     struct vringh_iov *wiov,
+	     bool (*getrange)(u64 addr, struct vringh_range *r),
+	     gfp_t gfp,
+	     int (*getdesc)(struct vring_desc *dst, const struct vring_desc *s))
+{
+	int err, count = 0, up_next, desc_max;
+	struct vring_desc desc, *descs;
+	struct vringh_range range = { -1ULL, 0 };
+
+	/* We start traversing vring's descriptor table. */
+	descs = vrh->vring.desc;
+	desc_max = vrh->vring.num;
+	up_next = -1;
+
+	riov->i = wiov->i = 0;
+	for (;;) {
+		void *addr;
+		struct vringh_iov *iov;
+
+		err = getdesc(&desc, &descs[i]);
+		if (unlikely(err))
+			goto fail;
+
+		/* Make sure it's OK, and get offset. */
+		if (!check_range(desc.addr, desc.len, &range, getrange)) {
+			err = -EINVAL;
+			goto fail;
+		}
+		addr = (void *)(long)desc.addr + range.offset;
+
+		if (unlikely(desc.flags & VRING_DESC_F_INDIRECT)) {
+			err = move_to_indirect(&up_next, &i, addr, &desc,
+					       &descs, &desc_max);
+			if (err)
+				goto fail;
+			continue;
+		}
+
+		if (desc.flags & VRING_DESC_F_WRITE)
+			iov = wiov;
+		else {
+			iov = riov;
+			if (unlikely(wiov->i)) {
+				vringh_bad("Readable desc %p after writable",
+					   &descs[i]);
+				err = -EINVAL;
+				goto fail;
+			}
+		}
+
+		if (unlikely(iov->i == iov->max)) {
+			err = resize_iovec(iov, gfp);
+			if (err)
+				goto fail;
+		}
+
+		iov->iov[iov->i].iov_base = (__force __user void *)addr;
+		iov->iov[iov->i].iov_len = desc.len;
+		iov->i++;
+
+		if (++count == vrh->vring.num) {
+			vringh_bad("Descriptor loop in %p", descs);
+			err = -ELOOP;
+			goto fail;
+		}
+
+		if (desc.flags & VRING_DESC_F_NEXT) {
+			i = desc.next;
+		} else {
+			/* Just in case we need to finish traversing above. */
+			if (unlikely(up_next > 0))
+				i = return_from_indirect(vrh, &up_next,
+							 &descs, &desc_max);
+			else
+				break;
+		}
+
+		if (i >= desc_max) {
+			vringh_bad("Chained index %u > %u", i, desc_max);
+			err = -EINVAL;
+			goto fail;
+		}
+	}
+
+	/* Reset for fresh iteration. */
+	riov->i = wiov->i = 0;
+	return 0;
+
+fail:
+	if (riov->allocated)
+		kfree(riov->iov);
+	if (wiov->allocated)
+		kfree(wiov->iov);
+	return err;
+}
+
+static inline int __vringh_complete(struct vringh *vrh, u16 idx, u32 len,
+				    int (*getu16)(u16 *val, const u16 *p),
+				    int (*putu16)(u16 *p, u16 val),
+				    int (*putused)(struct vring_used_elem *dst,
+						   const struct vring_used_elem
+						   *s),
+				    bool *notify)
+{
+	struct vring_used_elem used;
+	struct vring_used *used_ring;
+	int err;
+	u16 used_idx, old, used_event;
+
+	used.id = idx;
+	used.len = len;
+
+	err = getu16(&used_idx, &vring_used_event(&vrh->vring));
+	if (err) {
+		vringh_bad("Failed to access used event %p",
+			   &vring_used_event(&vrh->vring));
+		return err;
+	}
+
+	used_ring = vrh->vring.used;
+
+	err = putused(&used_ring->ring[used_idx % vrh->vring.num], &used);
+	if (err) {
+		vringh_bad("Failed to write used entry %u at %p",
+			   used_idx % vrh->vring.num,
+			   &used_ring->ring[used_idx % vrh->vring.num]);
+		return err;
+	}
+
+	/* Make sure buffer is written before we update index. */
+	smp_wmb();
+
+	old = vrh->last_used_idx;
+	vrh->last_used_idx++;
+
+	err = putu16(&vrh->vring.used->idx, vrh->last_used_idx);
+	if (err) {
+		vringh_bad("Failed to update used index at %p",
+			   &vrh->vring.used->idx);
+		return err;
+	}
+
+	/* If we already know we need to notify, skip re-checking */
+	if (*notify)
+		return 0;
+
+	/* Flush out used index update. This is paired with the
+	 * barrier that the Guest executes when enabling
+	 * interrupts. */
+	smp_mb();
+
+	/* Old-style, without event indices. */
+	if (!vrh->event_indices) {
+		u16 flags;
+		err = getu16(&flags, &vrh->vring.avail->flags);
+		if (err) {
+			vringh_bad("Failed to get flags at %p",
+				   &vrh->vring.avail->flags);
+			return err;
+		}
+		if (!(flags & VRING_AVAIL_F_NO_INTERRUPT))
+			*notify = true;
+		return 0;
+	}
+
+	/* Modern: we know where other side is up to. */
+	err = getu16(&used_event, &vring_used_event(&vrh->vring));
+	if (err) {
+		vringh_bad("Failed to get used event idx at %p",
+			   &vring_used_event(&vrh->vring));
+		return err;
+	}
+	if (vring_need_event(used_event, vrh->last_used_idx, old))
+		*notify = true;
+	return 0;
+}
+
+static inline bool __vringh_notify_enable(struct vringh *vrh,
+					  int (*getu16)(u16 *val, const u16 *p),
+					  int (*putu16)(u16 *p, u16 val))
+{
+	u16 avail;
+
+	/* Already enabled? */
+	if (vrh->listening)
+		return false;
+
+	vrh->listening = true;
+
+	if (!vrh->event_indices) {
+		/* Old-school; update flags. */
+		if (putu16(&vrh->vring.used->flags, 0) != 0) {
+			vringh_bad("Clearing used flags %p",
+				   &vrh->vring.used->flags);
+			return false;
+		}
+	} else {
+		if (putu16(&vring_avail_event(&vrh->vring),
+			   vrh->last_avail_idx) != 0) {
+			vringh_bad("Updating avail event index %p",
+				   &vring_avail_event(&vrh->vring));
+			return false;
+		}
+	}
+
+	/* They could have slipped one in as we were doing that: make
+	 * sure it's written, then check again. */
+	smp_mb();
+
+	if (getu16(&avail, &vrh->vring.avail->idx) != 0) {
+		vringh_bad("Failed to check avail idx at %p",
+			   &vrh->vring.avail->idx);
+		return false;
+	}
+
+	/* This is so unlikely, we just leave notifications enabled. */
+	return avail != vrh->last_avail_idx;
+}
+
+static inline void __vringh_notify_disable(struct vringh *vrh,
+					   int (*putu16)(u16 *p, u16 val))
+{
+	/* Already disabled? */
+	if (!vrh->listening)
+		return;
+
+	vrh->listening = false;
+	if (!vrh->event_indices) {
+		/* Old-school; update flags. */
+		if (putu16(&vrh->vring.used->flags, VRING_USED_F_NO_NOTIFY)) {
+			vringh_bad("Setting used flags %p",
+				   &vrh->vring.used->flags);
+		}
+	}
+}
+
+/* Userspace access helpers. */
+static inline int getu16_user(u16 *val, const u16 *p)
+{
+	return get_user(*val, (__force u16 __user *)p);
+}
+
+static inline int putu16_user(u16 *p, u16 val)
+{
+	return put_user(val, (__force u16 __user *)p);
+}
+
+static inline int getdesc_user(struct vring_desc *dst,
+			       const struct vring_desc *src)
+{
+	return copy_from_user(dst, (__force void *)src, sizeof(*dst)) == 0 ? 0 :
+		-EFAULT;
+}
+
+static inline int putused_user(struct vring_used_elem *dst,
+			       const struct vring_used_elem *s)
+{
+	return copy_to_user((__force void __user *)dst, s, sizeof(*dst)) == 0
+		? 0 : -EFAULT;
+}
+
+static inline int xfer_from_user(void *src, void *dst, size_t len)
+{
+	return copy_from_user(dst, (__force void *)src, len) == 0 ? 0 :
+		-EFAULT;
+}
+
+static inline int xfer_to_user(void *dst, void *src, size_t len)
+{
+	return copy_to_user((__force void *)dst, src, len) == 0 ? 0 :
+		-EFAULT;
+}
+
+/**
+ * vringh_init_user - initialize a vringh for a userspace vring.
+ * @vrh: the vringh to initialize.
+ * @features: the feature bits for this ring.
+ * @num: the number of elements.
+ * @desc: the userpace descriptor pointer.
+ * @avail: the userpace avail pointer.
+ * @used: the userpace used pointer.
+ *
+ * Returns an error if num is invalid: you should check pointers
+ * yourself!
+ */
+int vringh_init_user(struct vringh *vrh, u32 features,
+		     unsigned int num,
+		     struct vring_desc __user *desc,
+		     struct vring_avail __user *avail,
+		     struct vring_used __user *used)
+{
+	/* Sane power of 2 please! */
+	if (!num || num > 0xffff || (num & (num - 1))) {
+		vringh_bad("Bad ring size %zu", num);
+		return -EINVAL;
+	}
+
+	vrh->event_indices = (features & VIRTIO_RING_F_EVENT_IDX);
+	vrh->listening = false;
+	vrh->last_avail_idx = 0;
+	vrh->last_used_idx = 0;
+	vrh->vring.num = num;
+	vrh->vring.desc = (__force struct vring_desc *)desc;
+	vrh->vring.avail = (__force struct vring_avail *)avail;
+	vrh->vring.used = (__force struct vring_used *)used;
+	return 0;
+}
+
+/**
+ * vringh_getdesc_user - get next available descriptor from userspace ring.
+ * @vrh: the userspace vring.
+ * @riov: where to put the readable descriptors.
+ * @wiov: where to put the writable descriptors.
+ * @getrange: function to call to check ranges.
+ * @head: head index we received, for passing to vringh_complete_user().
+ * @gfp: flags for allocating larger riov/wiov.
+ *
+ * Returns 0 if there was no descriptor, 1 if there was, or -errno.
+ *
+ * If it returns 1, riov->allocated and wiov->allocated indicate if you
+ * have to kfree riov->iov and wiov->iov respectively.
+ */
+int vringh_getdesc_user(struct vringh *vrh,
+			struct vringh_iov *riov,
+			struct vringh_iov *wiov,
+			bool (*getrange)(u64 addr, struct vringh_range *r),
+			u16 *head,
+			gfp_t gfp)
+{
+	int err;
+
+	err = __vringh_get_head(vrh, getu16_user, &vrh->last_avail_idx);
+	if (err < 0)
+		return err;
+
+	/* Empty... */
+	if (err == vrh->vring.num)
+		return 0;
+
+	*head = err;
+	err = __vringh_iov(vrh, *head, riov, wiov, getrange, gfp, getdesc_user);
+	if (err)
+		return err;
+
+	return 1;
+}
+
+/**
+ * vringh_iov_pull_user - copy bytes from vring_iov.
+ * @riov: the riov as passed to vringh_getdesc_user() (updated as we consume)
+ * @dst: the place to copy.
+ * @len: the maximum length to copy.
+ *
+ * Returns the bytes copied <= len or a negative errno.
+ */
+ssize_t vringh_iov_pull_user(struct vringh_iov *riov, void *dst, size_t len)
+{
+	return vringh_iov_xfer(riov, dst, len, xfer_from_user);
+}
+
+/**
+ * vringh_iov_push_user - copy bytes into vring_iov.
+ * @wiov: the wiov as passed to vringh_getdesc_user() (updated as we consume)
+ * @dst: the place to copy.
+ * @len: the maximum length to copy.
+ *
+ * Returns the bytes copied <= len or a negative errno.
+ */
+ssize_t vringh_iov_push_user(struct vringh_iov *wiov,
+			     const void *src, size_t len)
+{
+	return vringh_iov_xfer(wiov, (void *)src, len, xfer_to_user);
+}
+
+/**
+ * vringh_abandon_user - we've decided not to handle the descriptor(s).
+ * @vrh: the vring.
+ * @num: the number of descriptors to put back (ie. num
+ *	 vringh_get_user() to undo).
+ *
+ * The next vringh_get_user() will return the old descriptor(s) again.
+ */
+void vringh_abandon_user(struct vringh *vrh, unsigned int num)
+{
+	/* We only update vring_avail_event(vr) when we want to be notified,
+	 * so we haven't changed that yet. */
+	vrh->last_avail_idx -= num;
+}
+
+/**
+ * vringh_complete_user - we've finished with descriptor, publish it.
+ * @vrh: the vring.
+ * @head: the head as filled in by vringh_getdesc_user.
+ * @len: the length of data we have written.
+ * @notify: set if we should notify the other side, otherwise left alone.
+ */
+int vringh_complete_user(struct vringh *vrh, u16 head, u32 len,
+			 bool *notify)
+{
+	return __vringh_complete(vrh, head, len,
+				 getu16_user, putu16_user, putused_user,
+				 notify);
+}
+
+/**
+ * vringh_notify_enable_user - we want to know if something changes.
+ * @vrh: the vring.
+ *
+ * This always enables notifications, but returns true if there are
+ * now more buffers available in the vring.
+ */
+bool vringh_notify_enable_user(struct vringh *vrh)
+{
+	return __vringh_notify_enable(vrh, getu16_user, putu16_user);
+}
+
+/**
+ * vringh_notify_disable_user - don't tell us if something changes.
+ * @vrh: the vring.
+ *
+ * This is our normal running state: we disable and then only enable when
+ * we're going to sleep.
+ */
+void vringh_notify_disable_user(struct vringh *vrh)
+{
+	__vringh_notify_disable(vrh, putu16_user);
+}
diff --git a/include/linux/virtio_host.h b/include/linux/virtio_host.h
new file mode 100644
index 0000000..07bb4f6
--- /dev/null
+++ b/include/linux/virtio_host.h
@@ -0,0 +1,88 @@
+/*
+ * Linux host-side vring helpers; for when the kernel needs to access
+ * someone else's vring.
+ *
+ * Copyright IBM Corporation, 2013.
+ * Parts taken from drivers/vhost/vhost.c Copyright 2009 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Written by: Rusty Russell <rusty@rustcorp.com.au>
+ */
+#ifndef _LINUX_VIRTIO_HOST_H
+#define _LINUX_VIRTIO_HOST_H
+#include <uapi/linux/virtio_ring.h>
+#include <uapi/linux/uio.h>
+
+/* virtio_ring with information needed for host access. */
+struct vringh {
+	/* Guest publishes used event idx (note: we always do). */
+	bool event_indices;
+
+	/* Have we told the other end we want to be notified? */
+	bool listening;
+
+	/* Last available index we saw (ie. where we're up to). */
+	u16 last_avail_idx;
+
+	/* Last index we used. */
+	u16 last_used_idx;
+
+	/* The vring (note: it may contain user pointers!) */
+	struct vring vring;
+};
+
+/* The memory the vring can access, and what offset to apply. */
+struct vringh_range {
+	u64 start, end_incl;
+	u64 offset;
+};
+
+/* All the information about an iovec. */
+struct vringh_iov {
+	struct iovec *iov;
+	unsigned i, max;
+	bool allocated;
+};
+
+/* Helpers for userspace vrings. */
+int vringh_init_user(struct vringh *vrh, u32 features,
+		     unsigned int num,
+		     struct vring_desc __user *desc,
+		     struct vring_avail __user *avail,
+		     struct vring_used __user *used);
+
+/* Convert a descriptor into iovecs. */
+int vringh_getdesc_user(struct vringh *vrh,
+			struct vringh_iov *riov,
+			struct vringh_iov *wiov,
+			bool (*getrange)(u64 addr, struct vringh_range *r),
+			u16 *head,
+			gfp_t gfp);
+
+/* Copy bytes from readable vsg, consuming it (and incrementing wiov->i). */
+ssize_t vringh_iov_pull_user(struct vringh_iov *riov, void *dst, size_t len);
+
+/* Copy bytes into writable vsg, consuming it (and incrementing wiov->i). */
+ssize_t vringh_iov_push_user(struct vringh_iov *wiov,
+			     const void *src, size_t len);
+
+/* Mark a descriptor as used.  Sets notify if you should fire eventfd. */
+int vringh_complete_user(struct vringh *vrh, u16 head, u32 len,
+			 bool *notify);
+
+/* Pretend we've never seen descriptor (for easy error handling). */
+void vringh_abandon_user(struct vringh *vrh, unsigned int num);
+#endif /* _LINUX_VIRTIO_HOST_H */

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
  2013-01-10 23:35                     ` Rusty Russell
  2013-01-11  6:37                       ` Rusty Russell
@ 2013-01-11 14:52                       ` Sjur Brændeland
  1 sibling, 0 replies; 23+ messages in thread
From: Sjur Brændeland @ 2013-01-11 14:52 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Michael S. Tsirkin, Linus Walleij, virtualization, LKML,
	Ohad Ben-Cohen

On Fri, Jan 11, 2013 at 12:35 AM, Rusty Russell <rusty@rustcorp.com.au> wrote:
> Hi Sjur!
>
>         OK, the Internet was no help here, how do you pronounce Sjur?
> I'm guessing "shoor" rhyming with tour until I know better.

Thank you for asking! This is pretty close yes.
I usually tell people to pronounce it like "sure" for simplicity.
But Google translate has a perfect Norwegian voice pronouncing my name:
http://translate.google.com/#auto/no/sjur (Press the speaker button)

While at the subject: Norwegian names can be a laugh: I have people
named "Odd" and "Even" in my family, and I once had a friend named
"Aashold".

...

>> I would prefer not to split into pages at this point, but rather provide an
>> iterator or the original list found in the descriptor to the client.
...
> In other words, boundaries matter?
>
> While the sg-in-place hack is close to optimal for TCM_VHOST, neither
> net not you can use it directly.  I'll switch to an iovec (with a
> similar use-caller-supplied-if-it-fits trick); they're smaller anyway.

Great, I think iovec will be a good fit for my CAIF driver.

Thanks,
Sjur

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
  2013-01-11  6:37                       ` Rusty Russell
@ 2013-01-11 15:02                         ` Sjur Brændeland
  2013-01-12  0:26                           ` Rusty Russell
  2013-01-14 17:39                         ` Michael S. Tsirkin
  1 sibling, 1 reply; 23+ messages in thread
From: Sjur Brændeland @ 2013-01-11 15:02 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Michael S. Tsirkin, Linus Walleij, virtualization, LKML,
	Ohad Ben-Cohen

On Fri, Jan 11, 2013 at 7:37 AM, Rusty Russell <rusty@rustcorp.com.au> wrote:
>virtio_host: host-side implementation of virtio rings (untested!)
>
>Getting use of virtio rings correct is tricky, and a recent patch saw
>an implementation of in-kernel rings (as separate from userspace).

How do you see the in-kernel API for this? I would like to see
something similar to my previous patches, where we extend
the virtqueue API. E.g. something like this:

struct virtqueue *vring_new_virtqueueh(unsigned int index,
				       unsigned int num,
				       unsigned int vring_align,
				       struct virtio_device *vdev,
				       bool weak_barriers,
				       void *pages,
				       void (*notify)(struct virtqueue *),
				       void (*callback)(struct virtqueue *),
				       const char *name);

int virtqueueh_get_iov(struct virtqueue *vqh,
		       struct vringh_iov *riov,
		       struct vringh_iov *wiov,
		       gfp_t gfp);

int virtqueueh_add_iov(struct virtqueue *vqh,
		       struct vringh_iov *riov,
		       struct vringh_iov *wiov);

I guess implementation of the host-virtqueue should stay in drivers/virtio?

Regards,
Sjur

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
  2013-01-11 15:02                         ` Sjur Brændeland
@ 2013-01-12  0:26                           ` Rusty Russell
  0 siblings, 0 replies; 23+ messages in thread
From: Rusty Russell @ 2013-01-12  0:26 UTC (permalink / raw)
  To: Sjur Brændeland
  Cc: Michael S. Tsirkin, Linus Walleij, virtualization, LKML,
	Ohad Ben-Cohen

Sjur Brændeland <sjurbren@gmail.com> writes:
> How do you see the in-kernel API for this? I would like to see
> something similar to my previous patches, where we extend
> the virtqueue API. E.g. something like this:

> struct virtqueue *vring_new_virtqueueh(unsigned int index,
> 				       unsigned int num,
> 				       unsigned int vring_align,
> 				       struct virtio_device *vdev,
> 				       bool weak_barriers,
> 				       void *pages,
> 				       void (*notify)(struct virtqueue *),
> 				       void (*callback)(struct virtqueue *),
> 				       const char *name);

I was just going to create _kernel variants of all the _user helpers,
and let you drive it directly like that.

If we get a second in-kernel user, we create wrappers (I'd prefer not to
overload struct virtqueue though).

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
  2013-01-11  6:37                       ` Rusty Russell
  2013-01-11 15:02                         ` Sjur Brændeland
@ 2013-01-14 17:39                         ` Michael S. Tsirkin
  2013-01-16  3:13                           ` Rusty Russell
  1 sibling, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2013-01-14 17:39 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Sjur Brændeland, Linus Walleij, virtualization, LKML,
	Sjur Brændeland, Ohad Ben-Cohen

On Fri, Jan 11, 2013 at 05:07:44PM +1030, Rusty Russell wrote:
> Untested, but I wanted to post before the weekend.
> 
> I think the implementation is a bit nicer, and though we have a callback
> to get the guest-to-userspace offset, it might be faster since I think
> most cases will re-use the same mapping.
> 
> Feedback on API welcome!
> Rusty.
> 
> virtio_host: host-side implementation of virtio rings (untested!)
> 
> Getting use of virtio rings correct is tricky, and a recent patch saw
> an implementation of in-kernel rings (as separate from userspace).
> 
> This patch attempts to abstract the business of dealing with the
> virtio ring layout from the access (userspace or direct); to do this,
> we use function pointers, which gcc inlines correctly.
> 
> FIXME: strong barriers a-la virtio weak_barrier flag.
> FIXME: separate notify call with flag if we wrapped.
> FIXME: move to vhost/vringh.c.
> FIXME: test :)
> 
> diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
> index 202bba6..38ec470 100644
> --- a/drivers/vhost/Kconfig
> +++ b/drivers/vhost/Kconfig
> @@ -1,6 +1,7 @@
>  config VHOST_NET
>  	tristate "Host kernel accelerator for virtio net (EXPERIMENTAL)"
>  	depends on NET && EVENTFD && (TUN || !TUN) && (MACVTAP || !MACVTAP) && EXPERIMENTAL
> +	select VHOST
>  	---help---
>  	  This kernel module can be loaded in host kernel to accelerate
>  	  guest networking with virtio_net. Not to be confused with virtio_net
> diff --git a/drivers/vhost/Kconfig.tcm b/drivers/vhost/Kconfig.tcm
> index a9c6f76..f4c3704 100644
> --- a/drivers/vhost/Kconfig.tcm
> +++ b/drivers/vhost/Kconfig.tcm
> @@ -1,6 +1,7 @@
>  config TCM_VHOST
>  	tristate "TCM_VHOST fabric module (EXPERIMENTAL)"
>  	depends on TARGET_CORE && EVENTFD && EXPERIMENTAL && m
> +	select VHOST
>  	default n
>  	---help---
>  	Say M here to enable the TCM_VHOST fabric module for use with virtio-scsi guests
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 8d5bddb..fd95d3e 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -5,6 +5,12 @@ config VIRTIO
>  	  bus, such as CONFIG_VIRTIO_PCI, CONFIG_VIRTIO_MMIO, CONFIG_LGUEST,
>  	  CONFIG_RPMSG or CONFIG_S390_GUEST.
>  
> +config VHOST
> +	tristate
> +	---help---
> +	  This option is selected by any driver which needs to access
> +	  the host side of a virtio ring.
> +
>  menu "Virtio drivers"
>  
>  config VIRTIO_PCI
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index 9076635..9833cd5 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -2,3 +2,4 @@ obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o
>  obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
>  obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
>  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
> +obj-$(CONFIG_VHOST) += virtio_host.o
> diff --git a/drivers/virtio/virtio_host.c b/drivers/virtio/virtio_host.c
> new file mode 100644
> index 0000000..7416741
> --- /dev/null
> +++ b/drivers/virtio/virtio_host.c
> @@ -0,0 +1,618 @@
> +/*
> + * Helpers for the host side of a virtio ring.
> + *
> + * Since these may be in userspace, we use (inline) accessors.
> + */
> +#include <linux/virtio_host.h>
> +#include <linux/kernel.h>
> +#include <linux/ratelimit.h>
> +#include <linux/uaccess.h>
> +#include <linux/slab.h>
> +
> +static __printf(1,2) __cold void vringh_bad(const char *fmt, ...)
> +{
> +	static DEFINE_RATELIMIT_STATE(vringh_rs,
> +				      DEFAULT_RATELIMIT_INTERVAL,
> +				      DEFAULT_RATELIMIT_BURST);
> +	if (__ratelimit(&vringh_rs)) {
> +		va_list ap;
> +		va_start(ap, fmt);
> +		printk(KERN_NOTICE "vringh:");
> +		vprintk(fmt, ap);
> +		va_end(ap);
> +	}
> +}
> +
> +/* Returns vring->num if empty, -ve on error. */
> +static inline int __vringh_get_head(const struct vringh *vrh,
> +				    int (*getu16)(u16 *val, const u16 *p),
> +				    u16 *last_avail_idx)
> +{
> +	u16 avail_idx, i, head;
> +	int err;
> +
> +	err = getu16(&avail_idx, &vrh->vring.avail->idx);
> +	if (err) {
> +		vringh_bad("Failed to access avail idx at %p",
> +			   &vrh->vring.avail->idx);
> +		return err;
> +	}
> +
> +	err = getu16(last_avail_idx, &vring_avail_event(&vrh->vring));
> +	if (err) {
> +		vringh_bad("Failed to access last avail idx at %p",
> +			   &vring_avail_event(&vrh->vring));
> +		return err;
> +	}
> +
> +	if (*last_avail_idx == avail_idx)
> +		return vrh->vring.num;
> +
> +	/* Only get avail ring entries after they have been exposed by guest. */
> +	smp_rmb();
> +
> +	i = *last_avail_idx & (vrh->vring.num - 1);
> +
> +	err = getu16(&head, &vrh->vring.avail->ring[i]);
> +	if (err) {
> +		vringh_bad("Failed to read head: idx %d address %p",
> +			   *last_avail_idx, &vrh->vring.avail->ring[i]);
> +		return err;
> +	}
> +
> +	if (head >= vrh->vring.num) {
> +		vringh_bad("Guest says index %u > %u is available",
> +			   head, vrh->vring.num);
> +		return -EINVAL;
> +	}
> +	return head;
> +}
> +
> +/* Copy some bytes to/from the iovec.  Returns num copied. */
> +static inline ssize_t vringh_iov_xfer(struct vringh_iov *iov,
> +				      void *ptr, size_t len,
> +				      int (*xfer)(void __user *addr, void *ptr,
> +						  size_t len))
> +{
> +	int err, done = 0;
> +
> +	while (len && iov->i < iov->max) {
> +		size_t partlen;
> +
> +		partlen = min(iov->iov[iov->i].iov_len, len);
> +		err = xfer(iov->iov[iov->i].iov_base, ptr, partlen);
> +		if (err)
> +			return err;
> +		done += partlen;
> +		iov->iov[iov->i].iov_base += partlen;
> +		iov->iov[iov->i].iov_len -= partlen;
> +
> +		if (iov->iov[iov->i].iov_len == 0)
> +			iov->i++;
> +	}
> +	return done;
> +}
> +
> +static inline bool check_range(u64 addr, u32 len,
> +			       struct vringh_range *range,
> +			       bool (*getrange)(u64, struct vringh_range *))
> +{
> +	if (addr < range->start || addr > range->end_incl) {
> +		if (!getrange(addr, range))
> +			goto bad;
> +	}
> +	BUG_ON(addr < range->start || addr > range->end_incl);
> +
> +	/* To end of memory? */
> +	if (unlikely(addr + len == 0)) {
> +		if (range->end_incl == -1ULL)
> +			return true;
> +		goto bad;
> +	}
> +
> +	/* Otherwise, don't wrap. */
> +	if (unlikely(addr + len < addr))
> +		goto bad;
> +	if (unlikely(addr + len > range->end_incl))
> +		goto bad;
> +	return true;
> +
> +bad:
> +	vringh_bad("Malformed descriptor address %u@0x%llx", len, addr);
> +	return false;
> +}
> +
> +/* No reason for this code to be inline. */
> +static int move_to_indirect(int *up_next, u16 *i, void *addr,
> +			    const struct vring_desc *desc,
> +			    struct vring_desc **descs, int *desc_max)
> +{
> +	/* Indirect tables can't have indirect. */
> +	if (*up_next != -1) {
> +		vringh_bad("Multilevel indirect %u->%u", *up_next, *i);
> +		return -EINVAL;
> +	}
> +
> +	if (unlikely(desc->len % sizeof(struct vring_desc))) {
> +		vringh_bad("Strange indirect len %u", desc->len);
> +		return -EINVAL;
> +	}
> +
> +	/* We will check this when we follow it! */
> +	if (desc->flags & VRING_DESC_F_NEXT)
> +		*up_next = desc->next;
> +	else
> +		*up_next = -2;
> +	*descs = addr;
> +	*desc_max = desc->len / sizeof(struct vring_desc);
> +
> +	/* Now, start at the first indirect. */
> +	*i = 0;
> +	return 0;
> +}
> +
> +static int resize_iovec(struct vringh_iov *iov, gfp_t gfp)
> +{
> +	struct iovec *new;
> +	unsigned int new_num = iov->max * 2;

We must limit this I think, this is coming
from userspace. How about UIO_MAXIOV?

> +
> +	if (new_num < 8)
> +		new_num = 8;
> +
> +	if (iov->allocated)
> +		new = krealloc(iov->iov, new_num * sizeof(struct iovec), gfp);
> +	else {
> +		new = kmalloc(new_num * sizeof(struct iovec), gfp);
> +		if (new) {
> +			memcpy(new, iov->iov, iov->i * sizeof(struct iovec));
> +			iov->allocated = true;
> +		}
> +	}
> +	if (!new)
> +		return -ENOMEM;
> +	iov->iov = new;
> +	iov->max = new_num;
> +	return 0;
> +}
> +
> +static u16 __cold return_from_indirect(const struct vringh *vrh, int *up_next,
> +				       struct vring_desc **descs, int *desc_max)

Not sure it should be cold like that - virtio net uses indirect on data
path.

> +{
> +	u16 i = *up_next;
> +
> +	*up_next = -1;
> +	*descs = vrh->vring.desc;
> +	*desc_max = vrh->vring.num;
> +	return i;
> +}
> +
> +static inline int
> +__vringh_iov(struct vringh *vrh, u16 i,
> +	     struct vringh_iov *riov,
> +	     struct vringh_iov *wiov,
> +	     bool (*getrange)(u64 addr, struct vringh_range *r),
> +	     gfp_t gfp,
> +	     int (*getdesc)(struct vring_desc *dst, const struct vring_desc *s))
> +{
> +	int err, count = 0, up_next, desc_max;
> +	struct vring_desc desc, *descs;
> +	struct vringh_range range = { -1ULL, 0 };
> +
> +	/* We start traversing vring's descriptor table. */
> +	descs = vrh->vring.desc;
> +	desc_max = vrh->vring.num;
> +	up_next = -1;
> +
> +	riov->i = wiov->i = 0;
> +	for (;;) {
> +		void *addr;
> +		struct vringh_iov *iov;
> +
> +		err = getdesc(&desc, &descs[i]);
> +		if (unlikely(err))
> +			goto fail;
> +
> +		/* Make sure it's OK, and get offset. */
> +		if (!check_range(desc.addr, desc.len, &range, getrange)) {
> +			err = -EINVAL;
> +			goto fail;
> +		}

Hmm this looks like it will translate and
validate immediate descriptors same way as indirect ones.
vhost-net has different translation for regular descriptors
and indirect ones, both for speed and to allow ring aliasing,
so it has to know which is which.



> +		addr = (void *)(long)desc.addr + range.offset;

I really dislike raw pointers that we must never dereference.
Since we are forcing everything to __user anyway, why don't we
tag all addresses as __user? The kernel users of this API
can cast that away, this will keep the casts to minimum.

Failing that, we can add our own class
# define __virtio         __attribute__((noderef, address_space(2)))


> +
> +		if (unlikely(desc.flags & VRING_DESC_F_INDIRECT)) {
> +			err = move_to_indirect(&up_next, &i, addr, &desc,
> +					       &descs, &desc_max);
> +			if (err)
> +				goto fail;
> +			continue;
> +		}
> +
> +		if (desc.flags & VRING_DESC_F_WRITE)
> +			iov = wiov;
> +		else {
> +			iov = riov;
> +			if (unlikely(wiov->i)) {
> +				vringh_bad("Readable desc %p after writable",
> +					   &descs[i]);
> +				err = -EINVAL;
> +				goto fail;
> +			}
> +		}
> +
> +		if (unlikely(iov->i == iov->max)) {
> +			err = resize_iovec(iov, gfp);
> +			if (err)
> +				goto fail;
> +		}
> +
> +		iov->iov[iov->i].iov_base = (__force __user void *)addr;
> +		iov->iov[iov->i].iov_len = desc.len;
> +		iov->i++;


This looks like it won't do the right thing if desc.len spans multiple
ranges. I don't know if this happens in practice but this is something
vhost supports ATM.

> +
> +		if (++count == vrh->vring.num) {
> +			vringh_bad("Descriptor loop in %p", descs);
> +			err = -ELOOP;
> +			goto fail;
> +		}
> +
> +		if (desc.flags & VRING_DESC_F_NEXT) {
> +			i = desc.next;
> +		} else {
> +			/* Just in case we need to finish traversing above. */
> +			if (unlikely(up_next > 0))
> +				i = return_from_indirect(vrh, &up_next,
> +							 &descs, &desc_max);
> +			else
> +				break;
> +		}
> +
> +		if (i >= desc_max) {
> +			vringh_bad("Chained index %u > %u", i, desc_max);
> +			err = -EINVAL;
> +			goto fail;
> +		}
> +	}
> +
> +	/* Reset for fresh iteration. */
> +	riov->i = wiov->i = 0;
> +	return 0;
> +
> +fail:
> +	if (riov->allocated)
> +		kfree(riov->iov);
> +	if (wiov->allocated)
> +		kfree(wiov->iov);
> +	return err;
> +}
> +
> +static inline int __vringh_complete(struct vringh *vrh, u16 idx, u32 len,
> +				    int (*getu16)(u16 *val, const u16 *p),
> +				    int (*putu16)(u16 *p, u16 val),
> +				    int (*putused)(struct vring_used_elem *dst,
> +						   const struct vring_used_elem
> +						   *s),
> +				    bool *notify)
> +{
> +	struct vring_used_elem used;
> +	struct vring_used *used_ring;
> +	int err;
> +	u16 used_idx, old, used_event;
> +
> +	used.id = idx;
> +	used.len = len;
> +
> +	err = getu16(&used_idx, &vring_used_event(&vrh->vring));
> +	if (err) {
> +		vringh_bad("Failed to access used event %p",
> +			   &vring_used_event(&vrh->vring));
> +		return err;
> +	}
> +
> +	used_ring = vrh->vring.used;
> +
> +	err = putused(&used_ring->ring[used_idx % vrh->vring.num], &used);
> +	if (err) {
> +		vringh_bad("Failed to write used entry %u at %p",
> +			   used_idx % vrh->vring.num,
> +			   &used_ring->ring[used_idx % vrh->vring.num]);
> +		return err;
> +	}
> +
> +	/* Make sure buffer is written before we update index. */
> +	smp_wmb();
> +
> +	old = vrh->last_used_idx;
> +	vrh->last_used_idx++;
> +
> +	err = putu16(&vrh->vring.used->idx, vrh->last_used_idx);
> +	if (err) {
> +		vringh_bad("Failed to update used index at %p",
> +			   &vrh->vring.used->idx);
> +		return err;
> +	}
> +
> +	/* If we already know we need to notify, skip re-checking */
> +	if (*notify)
> +		return 0;
> +
> +	/* Flush out used index update. This is paired with the
> +	 * barrier that the Guest executes when enabling
> +	 * interrupts. */
> +	smp_mb();
> +
> +	/* Old-style, without event indices. */
> +	if (!vrh->event_indices) {
> +		u16 flags;
> +		err = getu16(&flags, &vrh->vring.avail->flags);
> +		if (err) {
> +			vringh_bad("Failed to get flags at %p",
> +				   &vrh->vring.avail->flags);
> +			return err;
> +		}
> +		if (!(flags & VRING_AVAIL_F_NO_INTERRUPT))
> +			*notify = true;
> +		return 0;
> +	}
> +
> +	/* Modern: we know where other side is up to. */
> +	err = getu16(&used_event, &vring_used_event(&vrh->vring));
> +	if (err) {
> +		vringh_bad("Failed to get used event idx at %p",
> +			   &vring_used_event(&vrh->vring));
> +		return err;
> +	}
> +	if (vring_need_event(used_event, vrh->last_used_idx, old))
> +		*notify = true;
> +	return 0;
> +}
> +
> +static inline bool __vringh_notify_enable(struct vringh *vrh,
> +					  int (*getu16)(u16 *val, const u16 *p),
> +					  int (*putu16)(u16 *p, u16 val))
> +{
> +	u16 avail;
> +
> +	/* Already enabled? */
> +	if (vrh->listening)
> +		return false;
> +
> +	vrh->listening = true;
> +
> +	if (!vrh->event_indices) {
> +		/* Old-school; update flags. */
> +		if (putu16(&vrh->vring.used->flags, 0) != 0) {
> +			vringh_bad("Clearing used flags %p",
> +				   &vrh->vring.used->flags);
> +			return false;
> +		}
> +	} else {
> +		if (putu16(&vring_avail_event(&vrh->vring),
> +			   vrh->last_avail_idx) != 0) {
> +			vringh_bad("Updating avail event index %p",
> +				   &vring_avail_event(&vrh->vring));
> +			return false;
> +		}
> +	}
> +
> +	/* They could have slipped one in as we were doing that: make
> +	 * sure it's written, then check again. */
> +	smp_mb();
> +
> +	if (getu16(&avail, &vrh->vring.avail->idx) != 0) {
> +		vringh_bad("Failed to check avail idx at %p",
> +			   &vrh->vring.avail->idx);
> +		return false;
> +	}
> +
> +	/* This is so unlikely, we just leave notifications enabled. */
> +	return avail != vrh->last_avail_idx;
> +}
> +
> +static inline void __vringh_notify_disable(struct vringh *vrh,
> +					   int (*putu16)(u16 *p, u16 val))
> +{
> +	/* Already disabled? */
> +	if (!vrh->listening)
> +		return;
> +
> +	vrh->listening = false;
> +	if (!vrh->event_indices) {
> +		/* Old-school; update flags. */
> +		if (putu16(&vrh->vring.used->flags, VRING_USED_F_NO_NOTIFY)) {
> +			vringh_bad("Setting used flags %p",
> +				   &vrh->vring.used->flags);
> +		}
> +	}
> +}
> +
> +/* Userspace access helpers. */
> +static inline int getu16_user(u16 *val, const u16 *p)
> +{
> +	return get_user(*val, (__force u16 __user *)p);
> +}
> +
> +static inline int putu16_user(u16 *p, u16 val)
> +{
> +	return put_user(val, (__force u16 __user *)p);
> +}
> +
> +static inline int getdesc_user(struct vring_desc *dst,
> +			       const struct vring_desc *src)
> +{
> +	return copy_from_user(dst, (__force void *)src, sizeof(*dst)) == 0 ? 0 :
> +		-EFAULT;
> +}
> +
> +static inline int putused_user(struct vring_used_elem *dst,
> +			       const struct vring_used_elem *s)
> +{
> +	return copy_to_user((__force void __user *)dst, s, sizeof(*dst)) == 0
> +		? 0 : -EFAULT;
> +}
> +
> +static inline int xfer_from_user(void *src, void *dst, size_t len)
> +{
> +	return copy_from_user(dst, (__force void *)src, len) == 0 ? 0 :
> +		-EFAULT;
> +}
> +
> +static inline int xfer_to_user(void *dst, void *src, size_t len)
> +{
> +	return copy_to_user((__force void *)dst, src, len) == 0 ? 0 :
> +		-EFAULT;
> +}
> +
> +/**
> + * vringh_init_user - initialize a vringh for a userspace vring.
> + * @vrh: the vringh to initialize.
> + * @features: the feature bits for this ring.
> + * @num: the number of elements.
> + * @desc: the userpace descriptor pointer.
> + * @avail: the userpace avail pointer.
> + * @used: the userpace used pointer.
> + *
> + * Returns an error if num is invalid: you should check pointers
> + * yourself!
> + */
> +int vringh_init_user(struct vringh *vrh, u32 features,
> +		     unsigned int num,
> +		     struct vring_desc __user *desc,
> +		     struct vring_avail __user *avail,
> +		     struct vring_used __user *used)
> +{
> +	/* Sane power of 2 please! */
> +	if (!num || num > 0xffff || (num & (num - 1))) {
> +		vringh_bad("Bad ring size %zu", num);
> +		return -EINVAL;
> +	}
> +
> +	vrh->event_indices = (features & VIRTIO_RING_F_EVENT_IDX);
> +	vrh->listening = false;
> +	vrh->last_avail_idx = 0;
> +	vrh->last_used_idx = 0;
> +	vrh->vring.num = num;
> +	vrh->vring.desc = (__force struct vring_desc *)desc;
> +	vrh->vring.avail = (__force struct vring_avail *)avail;
> +	vrh->vring.used = (__force struct vring_used *)used;
> +	return 0;
> +}
> +
> +/**
> + * vringh_getdesc_user - get next available descriptor from userspace ring.
> + * @vrh: the userspace vring.
> + * @riov: where to put the readable descriptors.
> + * @wiov: where to put the writable descriptors.
> + * @getrange: function to call to check ranges.
> + * @head: head index we received, for passing to vringh_complete_user().
> + * @gfp: flags for allocating larger riov/wiov.
> + *
> + * Returns 0 if there was no descriptor, 1 if there was, or -errno.
> + *
> + * If it returns 1, riov->allocated and wiov->allocated indicate if you
> + * have to kfree riov->iov and wiov->iov respectively.
> + */
> +int vringh_getdesc_user(struct vringh *vrh,
> +			struct vringh_iov *riov,
> +			struct vringh_iov *wiov,
> +			bool (*getrange)(u64 addr, struct vringh_range *r),
> +			u16 *head,
> +			gfp_t gfp)
> +{
> +	int err;
> +
> +	err = __vringh_get_head(vrh, getu16_user, &vrh->last_avail_idx);
> +	if (err < 0)
> +		return err;
> +
> +	/* Empty... */
> +	if (err == vrh->vring.num)
> +		return 0;
> +
> +	*head = err;
> +	err = __vringh_iov(vrh, *head, riov, wiov, getrange, gfp, getdesc_user);
> +	if (err)
> +		return err;
> +
> +	return 1;
> +}
> +
> +/**
> + * vringh_iov_pull_user - copy bytes from vring_iov.
> + * @riov: the riov as passed to vringh_getdesc_user() (updated as we consume)
> + * @dst: the place to copy.
> + * @len: the maximum length to copy.
> + *
> + * Returns the bytes copied <= len or a negative errno.
> + */
> +ssize_t vringh_iov_pull_user(struct vringh_iov *riov, void *dst, size_t len)
> +{
> +	return vringh_iov_xfer(riov, dst, len, xfer_from_user);
> +}
> +
> +/**
> + * vringh_iov_push_user - copy bytes into vring_iov.
> + * @wiov: the wiov as passed to vringh_getdesc_user() (updated as we consume)
> + * @dst: the place to copy.
> + * @len: the maximum length to copy.
> + *
> + * Returns the bytes copied <= len or a negative errno.
> + */
> +ssize_t vringh_iov_push_user(struct vringh_iov *wiov,
> +			     const void *src, size_t len)
> +{
> +	return vringh_iov_xfer(wiov, (void *)src, len, xfer_to_user);
> +}
> +
> +/**
> + * vringh_abandon_user - we've decided not to handle the descriptor(s).
> + * @vrh: the vring.
> + * @num: the number of descriptors to put back (ie. num
> + *	 vringh_get_user() to undo).
> + *
> + * The next vringh_get_user() will return the old descriptor(s) again.
> + */
> +void vringh_abandon_user(struct vringh *vrh, unsigned int num)
> +{
> +	/* We only update vring_avail_event(vr) when we want to be notified,
> +	 * so we haven't changed that yet. */
> +	vrh->last_avail_idx -= num;
> +}
> +
> +/**
> + * vringh_complete_user - we've finished with descriptor, publish it.
> + * @vrh: the vring.
> + * @head: the head as filled in by vringh_getdesc_user.
> + * @len: the length of data we have written.
> + * @notify: set if we should notify the other side, otherwise left alone.
> + */
> +int vringh_complete_user(struct vringh *vrh, u16 head, u32 len,
> +			 bool *notify)
> +{
> +	return __vringh_complete(vrh, head, len,
> +				 getu16_user, putu16_user, putused_user,
> +				 notify);
> +}
> +
> +/**
> + * vringh_notify_enable_user - we want to know if something changes.
> + * @vrh: the vring.
> + *
> + * This always enables notifications, but returns true if there are
> + * now more buffers available in the vring.
> + */
> +bool vringh_notify_enable_user(struct vringh *vrh)
> +{
> +	return __vringh_notify_enable(vrh, getu16_user, putu16_user);
> +}
> +
> +/**
> + * vringh_notify_disable_user - don't tell us if something changes.
> + * @vrh: the vring.
> + *
> + * This is our normal running state: we disable and then only enable when
> + * we're going to sleep.
> + */
> +void vringh_notify_disable_user(struct vringh *vrh)
> +{
> +	__vringh_notify_disable(vrh, putu16_user);
> +}
> diff --git a/include/linux/virtio_host.h b/include/linux/virtio_host.h
> new file mode 100644
> index 0000000..07bb4f6
> --- /dev/null
> +++ b/include/linux/virtio_host.h
> @@ -0,0 +1,88 @@
> +/*
> + * Linux host-side vring helpers; for when the kernel needs to access
> + * someone else's vring.
> + *
> + * Copyright IBM Corporation, 2013.
> + * Parts taken from drivers/vhost/vhost.c Copyright 2009 Red Hat, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
> + *
> + * Written by: Rusty Russell <rusty@rustcorp.com.au>
> + */
> +#ifndef _LINUX_VIRTIO_HOST_H
> +#define _LINUX_VIRTIO_HOST_H
> +#include <uapi/linux/virtio_ring.h>
> +#include <uapi/linux/uio.h>
> +
> +/* virtio_ring with information needed for host access. */
> +struct vringh {
> +	/* Guest publishes used event idx (note: we always do). */
> +	bool event_indices;
> +
> +	/* Have we told the other end we want to be notified? */
> +	bool listening;
> +
> +	/* Last available index we saw (ie. where we're up to). */
> +	u16 last_avail_idx;
> +
> +	/* Last index we used. */
> +	u16 last_used_idx;
> +
> +	/* The vring (note: it may contain user pointers!) */
> +	struct vring vring;
> +};
> +
> +/* The memory the vring can access, and what offset to apply. */
> +struct vringh_range {
> +	u64 start, end_incl;
> +	u64 offset;
> +};
> +
> +/* All the information about an iovec. */
> +struct vringh_iov {
> +	struct iovec *iov;
> +	unsigned i, max;
> +	bool allocated;

MAybe set iov = NULL when not allocated?

> +};
> +
> +/* Helpers for userspace vrings. */
> +int vringh_init_user(struct vringh *vrh, u32 features,
> +		     unsigned int num,
> +		     struct vring_desc __user *desc,
> +		     struct vring_avail __user *avail,
> +		     struct vring_used __user *used);
> +
> +/* Convert a descriptor into iovecs. */
> +int vringh_getdesc_user(struct vringh *vrh,
> +			struct vringh_iov *riov,
> +			struct vringh_iov *wiov,
> +			bool (*getrange)(u64 addr, struct vringh_range *r),
> +			u16 *head,
> +			gfp_t gfp);
> +
> +/* Copy bytes from readable vsg, consuming it (and incrementing wiov->i). */
> +ssize_t vringh_iov_pull_user(struct vringh_iov *riov, void *dst, size_t len);
> +
> +/* Copy bytes into writable vsg, consuming it (and incrementing wiov->i). */
> +ssize_t vringh_iov_push_user(struct vringh_iov *wiov,
> +			     const void *src, size_t len);
> +
> +/* Mark a descriptor as used.  Sets notify if you should fire eventfd. */
> +int vringh_complete_user(struct vringh *vrh, u16 head, u32 len,
> +			 bool *notify);
> +
> +/* Pretend we've never seen descriptor (for easy error handling). */
> +void vringh_abandon_user(struct vringh *vrh, unsigned int num);
> +#endif /* _LINUX_VIRTIO_HOST_H */

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
  2013-01-14 17:39                         ` Michael S. Tsirkin
@ 2013-01-16  3:13                           ` Rusty Russell
  2013-01-16  8:16                             ` Michael S. Tsirkin
  0 siblings, 1 reply; 23+ messages in thread
From: Rusty Russell @ 2013-01-16  3:13 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Sjur Brændeland, Linus Walleij, virtualization, LKML,
	Sjur Brændeland, Ohad Ben-Cohen

"Michael S. Tsirkin" <mst@redhat.com> writes:
>> +static int resize_iovec(struct vringh_iov *iov, gfp_t gfp)
>> +{
>> +	struct iovec *new;
>> +	unsigned int new_num = iov->max * 2;
>
> We must limit this I think, this is coming
> from userspace. How about UIO_MAXIOV?

We limit it to the ring size already; UIO_MAXIOV is a weird choice here.

>> +static u16 __cold return_from_indirect(const struct vringh *vrh, int *up_next,
>> +				       struct vring_desc **descs, int *desc_max)
>
> Not sure it should be cold like that - virtio net uses indirect on data
> path.

This is only when we have a chained, indirect descriptor (ie. we have to
go back up to the next entry in the main descriptor table).  That's
allowed in the spec, but noone does it.
>> +		/* Make sure it's OK, and get offset. */
>> +		if (!check_range(desc.addr, desc.len, &range, getrange)) {
>> +			err = -EINVAL;
>> +			goto fail;
>> +		}
>
> Hmm this looks like it will translate and
> validate immediate descriptors same way as indirect ones.
> vhost-net has different translation for regular descriptors
> and indirect ones, both for speed and to allow ring aliasing,
> so it has to know which is which.

I see translate_desc() in both cases, what's different?

>> +		addr = (void *)(long)desc.addr + range.offset;
>
> I really dislike raw pointers that we must never dereference.
> Since we are forcing everything to __user anyway, why don't we
> tag all addresses as __user? The kernel users of this API
> can cast that away, this will keep the casts to minimum.
>
> Failing that, we can add our own class
> # define __virtio         __attribute__((noderef, address_space(2)))

In this case, perhaps we should leave addr as a u64?

>> +		iov->iov[iov->i].iov_base = (__force __user void *)addr;
>> +		iov->iov[iov->i].iov_len = desc.len;
>> +		iov->i++;
>
>
> This looks like it won't do the right thing if desc.len spans multiple
> ranges. I don't know if this happens in practice but this is something
> vhost supports ATM.

Well, kind of.  I assumed that the bool (*getrange)(u64, struct
vringh_range *)) callback would meld any adjacent ranges if it needs to.

>> +/* All the information about an iovec. */
>> +struct vringh_iov {
>> +	struct iovec *iov;
>> +	unsigned i, max;
>> +	bool allocated;
>
> MAybe set iov = NULL when not allocated?

The idea was that iov points to the initial (on-stack?) iov, for the
fast path.

I'm writing a more complete test at the moment, then I will look at how
this fits with vhost.c as it stands...

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
  2013-01-16  3:13                           ` Rusty Russell
@ 2013-01-16  8:16                             ` Michael S. Tsirkin
  2013-01-17  2:10                               ` Rusty Russell
  0 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2013-01-16  8:16 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Sjur Brændeland, Linus Walleij, virtualization, LKML,
	Sjur Brændeland, Ohad Ben-Cohen

On Wed, Jan 16, 2013 at 01:43:32PM +1030, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> >> +static int resize_iovec(struct vringh_iov *iov, gfp_t gfp)
> >> +{
> >> +	struct iovec *new;
> >> +	unsigned int new_num = iov->max * 2;
> >
> > We must limit this I think, this is coming
> > from userspace. How about UIO_MAXIOV?
> 
> We limit it to the ring size already;

1. do we limit it in case there's a loop in the descriptor ring?
2. do we limit it in case there are indirect descriptors?
I guess I missed where we do this could you point this out to me?

> UIO_MAXIOV is a weird choice here.

It's kind of forced by the need to pass the iov on to the linux kernel,
so we know that any guest using more is broken on existing hypervisors.

Ring size is somewhat arbitrary too, isn't it?  A huge ring where we
post lots of short descriptors (e.g. RX buffers) seems like a valid thing to do.

> >> +static u16 __cold return_from_indirect(const struct vringh *vrh, int *up_next,
> >> +				       struct vring_desc **descs, int *desc_max)
> >
> > Not sure it should be cold like that - virtio net uses indirect on data
> > path.
> 
> This is only when we have a chained, indirect descriptor (ie. we have to
> go back up to the next entry in the main descriptor table).  That's
> allowed in the spec, but noone does it.
> >> +		/* Make sure it's OK, and get offset. */
> >> +		if (!check_range(desc.addr, desc.len, &range, getrange)) {
> >> +			err = -EINVAL;
> >> +			goto fail;
> >> +		}
> >
> > Hmm this looks like it will translate and
> > validate immediate descriptors same way as indirect ones.
> > vhost-net has different translation for regular descriptors
> > and indirect ones, both for speed and to allow ring aliasing,
> > so it has to know which is which.
> 
> I see translate_desc() in both cases, what's different?
> >> +		addr = (void *)(long)desc.addr + range.offset;
> >
> > I really dislike raw pointers that we must never dereference.
> > Since we are forcing everything to __user anyway, why don't we
> > tag all addresses as __user? The kernel users of this API
> > can cast that away, this will keep the casts to minimum.
> >
> > Failing that, we can add our own class
> > # define __virtio         __attribute__((noderef, address_space(2)))
> 
> In this case, perhaps we should leave addr as a u64?

Point being? All users will cast to a pointer.
It seems at first passing in raw pointers is cleaner,
but it turns out in the API we are passing iovs around,
and they are __user anyway.
So using raw pointers here does not buy us anything,
so let's use __user and gain extra static checks at no cost.


> >> +		iov->iov[iov->i].iov_base = (__force __user void *)addr;
> >> +		iov->iov[iov->i].iov_len = desc.len;
> >> +		iov->i++;
> >
> >
> > This looks like it won't do the right thing if desc.len spans multiple
> > ranges. I don't know if this happens in practice but this is something
> > vhost supports ATM.
> 
> Well, kind of.  I assumed that the bool (*getrange)(u64, struct
> vringh_range *)) callback would meld any adjacent ranges if it needs to.

Confused. If addresses 0 to 0x1000 map to virtual addresses 0 to 0x1000
and 0x1000 to 0x2000 map to virtual addresses 0x2000 to 0x3000, then
a single descriptor covering 0 to 0x2000 in guest needs two
iov entries. What can getrange do about it?


> >> +/* All the information about an iovec. */
> >> +struct vringh_iov {
> >> +	struct iovec *iov;
> >> +	unsigned i, max;
> >> +	bool allocated;
> >
> > MAybe set iov = NULL when not allocated?
> 
> The idea was that iov points to the initial (on-stack?) iov, for the
> fast path.
> 
> I'm writing a more complete test at the moment, then I will look at how
> this fits with vhost.c as it stands...
> 
> Cheers,
> Rusty.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
  2013-01-16  8:16                             ` Michael S. Tsirkin
@ 2013-01-17  2:10                               ` Rusty Russell
  2013-01-17  9:58                                 ` Michael S. Tsirkin
  2013-01-17 10:35                                 ` Rusty Russell
  0 siblings, 2 replies; 23+ messages in thread
From: Rusty Russell @ 2013-01-17  2:10 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Sjur Brændeland, Linus Walleij, virtualization, LKML,
	Sjur Brændeland, Ohad Ben-Cohen

"Michael S. Tsirkin" <mst@redhat.com> writes:
> On Wed, Jan 16, 2013 at 01:43:32PM +1030, Rusty Russell wrote:
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> >> +static int resize_iovec(struct vringh_iov *iov, gfp_t gfp)
>> >> +{
>> >> +	struct iovec *new;
>> >> +	unsigned int new_num = iov->max * 2;
>> >
>> > We must limit this I think, this is coming
>> > from userspace. How about UIO_MAXIOV?
>> 
>> We limit it to the ring size already;
>
> 1. do we limit it in case there's a loop in the descriptor ring?

Yes, we catch loops as per normal (simple counter):

		if (count++ == vrh->vring.num) {
			vringh_bad("Descriptor loop in %p", descs);
			err = -ELOOP;
			goto fail;
		}

> 2. do we limit it in case there are indirect descriptors?
> I guess I missed where we do this could you point this out to me?

Well, the total is limited above, indirect descriptors or no (since we
handle them inline).  Because each indirect descriptor must contain one
descriptor (we always grab descriptor 0), the loop must terminate.

>> UIO_MAXIOV is a weird choice here.
>
> It's kind of forced by the need to pass the iov on to the linux kernel,
> so we know that any guest using more is broken on existing hypervisors.
>
> Ring size is somewhat arbitrary too, isn't it?  A huge ring where we
> post lots of short descriptors (e.g. RX buffers) seems like a valid thing to do.

Sure, but the ring size is a documented limit (even if indirect
descriptors are used).  I hadn't realized we have an
implementation-specific limit of 1024 descriptors: I shall add this.
While noone reasonable will exceed that, we should document it somewhere
in the spec.

>> > I really dislike raw pointers that we must never dereference.
>> > Since we are forcing everything to __user anyway, why don't we
>> > tag all addresses as __user? The kernel users of this API
>> > can cast that away, this will keep the casts to minimum.
>> >
>> > Failing that, we can add our own class
>> > # define __virtio         __attribute__((noderef, address_space(2)))
>> 
>> In this case, perhaps we should leave addr as a u64?
>
> Point being? All users will cast to a pointer.
> It seems at first passing in raw pointers is cleaner,
> but it turns out in the API we are passing iovs around,
> and they are __user anyway.
> So using raw pointers here does not buy us anything,
> so let's use __user and gain extra static checks at no cost.

I resist sprinkling __user everywhere because it's *not* always user
addresses, and it's deeply misleading to anyone reading it.  I'd rather
have it in one place with a big comment.

I can try using a union of kvec and iovec, since they are the same
layout in practice AFAICT.

>> >> +		iov->iov[iov->i].iov_base = (__force __user void *)addr;
>> >> +		iov->iov[iov->i].iov_len = desc.len;
>> >> +		iov->i++;
>> >
>> >
>> > This looks like it won't do the right thing if desc.len spans multiple
>> > ranges. I don't know if this happens in practice but this is something
>> > vhost supports ATM.
>> 
>> Well, kind of.  I assumed that the bool (*getrange)(u64, struct
>> vringh_range *)) callback would meld any adjacent ranges if it needs to.
>
> Confused. If addresses 0 to 0x1000 map to virtual addresses 0 to 0x1000
> and 0x1000 to 0x2000 map to virtual addresses 0x2000 to 0x3000, then
> a single descriptor covering 0 to 0x2000 in guest needs two
> iov entries. What can getrange do about it?

getrange doesn't map virtual to physical, it maps virtual to user.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
  2013-01-17  2:10                               ` Rusty Russell
@ 2013-01-17  9:58                                 ` Michael S. Tsirkin
  2013-01-21 11:55                                   ` Rusty Russell
  2013-01-17 10:35                                 ` Rusty Russell
  1 sibling, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2013-01-17  9:58 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Sjur Brændeland, Linus Walleij, virtualization, LKML,
	Sjur Brændeland, Ohad Ben-Cohen

On Thu, Jan 17, 2013 at 12:40:29PM +1030, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> > On Wed, Jan 16, 2013 at 01:43:32PM +1030, Rusty Russell wrote:
> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
> >> >> +static int resize_iovec(struct vringh_iov *iov, gfp_t gfp)
> >> >> +{
> >> >> +	struct iovec *new;
> >> >> +	unsigned int new_num = iov->max * 2;
> >> >
> >> > We must limit this I think, this is coming
> >> > from userspace. How about UIO_MAXIOV?
> >> 
> >> We limit it to the ring size already;
> >
> > 1. do we limit it in case there's a loop in the descriptor ring?
> 
> Yes, we catch loops as per normal (simple counter):
> 
> 		if (count++ == vrh->vring.num) {
> 			vringh_bad("Descriptor loop in %p", descs);
> 			err = -ELOOP;
> 			goto fail;
> 		}
> 
> > 2. do we limit it in case there are indirect descriptors?
> > I guess I missed where we do this could you point this out to me?
> 
> Well, the total is limited above, indirect descriptors or no (since we
> handle them inline).  Because each indirect descriptor must contain one
> descriptor (we always grab descriptor 0), the loop must terminate.
> 
> >> UIO_MAXIOV is a weird choice here.
> >
> > It's kind of forced by the need to pass the iov on to the linux kernel,
> > so we know that any guest using more is broken on existing hypervisors.
> >
> > Ring size is somewhat arbitrary too, isn't it?  A huge ring where we
> > post lots of short descriptors (e.g. RX buffers) seems like a valid thing to do.
> 
> Sure, but the ring size is a documented limit (even if indirect
> descriptors are used).  I hadn't realized we have an
> implementation-specific limit of 1024 descriptors: I shall add this.
> While noone reasonable will exceed that, we should document it somewhere
> in the spec.
> 
> >> > I really dislike raw pointers that we must never dereference.
> >> > Since we are forcing everything to __user anyway, why don't we
> >> > tag all addresses as __user? The kernel users of this API
> >> > can cast that away, this will keep the casts to minimum.
> >> >
> >> > Failing that, we can add our own class
> >> > # define __virtio         __attribute__((noderef, address_space(2)))
> >> 
> >> In this case, perhaps we should leave addr as a u64?
> >
> > Point being? All users will cast to a pointer.
> > It seems at first passing in raw pointers is cleaner,
> > but it turns out in the API we are passing iovs around,
> > and they are __user anyway.
> > So using raw pointers here does not buy us anything,
> > so let's use __user and gain extra static checks at no cost.
> 
> I resist sprinkling __user everywhere because it's *not* always user
> addresses, and it's deeply misleading to anyone reading it.  I'd rather
> have it in one place with a big comment.
> I can try using a union of kvec and iovec, since they are the same
> layout in practice AFAICT.

I suggest the following easy fix: as you say, it's
in one place with a bug comment.

/* On the host side we often communicate to untrusted
 * entities over virtio, so set __user tag on addresses
 * we get helps make sure we don't directly dereference the addresses,
 * while making it possible to pass the addresses in iovec arrays
 * without casts.
 */
#define __virtio __user

/* A helper to discard __virtio tag - only call when
 * you are communicating to a trusted entity.
 */
static inline void *virtio_raw_addr(__virtio void *addr)
{
	return (__force void *)addr;
}

Hmm?

> 
> >> >> +		iov->iov[iov->i].iov_base = (__force __user void *)addr;
> >> >> +		iov->iov[iov->i].iov_len = desc.len;
> >> >> +		iov->i++;
> >> >
> >> >
> >> > This looks like it won't do the right thing if desc.len spans multiple
> >> > ranges. I don't know if this happens in practice but this is something
> >> > vhost supports ATM.
> >> 
> >> Well, kind of.  I assumed that the bool (*getrange)(u64, struct
> >> vringh_range *)) callback would meld any adjacent ranges if it needs to.
> >
> > Confused. If addresses 0 to 0x1000 map to virtual addresses 0 to 0x1000
> > and 0x1000 to 0x2000 map to virtual addresses 0x2000 to 0x3000, then
> > a single descriptor covering 0 to 0x2000 in guest needs two
> > iov entries. What can getrange do about it?
> 
> getrange doesn't map virtual to physical, it maps virtual to user.
> 
> Cheers,
> Rusty.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
  2013-01-17  2:10                               ` Rusty Russell
  2013-01-17  9:58                                 ` Michael S. Tsirkin
@ 2013-01-17 10:35                                 ` Rusty Russell
  1 sibling, 0 replies; 23+ messages in thread
From: Rusty Russell @ 2013-01-17 10:35 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Sjur Brændeland, Linus Walleij, virtualization, LKML,
	Sjur Brændeland, Ohad Ben-Cohen

Rusty Russell <rusty@rustcorp.com.au> writes:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> On Wed, Jan 16, 2013 at 01:43:32PM +1030, Rusty Russell wrote:
>>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>>> >> +static int resize_iovec(struct vringh_iov *iov, gfp_t gfp)
>>> >> +{
>>> >> +	struct iovec *new;
>>> >> +	unsigned int new_num = iov->max * 2;
>>> >
>>> > We must limit this I think, this is coming
>>> > from userspace. How about UIO_MAXIOV?
>>> 
>>> We limit it to the ring size already;
>>
>> 1. do we limit it in case there's a loop in the descriptor ring?

I didn't get a chance to do these revisions, as I spent today debugging
the test framework.  I won't get any more work on it until next week, so
I've posted a rough series anyway for feedback (can also be found
in my pending-rebases branch on kernel.org).

Thanks!
Rusty.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
  2013-01-17  9:58                                 ` Michael S. Tsirkin
@ 2013-01-21 11:55                                   ` Rusty Russell
  0 siblings, 0 replies; 23+ messages in thread
From: Rusty Russell @ 2013-01-21 11:55 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Sjur Brændeland, Linus Walleij, virtualization, LKML,
	Sjur Brændeland, Ohad Ben-Cohen

"Michael S. Tsirkin" <mst@redhat.com> writes:
> On Thu, Jan 17, 2013 at 12:40:29PM +1030, Rusty Russell wrote:
>> I resist sprinkling __user everywhere because it's *not* always user
>> addresses, and it's deeply misleading to anyone reading it.  I'd rather
>> have it in one place with a big comment.
>> I can try using a union of kvec and iovec, since they are the same
>> layout in practice AFAICT.
>
> I suggest the following easy fix: as you say, it's
> in one place with a bug comment.
>
> /* On the host side we often communicate to untrusted
>  * entities over virtio, so set __user tag on addresses
>  * we get helps make sure we don't directly dereference the addresses,
>  * while making it possible to pass the addresses in iovec arrays
>  * without casts.
>  */
> #define __virtio __user
>
> /* A helper to discard __virtio tag - only call when
>  * you are communicating to a trusted entity.
>  */
> static inline void *virtio_raw_addr(__virtio void *addr)
> {
> 	return (__force void *)addr;
> }
>
> Hmm?

The two problems are iovec, which contains a __user address, and that
gets exposed via the API, and vring, which *doesn't* use __user
addresses.

This is ugly, but works:

vringh: use vringh_kiov for _kern functions, and internally.

This makes user of vringh perfectly __user-clean, and removes an
internal cast.

This only works because of -fno-strict-aliasing!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index ab10da8..2ba087d 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -65,9 +65,9 @@ static inline int __vringh_get_head(const struct vringh *vrh,
 }
 
 /* Copy some bytes to/from the iovec.  Returns num copied. */
-static inline ssize_t vringh_iov_xfer(struct vringh_iov *iov,
+static inline ssize_t vringh_iov_xfer(struct vringh_kiov *iov,
 				      void *ptr, size_t len,
-				      int (*xfer)(void __user *addr, void *ptr,
+				      int (*xfer)(void *addr, void *ptr,
 						  size_t len))
 {
 	int err, done = 0;
@@ -149,9 +149,9 @@ static int move_to_indirect(int *up_next, u16 *i, void *addr,
 	return 0;
 }
 
-static int resize_iovec(struct vringh_iov *iov, gfp_t gfp)
+static int resize_iovec(struct vringh_kiov *iov, gfp_t gfp)
 {
-	struct iovec *new;
+	struct kvec *new;
 	unsigned int new_num = iov->max * 2;
 
 	if (new_num < 8)
@@ -186,8 +186,8 @@ static u16 __cold return_from_indirect(const struct vringh *vrh, int *up_next,
 
 static inline int
 __vringh_iov(struct vringh *vrh, u16 i,
-	     struct vringh_iov *riov,
-	     struct vringh_iov *wiov,
+	     struct vringh_kiov *riov,
+	     struct vringh_kiov *wiov,
 	     bool (*getrange)(u64 addr, struct vringh_range *r),
 	     gfp_t gfp,
 	     int (*getdesc)(struct vring_desc *dst, const struct vring_desc *s))
@@ -204,7 +204,7 @@ __vringh_iov(struct vringh *vrh, u16 i,
 	riov->i = wiov->i = 0;
 	for (;;) {
 		void *addr;
-		struct vringh_iov *iov;
+		struct vringh_kiov *iov;
 
 		err = getdesc(&desc, &descs[i]);
 		if (unlikely(err))
@@ -249,7 +249,7 @@ __vringh_iov(struct vringh *vrh, u16 i,
 				goto fail;
 		}
 
-		iov->iov[iov->i].iov_base = (__force __user void *)addr;
+		iov->iov[iov->i].iov_base = addr;
 		iov->iov[iov->i].iov_len = desc.len;
 		iov->i++;
 
@@ -438,7 +438,7 @@ static inline void __vringh_notify_disable(struct vringh *vrh,
 	}
 }
 
-/* Userspace access helpers. */
+/* Userspace access helpers: in this case, addresses are really userspace. */
 static inline int getu16_user(u16 *val, const u16 *p)
 {
 	return get_user(*val, (__force u16 __user *)p);
@@ -452,27 +452,27 @@ static inline int putu16_user(u16 *p, u16 val)
 static inline int getdesc_user(struct vring_desc *dst,
 			       const struct vring_desc *src)
 {
-	return copy_from_user(dst, (__force void *)src, sizeof(*dst)) == 0 ? 0 :
-		-EFAULT;
+	return copy_from_user(dst, (__force void __user *)src, sizeof(*dst)) ?
+		-EFAULT : 0;
 }
 
 static inline int putused_user(struct vring_used_elem *dst,
 			       const struct vring_used_elem *s)
 {
-	return copy_to_user((__force void __user *)dst, s, sizeof(*dst)) == 0
-		? 0 : -EFAULT;
+	return copy_to_user((__force void __user *)dst, s, sizeof(*dst)) ?
+		-EFAULT : 0;
 }
 
 static inline int xfer_from_user(void *src, void *dst, size_t len)
 {
-	return copy_from_user(dst, (__force void *)src, len) == 0 ? 0 :
-		-EFAULT;
+	return copy_from_user(dst, (__force void __user *)src, len) ?
+		-EFAULT : 0;
 }
 
 static inline int xfer_to_user(void *dst, void *src, size_t len)
 {
-	return copy_to_user((__force void *)dst, src, len) == 0 ? 0 :
-		-EFAULT;
+	return copy_to_user((__force void __user *)dst, src, len) ?
+		-EFAULT : 0;
 }
 
 /**
@@ -506,6 +506,7 @@ int vringh_init_user(struct vringh *vrh, u32 features,
 	vrh->last_avail_idx = 0;
 	vrh->last_used_idx = 0;
 	vrh->vring.num = num;
+	/* vring expects kernel addresses, but only used via accessors. */
 	vrh->vring.desc = (__force struct vring_desc *)desc;
 	vrh->vring.avail = (__force struct vring_avail *)avail;
 	vrh->vring.used = (__force struct vring_used *)used;
@@ -543,8 +544,30 @@ int vringh_getdesc_user(struct vringh *vrh,
 	if (err == vrh->vring.num)
 		return 0;
 
+	/* We need the layouts to be the indentical for this to work */
+	BUILD_BUG_ON(sizeof(struct vringh_kiov) != sizeof(struct vringh_iov));
+	BUILD_BUG_ON(offsetof(struct vringh_kiov, iov) !=
+		     offsetof(struct vringh_iov, iov));
+	BUILD_BUG_ON(offsetof(struct vringh_kiov, i) !=
+		     offsetof(struct vringh_iov, i));
+	BUILD_BUG_ON(offsetof(struct vringh_kiov, max) !=
+		     offsetof(struct vringh_iov, max));
+	BUILD_BUG_ON(offsetof(struct vringh_kiov, allocated) !=
+		     offsetof(struct vringh_iov, allocated));
+	BUILD_BUG_ON(sizeof(struct iovec) != sizeof(struct kvec));
+	BUILD_BUG_ON(offsetof(struct iovec, iov_base) !=
+		     offsetof(struct kvec, iov_base));
+	BUILD_BUG_ON(offsetof(struct iovec, iov_len) !=
+		     offsetof(struct kvec, iov_len));
+	BUILD_BUG_ON(sizeof(((struct iovec *)NULL)->iov_base)
+		     != sizeof(((struct kvec *)NULL)->iov_base));
+	BUILD_BUG_ON(sizeof(((struct iovec *)NULL)->iov_len)
+		     != sizeof(((struct kvec *)NULL)->iov_len));
+
 	*head = err;
-	err = __vringh_iov(vrh, *head, riov, wiov, getrange, gfp, getdesc_user);
+	err = __vringh_iov(vrh, *head, (struct vringh_kiov *)riov,
+			   (struct vringh_kiov *)wiov,
+			   getrange, gfp, getdesc_user);
 	if (err)
 		return err;
 
@@ -561,7 +584,8 @@ int vringh_getdesc_user(struct vringh *vrh,
  */
 ssize_t vringh_iov_pull_user(struct vringh_iov *riov, void *dst, size_t len)
 {
-	return vringh_iov_xfer(riov, dst, len, xfer_from_user);
+	return vringh_iov_xfer((struct vringh_kiov *)riov,
+			       dst, len, xfer_from_user);
 }
 
 /**
@@ -575,7 +599,8 @@ ssize_t vringh_iov_pull_user(struct vringh_iov *riov, void *dst, size_t len)
 ssize_t vringh_iov_push_user(struct vringh_iov *wiov,
 			     const void *src, size_t len)
 {
-	return vringh_iov_xfer(wiov, (void *)src, len, xfer_to_user);
+	return vringh_iov_xfer((struct vringh_kiov *)wiov,
+			       (void *)src, len, xfer_to_user);
 }
 
 /**
@@ -734,8 +759,8 @@ int vringh_init_kern(struct vringh *vrh, u32 features,
  * have to kfree riov->iov and wiov->iov respectively.
  */
 int vringh_getdesc_kern(struct vringh *vrh,
-			struct vringh_iov *riov,
-			struct vringh_iov *wiov,
+			struct vringh_kiov *riov,
+			struct vringh_kiov *wiov,
 			u16 *head,
 			gfp_t gfp)
 {
@@ -766,7 +791,7 @@ int vringh_getdesc_kern(struct vringh *vrh,
  *
  * Returns the bytes copied <= len or a negative errno.
  */
-ssize_t vringh_iov_pull_kern(struct vringh_iov *riov, void *dst, size_t len)
+ssize_t vringh_iov_pull_kern(struct vringh_kiov *riov, void *dst, size_t len)
 {
 	return vringh_iov_xfer(riov, dst, len, xfer_kern);
 }
@@ -779,7 +804,7 @@ ssize_t vringh_iov_pull_kern(struct vringh_iov *riov, void *dst, size_t len)
  *
  * Returns the bytes copied <= len or a negative errno.
  */
-ssize_t vringh_iov_push_kern(struct vringh_iov *wiov,
+ssize_t vringh_iov_push_kern(struct vringh_kiov *wiov,
 			     const void *src, size_t len)
 {
 	return vringh_iov_xfer(wiov, (void *)src, len, xfer_kern);
diff --git a/include/linux/vringh.h b/include/linux/vringh.h
index 9df86e9..b3345e9 100644
--- a/include/linux/vringh.h
+++ b/include/linux/vringh.h
@@ -24,7 +24,7 @@
 #ifndef _LINUX_VRINGH_H
 #define _LINUX_VRINGH_H
 #include <uapi/linux/virtio_ring.h>
-#include <uapi/linux/uio.h>
+#include <linux/uio.h>
 #include <asm/barrier.h>
 
 /* virtio_ring with information needed for host access. */
@@ -64,6 +64,13 @@ struct vringh_iov {
 	bool allocated;
 };
 
+/* All the information about a kvec. */
+struct vringh_kiov {
+	struct kvec *iov;
+	unsigned i, max;
+	bool allocated;
+};
+
 /* Helpers for userspace vrings. */
 int vringh_init_user(struct vringh *vrh, u32 features,
 		     unsigned int num, bool weak_barriers,
@@ -103,13 +110,13 @@ int vringh_init_kern(struct vringh *vrh, u32 features,
 		     struct vring_used *used);
 
 int vringh_getdesc_kern(struct vringh *vrh,
-			struct vringh_iov *riov,
-			struct vringh_iov *wiov,
+			struct vringh_kiov *riov,
+			struct vringh_kiov *wiov,
 			u16 *head,
 			gfp_t gfp);
 
-ssize_t vringh_iov_pull_kern(struct vringh_iov *riov, void *dst, size_t len);
-ssize_t vringh_iov_push_user(struct vringh_iov *wiov,
+ssize_t vringh_iov_pull_kern(struct vringh_kiov *riov, void *dst, size_t len);
+ssize_t vringh_iov_push_kern(struct vringh_kiov *wiov,
 			     const void *src, size_t len);
 void vringh_abandon_kern(struct vringh *vrh, unsigned int num);
 int vringh_complete_kern(struct vringh *vrh, u16 head, u32 len);

^ permalink raw reply related	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2013-01-21 12:01 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-10-31 22:46 [RFC virtio-next 0/4] Introduce CAIF Virtio and reversed Vrings Sjur Brændeland
2012-10-31 22:46 ` [RFC virtio-next 1/4] virtio: Move definitions to header file vring.h Sjur Brændeland
2012-10-31 22:46 ` [RFC virtio-next 2/4] include/vring.h: Add support for reversed vritio rings Sjur Brændeland
2012-10-31 22:46 ` [RFC virtio-next 3/4] virtio_ring: Call callback function even when used ring is empty Sjur Brændeland
2012-10-31 22:46 ` [RFC virtio-next 4/4] caif_virtio: Add CAIF over virtio Sjur Brændeland
2012-11-01  7:41 ` [RFC virtio-next 0/4] Introduce CAIF Virtio and reversed Vrings Rusty Russell
2012-11-05 12:12   ` Sjur Brændeland
2012-11-06  2:09     ` Rusty Russell
     [not found]       ` <1354718230-4486-1-git-send-email-sjur@brendeland.net>
     [not found]         ` <20121206102750.GF10837@redhat.com>
     [not found]           ` <877goc0wac.fsf@rustcorp.com.au>
     [not found]             ` <CAJK669bP41oBhJ=MB64NS21Ag7XO5WswuTiVKCFTb96nvmyBiw@mail.gmail.com>
     [not found]               ` <87pq1f2rj0.fsf@rustcorp.com.au>
2013-01-10 10:30                 ` [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio Rusty Russell
2013-01-10 11:11                   ` Michael S. Tsirkin
2013-01-10 18:39                   ` Sjur Brændeland
2013-01-10 23:35                     ` Rusty Russell
2013-01-11  6:37                       ` Rusty Russell
2013-01-11 15:02                         ` Sjur Brændeland
2013-01-12  0:26                           ` Rusty Russell
2013-01-14 17:39                         ` Michael S. Tsirkin
2013-01-16  3:13                           ` Rusty Russell
2013-01-16  8:16                             ` Michael S. Tsirkin
2013-01-17  2:10                               ` Rusty Russell
2013-01-17  9:58                                 ` Michael S. Tsirkin
2013-01-21 11:55                                   ` Rusty Russell
2013-01-17 10:35                                 ` Rusty Russell
2013-01-11 14:52                       ` Sjur Brændeland

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).