* [RFC virtio-next 0/4] Introduce CAIF Virtio and reversed Vrings
@ 2012-10-31 22:46 Sjur Brændeland
2012-10-31 22:46 ` [RFC virtio-next 1/4] virtio: Move definitions to header file vring.h Sjur Brændeland
` (4 more replies)
0 siblings, 5 replies; 51+ messages in thread
From: Sjur Brændeland @ 2012-10-31 22:46 UTC (permalink / raw)
To: Rusty Russell
Cc: Michael S. Tsirkin, netdev, Linus Walleij, dmitry.tarnyagin,
linux-kernel, virtualization, sjur
This patch-set introduces the CAIF Virtio Link layer. The purpose is to
communicate with a remote processor (a modem) over shared memory. Virtio
is used as the transport mechanism, and the Remoteproc framework provides
configuration and management of the Virtio rings and devices. The modem
and Linux host may be on the same SoC, or connected over a shared memory
interface such as LLI.
Zero-Copy data transport on the modem is the primary goal for CAIF Virtio.
In order to achieve Zero-Copy, the direction of the Virtio rings is
flipped in the RX direction. We have therefore implemented Virtio
access-functions similar to those found in vhost.c.
The connected LTE-modem is a multi-processor system with an advanced
memory allocator on board. In order to provide zero-copy from the modem
to the connected Linux host, the direction of the Virtio rings is
reversed. This allows the modem to allocate data buffers in the RX
direction and pass them to the Linux host, and recycled buffers to be
sent back to the modem.
The option of providing pre-allocated buffers in the RX direction has been
considered, but rejected. The allocation of data buffers happens deep
down in the network signaling stack on the LTE modem, before it is known
what type of data is received. It may be data that is handled within the
modem and never sent to the Linux host, or IP traffic going to the host.
Pre-allocated Virtio buffers do not fit this usage. Another issue
is that the network traffic pattern may vary, resulting in variation in
the number and size of buffers allocated from the memory allocator. Dynamic
allocation is needed in order to utilize memory properly. Because of this,
we decided we had to implement "reversed" vrings. Reversed vrings allow
us to minimize the impact on the current memory allocator and buffer
handling on the modem.
In order to implement reversed rings we have added functions for reading
descriptors from the available ring and adding descriptors to the used ring.
The internal data structures in virtio_ring.c are moved into a new header
file so that they can be accessed by caif_virtio.
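For illustration, a minimal sketch of the resulting host-side RX flow,
using the helper names introduced in patch 4/4 (error handling is
omitted, and consume_frame() is a hypothetical placeholder):

	/* Sketch: consuming a reversed RX ring on the host side. */
	static void rx_poll_sketch(struct virtqueue *vq_rx)
	{
		struct vring_desc *desc;
		int head = -1;

		/* Iterate over buffers the modem put on the available ring. */
		for (desc = virtqueue_next_avail_desc(vq_rx, &head);
		     desc != NULL;
		     desc = virtqueue_next_desc(vq_rx, desc, &head))
			consume_frame(phys_to_virt(desc->addr), desc->len);

		/* virtqueue_next_desc() recycles each completed chain by
		 * adding it to the used ring and kicking the modem. */
	}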
The data buffers in the TX direction are allocated using dma_alloc_coherent().
This allows memory to be allocated from the memory region shared between
the host and the modem.
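As a minimal sketch of this allocation path (dev, buf_len and headroom
are placeholders here; the real code in patch 4/4 derives them from the
virtio device and its config space):

	dma_addr_t dma_handle;
	void *vaddr;

	/* Allocate the TX buffer in the memory region shared with the modem. */
	vaddr = dma_alloc_coherent(dev, buf_len, &dma_handle, GFP_ATOMIC);
	if (vaddr) {
		/* Linearize the skb into the shared buffer... */
		skb_copy_bits(skb, 0, vaddr + headroom, skb->len);
		/* ...then add it to the TX vring; the buffer is released
		 * with dma_free_coherent() when it comes back on the
		 * used ring. */
	}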
In the TX direction single linearized buffers are added to the vring. In
the RX direction linearized frames are also used, but multiple descriptors
may be linked. This is done to allow maximum efficiency for the LTE modem.
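For reference, a two-descriptor RX chain would be laid out roughly like
this (illustrative values only; the flag and field names come from the
standard vring layout):

	/* Illustrative only: a linked RX chain of two write descriptors. */
	desc[0].addr  = buf0_phys;
	desc[0].len   = 2048;
	desc[0].flags = VRING_DESC_F_WRITE | VRING_DESC_F_NEXT;
	desc[0].next  = 1;

	desc[1].addr  = buf1_phys;
	desc[1].len   = 2048;
	desc[1].flags = VRING_DESC_F_WRITE;	/* end of chain */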
This patch set is not yet fully tested and does not handle all negative
scenarios correctly, so at this stage we are primarily looking for review
comments on the structure of the Virtio code. There are several
options for how to structure this, and feedback is welcome.
Thanks,
Sjur
Sjur Brændeland (4):
virtio: Move definitions to header file vring.h
include/vring.h: Add support for reversed virtio rings.
virtio_ring: Call callback function even when used ring is empty
caif_virtio: Add CAIF over virtio
drivers/net/caif/Kconfig | 9 +
drivers/net/caif/Makefile | 3 +
drivers/net/caif/caif_virtio.c | 627 ++++++++++++++++++++++++++++++++
drivers/remoteproc/remoteproc_virtio.c | 2 +-
drivers/virtio/virtio_ring.c | 102 +-----
drivers/virtio/vring.h | 124 +++++++
include/linux/virtio_ring.h | 8 +-
include/uapi/linux/virtio_ids.h | 1 +
8 files changed, 776 insertions(+), 100 deletions(-)
create mode 100644 drivers/net/caif/caif_virtio.c
create mode 100644 drivers/virtio/vring.h
--
1.7.9.5
* [RFC virtio-next 1/4] virtio: Move definitions to header file vring.h
2012-10-31 22:46 [RFC virtio-next 0/4] Introduce CAIF Virtio and reversed Vrings Sjur Brændeland
@ 2012-10-31 22:46 ` Sjur Brændeland
2012-10-31 22:46 ` [RFC virtio-next 2/4] include/vring.h: Add support for reversed virtio rings Sjur Brændeland
` (3 subsequent siblings)
4 siblings, 0 replies; 51+ messages in thread
From: Sjur Brændeland @ 2012-10-31 22:46 UTC (permalink / raw)
To: Rusty Russell
Cc: Michael S. Tsirkin, netdev, Linus Walleij, dmitry.tarnyagin,
linux-kernel, virtualization, sjur, Sjur Brændeland
From: Sjur Brændeland <sjur.brandeland@stericsson.com>
Move the vring_virtqueue structure, memory barrier and debug
macros out from virtio_ring.c to the new header file vring.h.
This is done in order to allow other kernel modules to access the
virtio internal data-structures.
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
---
This patch triggers a couple of checkpatch warnings, but I've
chosen to do a clean copy and not make any corrections.
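After this change another kernel module can reach the virtio internals
with a plain include, as caif_virtio does in patch 4/4:

	#include "../drivers/virtio/vring.h" /* struct vring_virtqueue, to_vvq() */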
drivers/virtio/virtio_ring.c | 96 +--------------------------------
drivers/virtio/vring.h | 121 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 122 insertions(+), 95 deletions(-)
create mode 100644 drivers/virtio/vring.h
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index ffd7e7d..9027af6 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -23,101 +23,7 @@
#include <linux/slab.h>
#include <linux/module.h>
#include <linux/hrtimer.h>
-
-/* virtio guest is communicating with a virtual "device" that actually runs on
- * a host processor. Memory barriers are used to control SMP effects. */
-#ifdef CONFIG_SMP
-/* Where possible, use SMP barriers which are more lightweight than mandatory
- * barriers, because mandatory barriers control MMIO effects on accesses
- * through relaxed memory I/O windows (which virtio-pci does not use). */
-#define virtio_mb(vq) \
- do { if ((vq)->weak_barriers) smp_mb(); else mb(); } while(0)
-#define virtio_rmb(vq) \
- do { if ((vq)->weak_barriers) smp_rmb(); else rmb(); } while(0)
-#define virtio_wmb(vq) \
- do { if ((vq)->weak_barriers) smp_wmb(); else wmb(); } while(0)
-#else
-/* We must force memory ordering even if guest is UP since host could be
- * running on another CPU, but SMP barriers are defined to barrier() in that
- * configuration. So fall back to mandatory barriers instead. */
-#define virtio_mb(vq) mb()
-#define virtio_rmb(vq) rmb()
-#define virtio_wmb(vq) wmb()
-#endif
-
-#ifdef DEBUG
-/* For development, we want to crash whenever the ring is screwed. */
-#define BAD_RING(_vq, fmt, args...) \
- do { \
- dev_err(&(_vq)->vq.vdev->dev, \
- "%s:"fmt, (_vq)->vq.name, ##args); \
- BUG(); \
- } while (0)
-/* Caller is supposed to guarantee no reentry. */
-#define START_USE(_vq) \
- do { \
- if ((_vq)->in_use) \
- panic("%s:in_use = %i\n", \
- (_vq)->vq.name, (_vq)->in_use); \
- (_vq)->in_use = __LINE__; \
- } while (0)
-#define END_USE(_vq) \
- do { BUG_ON(!(_vq)->in_use); (_vq)->in_use = 0; } while(0)
-#else
-#define BAD_RING(_vq, fmt, args...) \
- do { \
- dev_err(&_vq->vq.vdev->dev, \
- "%s:"fmt, (_vq)->vq.name, ##args); \
- (_vq)->broken = true; \
- } while (0)
-#define START_USE(vq)
-#define END_USE(vq)
-#endif
-
-struct vring_virtqueue
-{
- struct virtqueue vq;
-
- /* Actual memory layout for this queue */
- struct vring vring;
-
- /* Can we use weak barriers? */
- bool weak_barriers;
-
- /* Other side has made a mess, don't try any more. */
- bool broken;
-
- /* Host supports indirect buffers */
- bool indirect;
-
- /* Host publishes avail event idx */
- bool event;
-
- /* Head of free buffer list. */
- unsigned int free_head;
- /* Number we've added since last sync. */
- unsigned int num_added;
-
- /* Last used index we've seen. */
- u16 last_used_idx;
-
- /* How to notify other side. FIXME: commonalize hcalls! */
- void (*notify)(struct virtqueue *vq);
-
-#ifdef DEBUG
- /* They're supposed to lock for us. */
- unsigned int in_use;
-
- /* Figure out if their kicks are too delayed. */
- bool last_add_time_valid;
- ktime_t last_add_time;
-#endif
-
- /* Tokens for callbacks. */
- void *data[];
-};
-
-#define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
+#include "vring.h"
/* Set up an indirect table of descriptors and add it to the queue. */
static int vring_add_indirect(struct vring_virtqueue *vq,
diff --git a/drivers/virtio/vring.h b/drivers/virtio/vring.h
new file mode 100644
index 0000000..b997fc3
--- /dev/null
+++ b/drivers/virtio/vring.h
@@ -0,0 +1,121 @@
+/* Virtio ring implementation.
+ *
+ * Copyright 2007 Rusty Russell IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#ifndef _LINUX_VIRTIO_RING_H_
+#define _LINUX_VIRTIO_RING_H_
+
+#include <linux/virtio_ring.h>
+#include <linux/virtio.h>
+
+struct vring_virtqueue
+{
+ struct virtqueue vq;
+
+ /* Actual memory layout for this queue */
+ struct vring vring;
+
+ /* Can we use weak barriers? */
+ bool weak_barriers;
+
+ /* Other side has made a mess, don't try any more. */
+ bool broken;
+
+ /* Host supports indirect buffers */
+ bool indirect;
+
+ /* Host publishes avail event idx */
+ bool event;
+
+ /* Number of free buffers */
+ unsigned int num_free;
+ /* Head of free buffer list. */
+ unsigned int free_head;
+ /* Number we've added since last sync. */
+ unsigned int num_added;
+
+ /* Last used index we've seen. */
+ u16 last_used_idx;
+
+ /* How to notify other side. FIXME: commonalize hcalls! */
+ void (*notify)(struct virtqueue *vq);
+
+#ifdef DEBUG
+ /* They're supposed to lock for us. */
+ unsigned int in_use;
+
+ /* Figure out if their kicks are too delayed. */
+ bool last_add_time_valid;
+ ktime_t last_add_time;
+#endif
+
+ /* Tokens for callbacks. */
+ void *data[];
+};
+#define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
+
+/* virtio guest is communicating with a virtual "device" that actually runs on
+ * a host processor. Memory barriers are used to control SMP effects. */
+#ifdef CONFIG_SMP
+/* Where possible, use SMP barriers which are more lightweight than mandatory
+ * barriers, because mandatory barriers control MMIO effects on accesses
+ * through relaxed memory I/O windows (which virtio-pci does not use). */
+#define virtio_mb(vq) \
+ do { if ((vq)->weak_barriers) smp_mb(); else mb(); } while(0)
+#define virtio_rmb(vq) \
+ do { if ((vq)->weak_barriers) smp_rmb(); else rmb(); } while(0)
+#define virtio_wmb(vq) \
+ do { if ((vq)->weak_barriers) smp_wmb(); else wmb(); } while(0)
+#else
+/* We must force memory ordering even if guest is UP since host could be
+ * running on another CPU, but SMP barriers are defined to barrier() in that
+ * configuration. So fall back to mandatory barriers instead. */
+#define virtio_mb(vq) mb()
+#define virtio_rmb(vq) rmb()
+#define virtio_wmb(vq) wmb()
+#endif
+
+#ifdef DEBUG
+/* For development, we want to crash whenever the ring is screwed. */
+#define BAD_RING(_vq, fmt, args...) \
+ do { \
+ dev_err(&(_vq)->vq.vdev->dev, \
+ "%s:"fmt, (_vq)->vq.name, ##args); \
+ BUG(); \
+ } while (0)
+/* Caller is supposed to guarantee no reentry. */
+#define START_USE(_vq) \
+ do { \
+ if ((_vq)->in_use) \
+ panic("%s:in_use = %i\n", \
+ (_vq)->vq.name, (_vq)->in_use); \
+ (_vq)->in_use = __LINE__; \
+ } while (0)
+#define END_USE(_vq) \
+ do { BUG_ON(!(_vq)->in_use); (_vq)->in_use = 0; } while(0)
+#else
+#define BAD_RING(_vq, fmt, args...) \
+ do { \
+ dev_err(&_vq->vq.vdev->dev, \
+ "%s:"fmt, (_vq)->vq.name, ##args); \
+ (_vq)->broken = true; \
+ } while (0)
+#define START_USE(vq)
+#define END_USE(vq)
+#endif
+
+#endif
--
1.7.9.5
* [RFC virtio-next 2/4] include/vring.h: Add support for reversed virtio rings.
2012-10-31 22:46 [RFC virtio-next 0/4] Introduce CAIF Virtio and reversed Vrings Sjur Brændeland
2012-10-31 22:46 ` [RFC virtio-next 1/4] virtio: Move definitions to header file vring.h Sjur Brændeland
@ 2012-10-31 22:46 ` Sjur Brændeland
2012-10-31 22:46 ` [RFC virtio-next 3/4] virtio_ring: Call callback function even when used ring is empty Sjur Brændeland
` (2 subsequent siblings)
4 siblings, 0 replies; 51+ messages in thread
From: Sjur Brændeland @ 2012-10-31 22:46 UTC (permalink / raw)
To: Rusty Russell
Cc: Michael S. Tsirkin, netdev, Linus Walleij, dmitry.tarnyagin,
linux-kernel, virtualization, sjur, Sjur Brændeland
From: Sjur Brændeland <sjur.brandeland@stericsson.com>
Add the last available index to the vring_virtqueue structure.
This is done to prepare for the implementation of the reversed vring.
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
---
drivers/virtio/vring.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/virtio/vring.h b/drivers/virtio/vring.h
index b997fc3..3b53961 100644
--- a/drivers/virtio/vring.h
+++ b/drivers/virtio/vring.h
@@ -51,6 +51,9 @@ struct vring_virtqueue
/* Last used index we've seen. */
u16 last_used_idx;
+ /* Last avail index seen. NOTE: Only used for reversed rings. */
+ u16 last_avail_idx;
+
/* How to notify other side. FIXME: commonalize hcalls! */
void (*notify)(struct virtqueue *vq);
--
1.7.9.5
* [RFC virtio-next 3/4] virtio_ring: Call callback function even when used ring is empty
2012-10-31 22:46 [RFC virtio-next 0/4] Introduce CAIF Virtio and reversed Vrings Sjur Brændeland
2012-10-31 22:46 ` [RFC virtio-next 1/4] virtio: Move definitions to header file vring.h Sjur Brændeland
2012-10-31 22:46 ` [RFC virtio-next 2/4] include/vring.h: Add support for reversed virtio rings Sjur Brændeland
@ 2012-10-31 22:46 ` Sjur Brændeland
2012-10-31 22:46 ` [RFC virtio-next 4/4] caif_virtio: Add CAIF over virtio Sjur Brændeland
2012-11-01 7:41 ` [RFC virtio-next 0/4] Introduce CAIF Virtio and reversed Vrings Rusty Russell
4 siblings, 0 replies; 51+ messages in thread
From: Sjur Brændeland @ 2012-10-31 22:46 UTC (permalink / raw)
To: Rusty Russell
Cc: Michael S. Tsirkin, netdev, Linus Walleij, dmitry.tarnyagin,
linux-kernel, virtualization, sjur, Sjur Brændeland
From: Sjur Brændeland <sjur.brandeland@stericsson.com>
Enable the option to force a call of the callback function even if the
used ring is empty. This is needed for the reversed vring.
Add a helper function __vring_interrupt with an extra boolean
argument that forces the callback when the interrupt is called.
The original vring_interrupt semantics and signature are
preserved.
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
---
drivers/remoteproc/remoteproc_virtio.c | 2 +-
drivers/virtio/virtio_ring.c | 6 +++---
include/linux/virtio_ring.h | 8 +++++++-
3 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/drivers/remoteproc/remoteproc_virtio.c b/drivers/remoteproc/remoteproc_virtio.c
index e7a4780..ddde863 100644
--- a/drivers/remoteproc/remoteproc_virtio.c
+++ b/drivers/remoteproc/remoteproc_virtio.c
@@ -63,7 +63,7 @@ irqreturn_t rproc_vq_interrupt(struct rproc *rproc, int notifyid)
if (!rvring || !rvring->vq)
return IRQ_NONE;
- return vring_interrupt(0, rvring->vq);
+ return __vring_interrupt(0, rvring->vq, true);
}
EXPORT_SYMBOL(rproc_vq_interrupt);
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 9027af6..af85034 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -504,11 +504,11 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
}
EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf);
-irqreturn_t vring_interrupt(int irq, void *_vq)
+irqreturn_t __vring_interrupt(int irq, void *_vq, bool force)
{
struct vring_virtqueue *vq = to_vvq(_vq);
- if (!more_used(vq)) {
+ if (!force && !more_used(vq)) {
pr_debug("virtqueue interrupt with no work for %p\n", vq);
return IRQ_NONE;
}
@@ -522,7 +522,7 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
return IRQ_HANDLED;
}
-EXPORT_SYMBOL_GPL(vring_interrupt);
+EXPORT_SYMBOL_GPL(__vring_interrupt);
struct virtqueue *vring_new_virtqueue(unsigned int index,
unsigned int num,
diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
index 63c6ea1..ccb7915 100644
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -20,5 +20,11 @@ void vring_del_virtqueue(struct virtqueue *vq);
/* Filter out transport-specific feature bits. */
void vring_transport_features(struct virtio_device *vdev);
-irqreturn_t vring_interrupt(int irq, void *_vq);
+irqreturn_t __vring_interrupt(int irq, void *_vq, bool force);
+
+static inline irqreturn_t vring_interrupt(int irq, void *_vq)
+{
+ return __vring_interrupt(irq, _vq, false);
+}
+
#endif /* _LINUX_VIRTIO_RING_H */
--
1.7.9.5
* [RFC virtio-next 4/4] caif_virtio: Add CAIF over virtio
2012-10-31 22:46 [RFC virtio-next 0/4] Introduce CAIF Virtio and reversed Vrings Sjur Brændeland
` (2 preceding siblings ...)
2012-10-31 22:46 ` [RFC virtio-next 3/4] virtio_ring: Call callback function even when used ring is empty Sjur Brændeland
@ 2012-10-31 22:46 ` Sjur Brændeland
2012-11-01 7:41 ` [RFC virtio-next 0/4] Introduce CAIF Virtio and reversed Vrings Rusty Russell
4 siblings, 0 replies; 51+ messages in thread
From: Sjur Brændeland @ 2012-10-31 22:46 UTC (permalink / raw)
To: Rusty Russell
Cc: Vikram ARV, Michael S. Tsirkin, netdev, Linus Walleij,
dmitry.tarnyagin, linux-kernel, virtualization, sjur,
Sjur Brændeland
From: Sjur Brændeland <sjur.brandeland@stericsson.com>
Add the CAIF Virtio Link layer, used for communicating with a
modem over shared memory. Virtio is used as the transport mechanism.
In the TX direction the virtio rings are used in the normal fashion,
sending data in the available ring. But in the RX direction we have
flipped the direction of the virtio ring, and implemented virtio
access-functions similar to what is found in vhost.c.
CAIF also uses the virtio configuration space for getting
configuration parameters such as headroom, tailroom etc.
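For reference, a sketch of the config layout implied by the
GET_VIRTIO_CONFIG_OPS() reads in this patch (the struct name is taken
from the code below; the exact field order and types are assumptions):

	struct virtio_caif_transf_config {
		u16 headroom;
		u16 tailroom;
		u32 mtu;
		/* possibly more fields */
	};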
Signed-off-by: Vikram ARV <vikram.arv@stericsson.com>
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
---
drivers/net/caif/Kconfig | 9 +
drivers/net/caif/Makefile | 3 +
drivers/net/caif/caif_virtio.c | 627 +++++++++++++++++++++++++++++++++++++++
include/uapi/linux/virtio_ids.h | 1 +
4 files changed, 640 insertions(+)
create mode 100644 drivers/net/caif/caif_virtio.c
diff --git a/drivers/net/caif/Kconfig b/drivers/net/caif/Kconfig
index abf4d7a..a01f617 100644
--- a/drivers/net/caif/Kconfig
+++ b/drivers/net/caif/Kconfig
@@ -47,3 +47,12 @@ config CAIF_HSI
The caif low level driver for CAIF over HSI.
Be aware that if you enable this then you also need to
enable a low-level HSI driver.
+
+config CAIF_VIRTIO
+ tristate "CAIF virtio transport driver"
+ default n
+ depends on CAIF
+ depends on REMOTEPROC
+ select VIRTIO
+ ---help---
+ The caif driver for CAIF over Virtio.
diff --git a/drivers/net/caif/Makefile b/drivers/net/caif/Makefile
index 91dff86..d9ee26a 100644
--- a/drivers/net/caif/Makefile
+++ b/drivers/net/caif/Makefile
@@ -13,3 +13,6 @@ obj-$(CONFIG_CAIF_SHM) += caif_shm.o
# HSI interface
obj-$(CONFIG_CAIF_HSI) += caif_hsi.o
+
+# Virtio interface
+obj-$(CONFIG_CAIF_VIRTIO) += caif_virtio.o
diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c
new file mode 100644
index 0000000..e50940f
--- /dev/null
+++ b/drivers/net/caif/caif_virtio.c
@@ -0,0 +1,627 @@
+/*
+ * Copyright (C) ST-Ericsson AB 2012
+ * Contact: Sjur Brendeland / sjur.brandeland@stericsson.com
+ * Authors: Vicram Arv / vikram.arv@stericsson.com,
+ * Dmitry Tarnyagin / dmitry.tarnyagin@stericsson.com
+ * Sjur Brendeland / sjur.brandeland@stericsson.com
+ * License terms: GNU General Public License (GPL) version 2
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ":" fmt
+#include <linux/module.h>
+#include <linux/virtio.h>
+#include <linux/virtio_ids.h>
+#include <linux/virtio_config.h>
+#include <linux/dma-mapping.h>
+#include <linux/netdevice.h>
+#include <linux/if_arp.h>
+#include <linux/spinlock.h>
+#include <net/caif/caif_dev.h>
+#include <linux/virtio_caif.h>
+#include "../drivers/virtio/vring.h"
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Vicram Arv <vikram.arv@stericsson.com>");
+MODULE_DESCRIPTION("Virtio CAIF Driver");
+
+/*
+ * struct cfv_info - Caif Virtio control structure
+ * @cfdev: caif common header
+ * @vdev: Associated virtio device
+ * @vq_rx: rx/downlink virtqueue
+ * @vq_tx: tx/uplink virtqueue
+ * @ndev: associated netdevice
+ * @queued_tx: number of buffers queued in the tx virtqueue
+ * @watermark_tx: indicates the number of buffers the tx queue
+ * should shrink to before the datapath is unblocked
+ * @tx_lock: protects vq_tx to allow concurrent senders
+ * @tx_hr: transmit headroom
+ * @rx_hr: receive headroom
+ * @tx_tr: transmit tailroom
+ * @rx_tr: receive tailroom
+ * @mtu: transmit max size
+ * @mru: receive max size
+ */
+struct cfv_info {
+ struct caif_dev_common cfdev;
+ struct virtio_device *vdev;
+ struct virtqueue *vq_rx;
+ struct virtqueue *vq_tx;
+ struct net_device *ndev;
+ unsigned int queued_tx;
+ unsigned int watermark_tx;
+ /* Protect access to vq_tx */
+ spinlock_t tx_lock;
+ /* Copied from Virtio config space */
+ u16 tx_hr;
+ u16 rx_hr;
+ u16 tx_tr;
+ u16 rx_tr;
+ u32 mtu;
+ u32 mru;
+};
+
+/*
+ * struct token_info - maintains Transmit buffer data handle
+ * @size: size of transmit buffer
+ * @dma_handle: handle to allocated dma device memory area
+ * @vaddr: virtual address mapping to allocated memory area
+ */
+struct token_info {
+ size_t size;
+ u8 *vaddr;
+ dma_addr_t dma_handle;
+};
+
+/* Default if virtio config space is unavailable */
+#define CFV_DEF_MTU_SIZE 4096
+#define CFV_DEF_HEADROOM 16
+#define CFV_DEF_TAILROOM 16
+
+/* Require IP header to be 4-byte aligned. */
+#define IP_HDR_ALIGN 4
+
+/*
+ * virtqueue_next_avail_desc - get the next available descriptor
+ * @_vq: the struct virtqueue we're talking about
+ * @head: index of the descriptor in the ring
+ *
+ * Look for the next available descriptor in the available ring.
+ * Return NULL if there is nothing new in the available ring.
+ */
+static struct vring_desc *virtqueue_next_avail_desc(struct virtqueue *_vq,
+ int *head)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ u16 avail_idx, hd, last_avail_idx = vq->last_avail_idx;
+
+ START_USE(vq);
+
+ if (unlikely(vq->broken))
+ goto err;
+
+ /* The barrier ensures that avail->idx is observed after the interrupt */
+ virtio_rmb(vq);
+
+ if (vq->last_avail_idx == vq->vring.avail->idx)
+ goto err;
+
+ avail_idx = vq->vring.avail->idx;
+ if (unlikely((u16)(avail_idx - last_avail_idx) > vq->vring.num)) {
+ BAD_RING(vq, "Avail index moved from %u to %u",
+ last_avail_idx, avail_idx);
+ goto err;
+ }
+
+ /*
+ * The barrier ensures that the ring content is observed
+ * after the avail->idx update
+ */
+ virtio_rmb(vq);
+
+ hd = vq->vring.avail->ring[last_avail_idx & (vq->vring.num - 1)];
+ /* If their number is silly, that's an error. */
+ if (unlikely(hd >= vq->vring.num)) {
+ BAD_RING(vq, "Remote says index %d > %u is available",
+ hd, vq->vring.num);
+ goto err;
+ }
+
+ END_USE(vq);
+ *head = hd;
+ return &vq->vring.desc[hd];
+err:
+ END_USE(vq);
+ *head = -1;
+ return NULL;
+}
+
+/*
+ * virtqueue_next_linked_desc - get next linked descriptor from the ring
+ * @_vq: the struct virtqueue we're talking about
+ * @desc: "current" descriptor
+ *
+ * Each buffer in the virtqueues is a chain of descriptors. This
+ * function returns the next descriptor in the chain, or NULL if we're at
+ * the end.
+ *
+ * Side effect: the function increments vq->last_avail_idx when the
+ * descriptor passed as @desc is the last one in its chain.
+ */
+static struct vring_desc *virtqueue_next_linked_desc(struct virtqueue *_vq,
+ struct vring_desc *desc)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ unsigned int next;
+
+ START_USE(vq);
+
+ /* If this descriptor says it doesn't chain, we're done. */
+ if (!(desc->flags & VRING_DESC_F_NEXT))
+ goto no_next;
+
+ next = desc->next;
+ /* Make sure compiler knows to grab that: we don't want it changing! */
+ /* We will use the result as an index in an array, so most
+ * architectures only need a compiler barrier here.
+ */
+ read_barrier_depends();
+
+ if (unlikely(next >= vq->vring.num)) {
+ BAD_RING(vq, "Desc index is %u > %u\n", next, vq->vring.num);
+ goto err;
+ }
+
+ desc = &vq->vring.desc[next];
+
+ if (desc->flags & VRING_DESC_F_INDIRECT) {
+ pr_err("Indirect descriptor not supported\n");
+ goto err;
+ }
+
+ END_USE(vq);
+ return desc;
+no_next:
+ vq->last_avail_idx++;
+err:
+ END_USE(vq);
+ return NULL;
+}
+
+/*
+ * virtqueue_add_buf_to_used - release a used descriptor
+ * @_vq: the struct virtqueue we're talking about
+ * @head: index of the descriptor to be released
+ * @len: length value to store in the used ring entry (bytes written)
+ *
+ * The function releases a used descriptor in a reversed ring
+ */
+static int virtqueue_add_buf_to_used(struct virtqueue *_vq,
+ unsigned int head, int len)
+{
+ struct vring_virtqueue *vr_vq = to_vvq(_vq);
+ struct vring_used_elem *used;
+ int used_idx, err = -EINVAL;
+
+ START_USE(vr_vq);
+
+ if (unlikely(vr_vq->broken))
+ goto err;
+
+ if (unlikely(head >= vr_vq->vring.num)) {
+ BAD_RING(vr_vq, "Invalid head index (%u) > max desc idx (%u) ",
+ head, vr_vq->vring.num - 1);
+ goto err;
+ }
+
+ /*
+ * The virtqueue contains a ring of used buffers. Get a pointer to the
+ * next entry in that used ring.
+ */
+ used_idx = (vr_vq->vring.used->idx & (vr_vq->vring.num - 1));
+ used = &vr_vq->vring.used->ring[used_idx];
+ used->id = head;
+ used->len = len;
+
+ /* Make sure buffer is written before we update index. */
+ virtio_wmb(vr_vq);
+ ++vr_vq->vring.used->idx;
+ err = 0;
+err:
+ END_USE(vr_vq);
+ return err;
+
+}
+
+/*
+ * virtqueue_next_desc - get next available or linked descriptor
+ * @_vq: the struct virtqueue we're talking about
+ * @desc: "current" descriptor.
+ * @head: on return, filled with the descriptor index if an available
+ * descriptor was returned, or -1 if a linked descriptor was
+ * returned.
+ *
+ * The function is to be used as an iterator through received descriptors.
+ */
+static struct vring_desc *virtqueue_next_desc(struct virtqueue *_vq,
+ struct vring_desc *desc,
+ int *head)
+{
+ struct vring_desc *next = virtqueue_next_linked_desc(_vq, desc);
+
+ if (next == NULL) {
+ virtqueue_add_buf_to_used(_vq, *head, 0);
+ /* tell the remote processor to recycle buffer */
+ virtqueue_kick(_vq);
+ next = virtqueue_next_avail_desc(_vq, head);
+ }
+ return next;
+}
+
+/*
+ * This is invoked whenever the remote processor has completed processing
+ * a TX msg we sent it, and the buffer has been put back on the used ring.
+ */
+static void cfv_release_used_buf(struct virtqueue *vq_tx)
+{
+ struct cfv_info *cfv = vq_tx->vdev->priv;
+
+ BUG_ON(vq_tx != cfv->vq_tx);
+
+ for (;;) {
+ unsigned int len;
+ struct token_info *buf_info;
+
+ /* Get used buffer from used ring to recycle used descriptors */
+ spin_lock_bh(&cfv->tx_lock);
+ buf_info = virtqueue_get_buf(vq_tx, &len);
+
+ if (!buf_info)
+ goto out;
+
+ BUG_ON(!cfv->queued_tx);
+ if (--cfv->queued_tx < cfv->watermark_tx) {
+ cfv->watermark_tx = 0;
+ netif_tx_wake_all_queues(cfv->ndev);
+ }
+ spin_unlock_bh(&cfv->tx_lock);
+
+ dma_free_coherent(vq_tx->vdev->dev.parent->parent,
+ buf_info->size, buf_info->vaddr,
+ buf_info->dma_handle);
+ kfree(buf_info);
+ }
+ return;
+out:
+ spin_unlock_bh(&cfv->tx_lock);
+}
+
+static int cfv_read_desc(struct vring_desc *d,
+ void **buf, size_t *size)
+{
+ if (d->flags & VRING_DESC_F_INDIRECT) {
+ pr_warn("Indirect descriptor not supported by CAIF\n");
+ return -EINVAL;
+ }
+
+ if (!(d->flags & VRING_DESC_F_WRITE)) {
+ pr_warn("Write descriptor not supported by CAIF\n");
+ /* CAIF expects an input descriptor here */
+ return -EINVAL;
+ }
+ *buf = phys_to_virt(d->addr);
+ *size = d->len;
+ return 0;
+}
+
+static struct sk_buff *cfv_alloc_and_copy_skb(struct cfv_info *cfv,
+ u8 *frm, u32 frm_len)
+{
+ struct sk_buff *skb;
+ u32 cfpkt_len, pad_len;
+
+ /* Verify the frame size against the MRU and the down-link head/tailroom */
+ if (frm_len > cfv->mru || frm_len <= cfv->rx_hr + cfv->rx_tr) {
+ netdev_err(cfv->ndev,
+ "Invalid frmlen:%u mtu:%u hr:%d tr:%d\n",
+ frm_len, cfv->mru, cfv->rx_hr,
+ cfv->rx_tr);
+ return NULL;
+ }
+
+ cfpkt_len = frm_len - (cfv->rx_hr + cfv->rx_tr);
+
+ pad_len = (unsigned long)(frm + cfv->rx_hr) & (IP_HDR_ALIGN - 1);
+
+ skb = netdev_alloc_skb(cfv->ndev, frm_len + pad_len);
+ if (!skb)
+ return NULL;
+
+ /* Reserve space for headers. */
+ skb_reserve(skb, cfv->rx_hr + pad_len);
+
+ memcpy(skb_put(skb, cfpkt_len), frm + cfv->rx_hr, cfpkt_len);
+ return skb;
+}
+
+/*
+ * This is invoked whenever the remote processor has sent down-link data
+ * on the Rx VQ avail ring and it's time to digest a message.
+ *
+ * CAIF virtio passes a complete CAIF frame including head/tail room
+ * in each linked descriptor. So iterate over all available buffers
+ * in the available ring and the associated linked descriptors.
+ */
+static void cfv_recv(struct virtqueue *vq_rx)
+{
+ struct cfv_info *cfv = vq_rx->vdev->priv;
+ struct vring_desc *desc;
+ struct sk_buff *skb;
+ int head = -1;
+ void *buf;
+ size_t len;
+
+ for (desc = virtqueue_next_avail_desc(vq_rx, &head);
+ desc != NULL && !cfv_read_desc(desc, &buf, &len);
+ desc = virtqueue_next_desc(vq_rx, desc, &head)) {
+
+ skb = cfv_alloc_and_copy_skb(cfv, buf, len);
+
+ if (!skb)
+ goto err;
+
+ skb->protocol = htons(ETH_P_CAIF);
+ skb_reset_mac_header(skb);
+ skb->dev = cfv->ndev;
+
+ /* Push received packet up the stack. */
+ if (netif_receive_skb(skb))
+ goto err;
+
+ ++cfv->ndev->stats.rx_packets;
+ cfv->ndev->stats.rx_bytes += skb->len;
+ }
+ return;
+err:
+ ++cfv->ndev->stats.rx_dropped;
+ return;
+}
+
+static int cfv_netdev_open(struct net_device *netdev)
+{
+ netif_carrier_on(netdev);
+ return 0;
+}
+
+static int cfv_netdev_close(struct net_device *netdev)
+{
+ netif_carrier_off(netdev);
+ return 0;
+}
+
+static struct token_info *cfv_alloc_and_copy_to_dmabuf(struct cfv_info *cfv,
+ struct sk_buff *skb,
+ struct scatterlist *sg)
+{
+ struct caif_payload_info *info = (void *)&skb->cb;
+ struct token_info *buf_info = NULL;
+ u8 pad_len, hdr_ofs;
+
+ if (unlikely(cfv->tx_hr + skb->len + cfv->tx_tr > cfv->mtu)) {
+ netdev_warn(cfv->ndev, "Invalid packet len (%d > %d)\n",
+ cfv->tx_hr + skb->len + cfv->tx_tr, cfv->mtu);
+ goto err;
+ }
+
+ buf_info = kmalloc(sizeof(struct token_info), GFP_ATOMIC);
+ if (unlikely(!buf_info))
+ goto err;
+
+ /* Make the IP header aligned in the buffer */
+ hdr_ofs = cfv->tx_hr + info->hdr_len;
+ pad_len = hdr_ofs & (IP_HDR_ALIGN - 1);
+ buf_info->size = cfv->tx_hr + skb->len + cfv->tx_tr + pad_len;
+
+ if (WARN_ON_ONCE(!cfv->vdev->dev.parent))
+ goto err;
+
+ /* allocate coherent memory for the buffers */
+ buf_info->vaddr =
+ dma_alloc_coherent(cfv->vdev->dev.parent->parent,
+ buf_info->size, &buf_info->dma_handle,
+ GFP_ATOMIC);
+ if (unlikely(!buf_info->vaddr)) {
+ netdev_warn(cfv->ndev,
+ "Out of DMA memory (alloc %zu bytes)\n",
+ buf_info->size);
+ goto err;
+ }
+
+ /* copy skbuf contents to send buffer */
+ skb_copy_bits(skb, 0, buf_info->vaddr + cfv->tx_hr + pad_len, skb->len);
+ sg_init_one(sg, buf_info->vaddr + pad_len,
+ skb->len + cfv->tx_hr + cfv->tx_tr);
+ return buf_info;
+err:
+ kfree(buf_info);
+ return NULL;
+}
+
+/*
+ * This is invoked whenever the host processor application has sent up-link data.
+ * Send it in the TX VQ avail ring.
+ *
+ * CAIF Virtio does not use linked descriptors in the TX direction.
+ */
+static int cfv_netdev_tx(struct sk_buff *skb, struct net_device *netdev)
+{
+ struct cfv_info *cfv = netdev_priv(netdev);
+ struct token_info *buf_info;
+ struct scatterlist sg;
+ bool flow_off = false;
+
+ buf_info = cfv_alloc_and_copy_to_dmabuf(cfv, skb, &sg);
+ spin_lock_bh(&cfv->tx_lock);
+
+ /*
+ * Add buffer to avail ring.
+ * Note: in spite of the flow-off check, the virtqueue_add_buf()
+ * call might fail in case of concurrent access on smp
+ * systems.
+ */
+ if (WARN_ON(virtqueue_add_buf(cfv->vq_tx, &sg, 0, 1,
+ buf_info, GFP_ATOMIC) < 0)) {
+ /* It should not happen */
+ ++cfv->ndev->stats.tx_dropped;
+ flow_off = true;
+ } else {
+ /* update netdev statistics */
+ cfv->queued_tx++;
+ cfv->ndev->stats.tx_packets++;
+ cfv->ndev->stats.tx_bytes += skb->len;
+ }
+
+ /*
+ * Flow-off check takes into account number of cpus to make sure
+ * virtqueue will not be overfilled in any possible smp conditions.
+ */
+ flow_off = cfv->queued_tx + num_present_cpus() >=
+ virtqueue_get_vring_size(cfv->vq_tx);
+
+ /* tell the remote processor it has a pending message to read */
+ virtqueue_kick(cfv->vq_tx);
+
+ if (flow_off) {
+ cfv->watermark_tx = cfv->queued_tx >> 1;
+ netif_tx_stop_all_queues(netdev);
+ }
+
+ spin_unlock_bh(&cfv->tx_lock);
+
+ dev_kfree_skb(skb);
+
+ /* Try to speculatively free used buffers */
+ if (flow_off)
+ cfv_release_used_buf(cfv->vq_tx);
+
+ return NETDEV_TX_OK;
+}
+
+static const struct net_device_ops cfv_netdev_ops = {
+ .ndo_open = cfv_netdev_open,
+ .ndo_stop = cfv_netdev_close,
+ .ndo_start_xmit = cfv_netdev_tx,
+};
+
+static void cfv_netdev_setup(struct net_device *netdev)
+{
+ netdev->netdev_ops = &cfv_netdev_ops;
+ netdev->type = ARPHRD_CAIF;
+ netdev->tx_queue_len = 100;
+ netdev->flags = IFF_POINTOPOINT | IFF_NOARP;
+ netdev->mtu = CFV_DEF_MTU_SIZE;
+ netdev->destructor = free_netdev;
+}
+
+#define GET_VIRTIO_CONFIG_OPS(_v, _var, _f) \
+ ((_v)->config->get(_v, offsetof(struct virtio_caif_transf_config, _f), \
+ &_var, \
+ FIELD_SIZEOF(struct virtio_caif_transf_config, _f)))
+
+static int __devinit cfv_probe(struct virtio_device *vdev)
+{
+ vq_callback_t *vq_cbs[] = { cfv_recv, cfv_release_used_buf };
+ const char *names[] = { "input", "output" };
+ const char *cfv_netdev_name = "cfvrt";
+ struct net_device *netdev;
+ struct virtqueue *vqs[2];
+ struct cfv_info *cfv;
+ int err = 0;
+
+ netdev = alloc_netdev(sizeof(struct cfv_info), cfv_netdev_name,
+ cfv_netdev_setup);
+ if (!netdev)
+ return -ENOMEM;
+
+ cfv = netdev_priv(netdev);
+ cfv->vdev = vdev;
+ cfv->ndev = netdev;
+
+ spin_lock_init(&cfv->tx_lock);
+
+ /* Get two virtqueues, for tx/ul and rx/dl */
+ err = vdev->config->find_vqs(vdev, 2, vqs, vq_cbs, names);
+ if (err)
+ goto free_cfv;
+
+ cfv->vq_rx = vqs[0];
+ cfv->vq_tx = vqs[1];
+
+ if (vdev->config->get) {
+ GET_VIRTIO_CONFIG_OPS(vdev, cfv->tx_hr, headroom);
+ GET_VIRTIO_CONFIG_OPS(vdev, cfv->rx_hr, headroom);
+ GET_VIRTIO_CONFIG_OPS(vdev, cfv->tx_tr, tailroom);
+ GET_VIRTIO_CONFIG_OPS(vdev, cfv->rx_tr, tailroom);
+ GET_VIRTIO_CONFIG_OPS(vdev, cfv->mtu, mtu);
+ GET_VIRTIO_CONFIG_OPS(vdev, cfv->mru, mtu);
+ } else {
+ cfv->tx_hr = CFV_DEF_HEADROOM;
+ cfv->rx_hr = CFV_DEF_HEADROOM;
+ cfv->tx_tr = CFV_DEF_TAILROOM;
+ cfv->rx_tr = CFV_DEF_TAILROOM;
+ cfv->mtu = CFV_DEF_MTU_SIZE;
+ cfv->mru = CFV_DEF_MTU_SIZE;
+
+ }
+
+ vdev->priv = cfv;
+
+ netif_carrier_off(netdev);
+
+ /* register Netdev */
+ err = register_netdev(netdev);
+ if (err) {
+ dev_err(&vdev->dev, "Unable to register netdev (%d)\n", err);
+ goto vqs_del;
+ }
+
+ /* tell the remote processor it can start sending messages */
+ virtqueue_kick(cfv->vq_rx);
+ return 0;
+
+vqs_del:
+ vdev->config->del_vqs(cfv->vdev);
+free_cfv:
+ free_netdev(netdev);
+ return err;
+}
+
+static void __devexit cfv_remove(struct virtio_device *vdev)
+{
+ struct cfv_info *cfv = vdev->priv;
+ vdev->config->reset(vdev);
+ vdev->config->del_vqs(cfv->vdev);
+ unregister_netdev(cfv->ndev);
+}
+
+static struct virtio_device_id id_table[] = {
+ { VIRTIO_ID_CAIF, VIRTIO_DEV_ANY_ID },
+ { 0 },
+};
+
+static unsigned int features[] = {
+};
+
+static struct virtio_driver caif_virtio_driver = {
+ .feature_table = features,
+ .feature_table_size = ARRAY_SIZE(features),
+ .driver.name = KBUILD_MODNAME,
+ .driver.owner = THIS_MODULE,
+ .id_table = id_table,
+ .probe = cfv_probe,
+ .remove = cfv_remove,
+};
+
+module_driver(caif_virtio_driver, register_virtio_driver,
+ unregister_virtio_driver);
+MODULE_DEVICE_TABLE(virtio, id_table);
diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
index 270fb22..8ddad5a 100644
--- a/include/uapi/linux/virtio_ids.h
+++ b/include/uapi/linux/virtio_ids.h
@@ -37,5 +37,6 @@
#define VIRTIO_ID_RPMSG 7 /* virtio remote processor messaging */
#define VIRTIO_ID_SCSI 8 /* virtio scsi */
#define VIRTIO_ID_9P 9 /* 9p virtio console */
+#define VIRTIO_ID_CAIF 12 /* virtio caif */
#endif /* _LINUX_VIRTIO_IDS_H */
--
1.7.9.5
* Re: [RFC virtio-next 0/4] Introduce CAIF Virtio and reversed Vrings
2012-10-31 22:46 [RFC virtio-next 0/4] Introduce CAIF Virtio and reversed Vrings Sjur Brændeland
` (3 preceding siblings ...)
2012-10-31 22:46 ` [RFC virtio-next 4/4] caif_virtio: Add CAIF over virtio Sjur Brændeland
@ 2012-11-01 7:41 ` Rusty Russell
2012-11-05 12:12 ` Sjur Brændeland
[not found] ` <CANHm3PgrsTD4uYuXN0AMuZFX794CJmmus4AST=G0+nP1ha3VyQ@mail.gmail.com>
4 siblings, 2 replies; 51+ messages in thread
From: Rusty Russell @ 2012-11-01 7:41 UTC (permalink / raw)
Cc: Michael S. Tsirkin, netdev, Linus Walleij, dmitry.tarnyagin,
linux-kernel, virtualization, sjur
Sjur Brændeland <sjur@brendeland.net> writes:
> Zero-Copy data transport on the modem is the primary goal for CAIF Virtio.
> In order to achieve Zero-Copy, the direction of the Virtio rings is
> flipped in the RX direction. We have therefore implemented Virtio
> access-functions similar to those found in vhost.c.
So, this adds another host-side virtqueue implementation.
Can we combine them together conveniently? You pulled out more stuff
into vring.h which is a start, but it's a bit overloaded.
Perhaps we should separate the common fields into struct vring, and use
it to build:
struct vring_guest {
struct vring vr;
u16 last_used_idx;
};
struct vring_host {
struct vring vr;
u16 last_avail_idx;
};
I haven't looked closely at vhost to see what it wants, but I would
think we could share more code.
Cheers,
Rusty.
* Re: [RFC virtio-next 0/4] Introduce CAIF Virtio and reversed Vrings
2012-11-01 7:41 ` [RFC virtio-next 0/4] Introduce CAIF Virtio and reversed Vrings Rusty Russell
@ 2012-11-05 12:12 ` Sjur Brændeland
[not found] ` <CANHm3PgrsTD4uYuXN0AMuZFX794CJmmus4AST=G0+nP1ha3VyQ@mail.gmail.com>
1 sibling, 0 replies; 51+ messages in thread
From: Sjur Brændeland @ 2012-11-05 12:12 UTC (permalink / raw)
To: Rusty Russell
Cc: Michael S. Tsirkin, netdev, Linus Walleij, dmitry.tarnyagin,
linux-kernel, virtualization
Hi Rusty,
> So, this adds another host-side virtqueue implementation.
>
> Can we combine them together conveniently? You pulled out more stuff
> into vring.h which is a start, but it's a bit overloaded.
> Perhaps we should separate the common fields into struct vring, and use
> it to build:
>
> struct vring_guest {
> struct vring vr;
> u16 last_used_idx;
> };
>
> struct vring_host {
> struct vring vr;
> u16 last_avail_idx;
> };
> I haven't looked closely at vhost to see what it wants, but I would
> think we could share more code.
I have played around with the code in vhost.c to explore your idea.
The main issue I run into is that vhost.c is accessing user data while my new
code does not. So I end up with some quirky code testing if the ring lives in
user memory or not. Another issue is sparse warnings when
accessing user memory.
With your suggested changes I end up sharing about 100 lines of code.
So in sum, I feel this adds more complexity than what we gain by sharing.
Below is an initial draft of the re-usable code. I added "is_uaccess" to struct
vring_host in order to know if the ring lives in user memory.
Let me know what you think.
[snip]
int virtqueue_add_used(struct vring_host *vr, unsigned int head, int len,
struct vring_used_elem **used)
{
/* The virtqueue contains a ring of used buffers. Get a pointer to the
* next entry in that used ring. */
*used = &vr->vring.used->ring[vr->last_used_idx % vr->vring.num];
if (vr->is_uaccess) {
if(unlikely(__put_user(head, &(*used)->id))) {
pr_debug("Failed to write used id");
return -EFAULT;
}
if (unlikely(__put_user(len, &(*used)->len))) {
pr_debug("Failed to write used len");
return -EFAULT;
}
smp_wmb();
if (__put_user(vr->last_used_idx + 1,
&vr->vring.used->idx)) {
pr_debug("Failed to increment used idx");
return -EFAULT;
}
} else {
(*used)->id = head;
(*used)->len = len;
smp_wmb();
vr->vring.used->idx = vr->last_used_idx + 1;
}
vr->last_used_idx++;
return 0;
}
/* Each buffer in the virtqueues is actually a chain of descriptors. This
* function returns the next descriptor in the chain,
* or -1U if we're at the end. */
unsigned virtqueue_next_desc(struct vring_desc *desc)
{
unsigned int next;
/* If this descriptor says it doesn't chain, we're done. */
if (!(desc->flags & VRING_DESC_F_NEXT))
return -1U;
/* Check they're not leading us off end of descriptors. */
next = desc->next;
/* Make sure compiler knows to grab that: we don't want it changing! */
/* We will use the result as an index in an array, so most
* architectures only need a compiler barrier here. */
read_barrier_depends();
return next;
}
static int virtqueue_next_avail_desc(struct vring_host *vr)
{
int head;
u16 last_avail_idx;
/* Check it isn't doing very strange things with descriptor numbers. */
last_avail_idx = vr->last_avail_idx;
if (vr->is_uaccess) {
if (__get_user(vr->avail_idx, &vr->vring.avail->idx)) {
pr_debug("Failed to access avail idx at %p\n",
&vr->vring.avail->idx);
return -EFAULT;
}
} else
vr->avail_idx = vr->vring.avail->idx;
if (unlikely((u16)(vr->avail_idx - last_avail_idx) > vr->vring.num)) {
pr_debug("Guest moved used index from %u to %u",
last_avail_idx, vr->avail_idx);
return -EFAULT;
}
/* If there's nothing new since last we looked, return invalid. */
if (vr->avail_idx == last_avail_idx)
return vr->vring.num;
/* Only get avail ring entries after they have been exposed by guest. */
smp_rmb();
/* Grab the next descriptor number they're advertising, and increment
* the index we've seen. */
if (vr->is_uaccess) {
if (unlikely(__get_user(head,
&vr->vring.avail->ring[last_avail_idx
% vr->vring.num]))) {
pr_debug("Failed to read head: idx %d address %p\n",
last_avail_idx,
&vr->vring.avail->ring[last_avail_idx %
vr->vring.num]);
return -EFAULT;
}
} else
head = vr->vring.avail->ring[last_avail_idx % vr->vring.num];
/* If their number is silly, that's an error. */
if (unlikely(head >= vr->vring.num)) {
pr_debug("Guest says index %u > %u is available",
head, vr->vring.num);
return -EINVAL;
}
return head;
}
Thanks,
Sjur
* Re: [RFC virtio-next 0/4] Introduce CAIF Virtio and reversed Vrings
[not found] ` <CANHm3PgrsTD4uYuXN0AMuZFX794CJmmus4AST=G0+nP1ha3VyQ@mail.gmail.com>
@ 2012-11-06 2:09 ` Rusty Russell
2012-12-05 14:36 ` [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio Sjur Brændeland
0 siblings, 1 reply; 51+ messages in thread
From: Rusty Russell @ 2012-11-06 2:09 UTC (permalink / raw)
To: Sjur Brændeland
Cc: Michael S. Tsirkin, netdev, Linus Walleij, dmitry.tarnyagin,
linux-kernel, virtualization
Sjur Brændeland <sjurbr@gmail.com> writes:
> Hi Rusty,
>
>> So, this adds another host-side virtqueue implementation.
>>
>> Can we combine them together conveniently? You pulled out more stuff
>> into vring.h which is a start, but it's a bit overloaded.
>> Perhaps we should separate the common fields into struct vring, and use
>> it to build:
>>
>> struct vring_guest {
>> struct vring vr;
>> u16 last_used_idx;
>> };
>>
>> struct vring_host {
>> struct vring vr;
>> u16 last_avail_idx;
>> };
>> I haven't looked closely at vhost to see what it wants, but I would
>> think we could share more code.
>
> I have played around with the code in vhost.c to explore your idea.
> The main issue I run into is that vhost.c is accessing user data while my new
> code does not. So I end up with some quirky code testing if the ring lives in
> user memory or not. Another issue is sparse warnings when
> accessing user memory.
Sparse is a servant, not a master. If that's the only thing stopping
us, we can ignore it (or hack around it).
> With your suggested changes I end up sharing about 100 lines of code.
> So in sum, I feel this adds more complexity than what we gain by sharing.
>
> Below is an initial draft of the re-usable code. I added "is_uaccess" to struct
> vring_host in order to know if the ring lives in user memory.
>
> Let me know what you think.
Agreed, that's horrible...
Fortunately, recent GCCs will inline function pointers, so inlining this
and handing an accessor function gets optimized away.
I would really like this, because I'd love to have a config option to do
strict checking on the format of these things (similar to my recently
posted CONFIG_VIRTIO_DEVICE_TORTURE patch).
See below.
> int virtqueue_add_used(struct vring_host *vr, unsigned int head, int len,
> struct vring_used_elem **used)
> {
> /* The virtqueue contains a ring of used buffers. Get a pointer to the
> * next entry in that used ring. */
> *used = &vr->vring.used->ring[vr->last_used_idx % vr->vring.num];
> if (vr->is_uaccess) {
> if(unlikely(__put_user(head, &(*used)->id))) {
> pr_debug("Failed to write used id");
> return -EFAULT;
> }
> if (unlikely(__put_user(len, &(*used)->len))) {
> pr_debug("Failed to write used len");
> return -EFAULT;
> }
> smp_wmb();
> if (__put_user(vr->last_used_idx + 1,
> &vr->vring.used->idx)) {
> pr_debug("Failed to increment used idx");
> return -EFAULT;
> }
> } else {
> (*used)->id = head;
> (*used)->len = len;
> smp_wmb();
> vr->vring.used->idx = vr->last_used_idx + 1;
> }
> vr->last_used_idx++;
> return 0;
> }
/* Untested! */
static inline bool in_kernel_put(u32 *dst, u32 v)
{
*dst = v;
return true;
}
static inline bool userspace_put(u32 *dst, u32 v)
{
return __put_user(v, dst) == 0;
}
static inline struct vring_used_elem *vrh_add_used(struct vring_host *vr,
unsigned int head, u32 len,
bool (*put)(u32 *dst, u32 v))
{
struct vring_used_elem *used;
/* The virtqueue contains a ring of used buffers. Get a pointer to the
* next entry in that used ring. */
used = &vr->vring.used->ring[vr->last_used_idx % vr->vring.num];
if (!put(&used->id, head) || !put(&used->len, len))
return NULL;
smp_wmb();
if (!put(&vr->vring.used->idx, vr->last_used_idx + 1))
return NULL;
vr->last_used_idx++;
return used;
}
Cheers,
Rusty.
* [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
2012-11-06 2:09 ` Rusty Russell
@ 2012-12-05 14:36 ` Sjur Brændeland
2012-12-05 14:36 ` [RFCv2 01/12] vhost: Use struct vring in vhost_virtqueue Sjur Brændeland
` (12 more replies)
0 siblings, 13 replies; 51+ messages in thread
From: Sjur Brændeland @ 2012-12-05 14:36 UTC (permalink / raw)
To: Rusty Russell
Cc: Michael S. Tsirkin, Linus Walleij, virtualization,
Sjur Brændeland
From: Sjur Brændeland <sjur.brandeland@stericsson.com>
This patch-set introduces a host-side virtqueue implementation and
the CAIF Virtio Link layer. See http://lwn.net/Articles/522296/ for
background info on CAIF over Virtio.
After feedback from Rusty, I have re-factored vhost.c and
pulled out common functionality and data definitions from
drivers/vhost into drivers/virtio.
Part of the challenge of doing this is that vhost.c accesses the
virtio rings in user-space memory, while CAIF uses kernel memory.
In order to solve this issue, inline memory access functions are passed
to some of the vring functions (as suggested by Rusty).
I have added one argument to the function find_vqs(), in order to
request host-side virtio-queue. (This needs some more work...)
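For reference, a sketch of what the extended operation could look like
(the parameter name comes from the patch "virtio: Add argument reversed
to function find_vqs()"; its exact position in the argument list is an
assumption):

	/* Hypothetical prototype of the extended find_vqs() operation. */
	int (*find_vqs)(struct virtio_device *vdev, unsigned nvqs,
			struct virtqueue *vqs[],
			vq_callback_t *callbacks[],
			const char *names[],
			bool reversed); /* true: host-side (reversed) rings */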
Feedback on this patch-set is appreciated, particularly on structure
and code-reuse between vhost.c and the host-side virtio-queue.
I'd also like some suggestions on how to handle the build configuration
better - currently there are some unnecessary build dependencies.
The patches are based on v3.7, and compile tested only.
Rusty, if you want to review this later, after the merge window closes,
let me know and I'll resend these patches later.
Thanks,
Sjur
Sjur Brændeland (11):
vhost: Use struct vring in vhost_virtqueue
vhost: Isolate reusable vring related functions
virtio-ring: Introduce file virtio_ring_host
virtio-ring: Refactor out the functions accessing user memory
virtio-ring: Refactor move attributes to struct virtqueue
virtio_ring: Move SMP macros to virtio_ring.h
virtio-ring: Add Host side virtio-ring implementation
virtio: Update vring_interrupt for host-side virtio queues
virtio-ring: Add BUG_ON checking on host/guest ring type
virtio: Add argument reversed to function find_vqs()
remoteproc: Add support for host-virtqueues
Vikram ARV (1):
caif_virtio: Introduce caif over virtio
drivers/char/virtio_console.c | 3 +-
drivers/lguest/lguest_device.c | 5 +-
drivers/net/caif/Kconfig | 8 +
drivers/net/caif/Makefile | 3 +
drivers/net/caif/caif_virtio.c | 482 ++++++++++++++++++++++++++++++++
drivers/net/virtio_net.c | 3 +-
drivers/remoteproc/remoteproc_virtio.c | 18 +-
drivers/rpmsg/virtio_rpmsg_bus.c | 2 +-
drivers/s390/kvm/kvm_virtio.c | 5 +-
drivers/scsi/virtio_scsi.c | 2 +-
drivers/vhost/Kconfig | 2 +
drivers/vhost/net.c | 4 +-
drivers/vhost/vhost.c | 272 +++++++-----------
drivers/vhost/vhost.h | 14 +-
drivers/virtio/Kconfig | 3 +
drivers/virtio/Makefile | 1 +
drivers/virtio/virtio_balloon.c | 3 +-
drivers/virtio/virtio_mmio.c | 5 +-
drivers/virtio/virtio_pci.c | 3 +-
drivers/virtio/virtio_ring.c | 50 ++---
drivers/virtio/virtio_ring_host.c | 345 +++++++++++++++++++++++
include/linux/virtio.h | 6 +
include/linux/virtio_caif.h | 25 ++
include/linux/virtio_config.h | 7 +-
include/linux/virtio_ring.h | 77 +++++-
include/uapi/linux/virtio_ids.h | 1 +
26 files changed, 1120 insertions(+), 229 deletions(-)
create mode 100644 drivers/net/caif/caif_virtio.c
create mode 100644 drivers/virtio/virtio_ring_host.c
create mode 100644 include/linux/virtio_caif.h
--
1.7.5.4
* [RFCv2 01/12] vhost: Use struct vring in vhost_virtqueue
2012-12-05 14:36 ` [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio Sjur Brændeland
@ 2012-12-05 14:36 ` Sjur Brændeland
2012-12-05 14:37 ` [RFCv2 02/12] vhost: Isolate reusable vring related functions Sjur Brændeland
` (11 subsequent siblings)
12 siblings, 0 replies; 51+ messages in thread
From: Sjur Brændeland @ 2012-12-05 14:36 UTC (permalink / raw)
To: Rusty Russell
Cc: Michael S. Tsirkin, Sjur Brændeland, Linus Walleij,
virtualization, Sjur Brændeland
Pull out common vring attributes from vhost_virtqueue into a new
struct vring_host. This allows reuse of data definitions
between vhost and the virtio queue when the host-side virtio queue is
introduced. Also, unsigned long is replaced with ulong in a couple
of places.
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
---
drivers/vhost/net.c | 4 +-
drivers/vhost/vhost.c | 213 +++++++++++++++++++++++--------------------
drivers/vhost/vhost.h | 14 +---
include/linux/virtio_ring.h | 13 +++
4 files changed, 130 insertions(+), 114 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 072cbba..8fc1869 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -182,7 +182,7 @@ static void handle_tx(struct vhost_net *net)
if (unlikely(head < 0))
break;
/* Nothing new? Wait for eventfd to tell us they refilled. */
- if (head == vq->num) {
+ if (head == vq->hst.vr.num) {
int num_pends;
wmem = atomic_read(&sock->sk->sk_wmem_alloc);
@@ -329,7 +329,7 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
d = vhost_get_vq_desc(vq->dev, vq, vq->iov + seg,
ARRAY_SIZE(vq->iov) - seg, &out,
&in, log, log_num);
- if (d == vq->num) {
+ if (d == vq->hst.vr.num) {
r = 0;
goto err;
}
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 99ac2cb..0a676f1 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -12,6 +12,7 @@
*/
#include <linux/eventfd.h>
+#include <linux/types.h>
#include <linux/vhost.h>
#include <linux/virtio_net.h>
#include <linux/mm.h>
@@ -39,8 +40,10 @@ enum {
static unsigned vhost_zcopy_mask __read_mostly;
-#define vhost_used_event(vq) ((u16 __user *)&vq->avail->ring[vq->num])
-#define vhost_avail_event(vq) ((u16 __user *)&vq->used->ring[vq->num])
+#define vhost_used_event(vq) ((u16 __user *) \
+ &vq->hst.vr.avail->ring[vq->hst.vr.num])
+#define vhost_avail_event(vq) ((u16 __user *)\
+ &vq->hst.vr.used->ring[vq->hst.vr.num])
static void vhost_poll_func(struct file *file, wait_queue_head_t *wqh,
poll_table *pt)
@@ -57,7 +60,7 @@ static int vhost_poll_wakeup(wait_queue_t *wait, unsigned mode, int sync,
{
struct vhost_poll *poll = container_of(wait, struct vhost_poll, wait);
- if (!((unsigned long)key & poll->mask))
+ if (!((ulong)key & poll->mask))
return 0;
vhost_poll_queue(poll);
@@ -75,7 +78,7 @@ void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn)
/* Init poll structure */
void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
- unsigned long mask, struct vhost_dev *dev)
+ ulong mask, struct vhost_dev *dev)
{
init_waitqueue_func_entry(&poll->wait, vhost_poll_wakeup);
init_poll_funcptr(&poll->table, vhost_poll_func);
@@ -89,7 +92,7 @@ void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
* keep a reference to a file until after vhost_poll_stop is called. */
void vhost_poll_start(struct vhost_poll *poll, struct file *file)
{
- unsigned long mask;
+ ulong mask;
mask = file->f_op->poll(file, &poll->table);
if (mask)
@@ -139,7 +142,7 @@ void vhost_poll_flush(struct vhost_poll *poll)
void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
{
- unsigned long flags;
+ ulong flags;
spin_lock_irqsave(&dev->work_lock, flags);
if (list_empty(&work->node)) {
@@ -158,13 +161,13 @@ void vhost_poll_queue(struct vhost_poll *poll)
static void vhost_vq_reset(struct vhost_dev *dev,
struct vhost_virtqueue *vq)
{
- vq->num = 1;
- vq->desc = NULL;
- vq->avail = NULL;
- vq->used = NULL;
- vq->last_avail_idx = 0;
- vq->avail_idx = 0;
- vq->last_used_idx = 0;
+ vq->hst.vr.num = 1;
+ vq->hst.vr.desc = NULL;
+ vq->hst.vr.avail = NULL;
+ vq->hst.vr.used = NULL;
+ vq->hst.last_avail_idx = 0;
+ vq->hst.avail_idx = 0;
+ vq->hst.last_used_idx = 0;
vq->signalled_used = 0;
vq->signalled_used_valid = false;
vq->used_flags = 0;
@@ -489,13 +492,13 @@ void vhost_dev_cleanup(struct vhost_dev *dev, bool locked)
dev->mm = NULL;
}
-static int log_access_ok(void __user *log_base, u64 addr, unsigned long sz)
+static int log_access_ok(void __user *log_base, u64 addr, ulong sz)
{
u64 a = addr / VHOST_PAGE_SIZE / 8;
/* Make sure 64 bit math will not overflow. */
- if (a > ULONG_MAX - (unsigned long)log_base ||
- a + (unsigned long)log_base > ULONG_MAX)
+ if (a > ULONG_MAX - (ulong)log_base ||
+ a + (ulong)log_base > ULONG_MAX)
return 0;
return access_ok(VERIFY_WRITE, log_base + a,
@@ -513,7 +516,7 @@ static int vq_memory_access_ok(void __user *log_base, struct vhost_memory *mem,
for (i = 0; i < mem->nregions; ++i) {
struct vhost_memory_region *m = mem->regions + i;
- unsigned long a = m->userspace_addr;
+ ulong a = m->userspace_addr;
if (m->memory_size > ULONG_MAX)
return 0;
else if (!access_ok(VERIFY_WRITE, (void __user *)a,
@@ -587,22 +590,24 @@ static int vq_log_access_ok(struct vhost_dev *d, struct vhost_virtqueue *vq,
return vq_memory_access_ok(log_base, mp,
vhost_has_feature(vq->dev, VHOST_F_LOG_ALL)) &&
(!vq->log_used || log_access_ok(log_base, vq->log_addr,
- sizeof *vq->used +
- vq->num * sizeof *vq->used->ring + s));
+ sizeof *vq->hst.vr.used +
+ vq->hst.vr.num *
+ sizeof *vq->hst.vr.used->ring + s));
}
/* Can we start vq? */
/* Caller should have vq mutex and device mutex */
int vhost_vq_access_ok(struct vhost_virtqueue *vq)
{
- return vq_access_ok(vq->dev, vq->num, vq->desc, vq->avail, vq->used) &&
+ return vq_access_ok(vq->dev, vq->hst.vr.num, vq->hst.vr.desc,
+ vq->hst.vr.avail, vq->hst.vr.used) &&
vq_log_access_ok(vq->dev, vq, vq->log_base);
}
static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
{
struct vhost_memory mem, *newmem, *oldmem;
- unsigned long size = offsetof(struct vhost_memory, regions);
+ ulong size = offsetof(struct vhost_memory, regions);
if (copy_from_user(&mem, m, size))
return -EFAULT;
@@ -673,7 +678,7 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
r = -EINVAL;
break;
}
- vq->num = s.num;
+ vq->hst.vr.num = s.num;
break;
case VHOST_SET_VRING_BASE:
/* Moving base with an active backend?
@@ -690,13 +695,13 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
r = -EINVAL;
break;
}
- vq->last_avail_idx = s.num;
+ vq->hst.last_avail_idx = s.num;
/* Forget the cached index value. */
- vq->avail_idx = vq->last_avail_idx;
+ vq->hst.avail_idx = vq->hst.last_avail_idx;
break;
case VHOST_GET_VRING_BASE:
s.index = idx;
- s.num = vq->last_avail_idx;
+ s.num = vq->hst.last_avail_idx;
if (copy_to_user(argp, &s, sizeof s))
r = -EFAULT;
break;
@@ -711,15 +716,15 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
}
/* For 32bit, verify that the top 32bits of the user
data are set to zero. */
- if ((u64)(unsigned long)a.desc_user_addr != a.desc_user_addr ||
- (u64)(unsigned long)a.used_user_addr != a.used_user_addr ||
- (u64)(unsigned long)a.avail_user_addr != a.avail_user_addr) {
+ if ((u64)(ulong)a.desc_user_addr != a.desc_user_addr ||
+ (u64)(ulong)a.used_user_addr != a.used_user_addr ||
+ (u64)(ulong)a.avail_user_addr != a.avail_user_addr) {
r = -EFAULT;
break;
}
- if ((a.avail_user_addr & (sizeof *vq->avail->ring - 1)) ||
- (a.used_user_addr & (sizeof *vq->used->ring - 1)) ||
- (a.log_guest_addr & (sizeof *vq->used->ring - 1))) {
+ if ((a.avail_user_addr & (sizeof *vq->hst.vr.avail->ring-1)) ||
+ (a.used_user_addr & (sizeof *vq->hst.vr.used->ring-1)) ||
+ (a.log_guest_addr & (sizeof *vq->hst.vr.used->ring-1))) {
r = -EINVAL;
break;
}
@@ -728,10 +733,10 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
* If it is not, we don't as size might not have been setup.
* We will verify when backend is configured. */
if (vq->private_data) {
- if (!vq_access_ok(d, vq->num,
- (void __user *)(unsigned long)a.desc_user_addr,
- (void __user *)(unsigned long)a.avail_user_addr,
- (void __user *)(unsigned long)a.used_user_addr)) {
+ if (!vq_access_ok(d, vq->hst.vr.num,
+ (void __user *)(ulong)a.desc_user_addr,
+ (void __user *)(ulong)a.avail_user_addr,
+ (void __user *)(ulong)a.used_user_addr)) {
r = -EINVAL;
break;
}
@@ -739,18 +744,22 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
/* Also validate log access for used ring if enabled. */
if ((a.flags & (0x1 << VHOST_VRING_F_LOG)) &&
!log_access_ok(vq->log_base, a.log_guest_addr,
- sizeof *vq->used +
- vq->num * sizeof *vq->used->ring)) {
+ sizeof *vq->hst.vr.used +
+ vq->hst.vr.num *
+ sizeof *vq->hst.vr.used->ring)) {
r = -EINVAL;
break;
}
}
vq->log_used = !!(a.flags & (0x1 << VHOST_VRING_F_LOG));
- vq->desc = (void __user *)(unsigned long)a.desc_user_addr;
- vq->avail = (void __user *)(unsigned long)a.avail_user_addr;
+ vq->hst.vr.desc =
+ (void __user *)(ulong)a.desc_user_addr;
+ vq->hst.vr.avail =
+ (void __user *)(ulong)a.avail_user_addr;
vq->log_addr = a.log_guest_addr;
- vq->used = (void __user *)(unsigned long)a.used_user_addr;
+ vq->hst.vr.used =
+ (void __user *)(ulong)a.used_user_addr;
break;
case VHOST_SET_VRING_KICK:
if (copy_from_user(&f, argp, sizeof f)) {
@@ -829,7 +838,7 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
}
/* Caller must have device mutex */
-long vhost_dev_ioctl(struct vhost_dev *d, unsigned int ioctl, unsigned long arg)
+long vhost_dev_ioctl(struct vhost_dev *d, unsigned int ioctl, ulong arg)
{
void __user *argp = (void __user *)arg;
struct file *eventfp, *filep = NULL;
@@ -858,13 +867,13 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int ioctl, unsigned long arg)
r = -EFAULT;
break;
}
- if ((u64)(unsigned long)p != p) {
+ if ((u64)(ulong)p != p) {
r = -EFAULT;
break;
}
for (i = 0; i < d->nvqs; ++i) {
struct vhost_virtqueue *vq;
- void __user *base = (void __user *)(unsigned long)p;
+ void __user *base = (void __user *)(ulong)p;
vq = d->vqs + i;
mutex_lock(&vq->mutex);
/* If ring is inactive, will check when it's enabled. */
@@ -932,7 +941,7 @@ static const struct vhost_memory_region *find_region(struct vhost_memory *mem,
*/
static int set_bit_to_user(int nr, void __user *addr)
{
- unsigned long log = (unsigned long)addr;
+ ulong log = (ulong)addr;
struct page *page;
void *base;
int bit = nr + (log % PAGE_SIZE) * 8;
@@ -960,12 +969,12 @@ static int log_write(void __user *log_base,
return 0;
write_length += write_address % VHOST_PAGE_SIZE;
for (;;) {
- u64 base = (u64)(unsigned long)log_base;
+ u64 base = (u64)(ulong)log_base;
u64 log = base + write_page / 8;
int bit = write_page % 8;
- if ((u64)(unsigned long)log != log)
+ if ((u64)(ulong)log != log)
return -EFAULT;
- r = set_bit_to_user(bit, (void __user *)(unsigned long)log);
+ r = set_bit_to_user(bit, (void __user *)(ulong)log);
if (r < 0)
return r;
if (write_length <= VHOST_PAGE_SIZE)
@@ -1003,16 +1012,16 @@ int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
static int vhost_update_used_flags(struct vhost_virtqueue *vq)
{
void __user *used;
- if (__put_user(vq->used_flags, &vq->used->flags) < 0)
+ if (__put_user(vq->used_flags, &vq->hst.vr.used->flags) < 0)
return -EFAULT;
if (unlikely(vq->log_used)) {
/* Make sure the flag is seen before log. */
smp_wmb();
/* Log used flag write. */
- used = &vq->used->flags;
+ used = &vq->hst.vr.used->flags;
log_write(vq->log_base, vq->log_addr +
- (used - (void __user *)vq->used),
- sizeof vq->used->flags);
+ (used - (void __user *)vq->hst.vr.used),
+ sizeof vq->hst.vr.used->flags);
if (vq->log_ctx)
eventfd_signal(vq->log_ctx, 1);
}
@@ -1021,7 +1030,7 @@ static int vhost_update_used_flags(struct vhost_virtqueue *vq)
static int vhost_update_avail_event(struct vhost_virtqueue *vq, u16 avail_event)
{
- if (__put_user(vq->avail_idx, vhost_avail_event(vq)))
+ if (__put_user(vq->hst.avail_idx, vhost_avail_event(vq)))
return -EFAULT;
if (unlikely(vq->log_used)) {
void __user *used;
@@ -1030,7 +1039,7 @@ static int vhost_update_avail_event(struct vhost_virtqueue *vq, u16 avail_event)
/* Log avail event write */
used = vhost_avail_event(vq);
log_write(vq->log_base, vq->log_addr +
- (used - (void __user *)vq->used),
+ (used - (void __user *)vq->hst.vr.used),
sizeof *vhost_avail_event(vq));
if (vq->log_ctx)
eventfd_signal(vq->log_ctx, 1);
@@ -1048,7 +1057,7 @@ int vhost_init_used(struct vhost_virtqueue *vq)
if (r)
return r;
vq->signalled_used_valid = false;
- return get_user(vq->last_used_idx, &vq->used->idx);
+ return get_user(vq->hst.last_used_idx, &vq->hst.vr.used->idx);
}
static int translate_desc(struct vhost_dev *dev, u64 addr, u32 len,
@@ -1077,7 +1086,7 @@ static int translate_desc(struct vhost_dev *dev, u64 addr, u32 len,
_iov = iov + ret;
size = reg->memory_size - addr + reg->guest_phys_addr;
_iov->iov_len = min((u64)len, size);
- _iov->iov_base = (void __user *)(unsigned long)
+ _iov->iov_base = (void __user *)(ulong)
(reg->userspace_addr + addr - reg->guest_phys_addr);
s += size;
addr += size;
@@ -1216,22 +1225,23 @@ int vhost_get_vq_desc(struct vhost_dev *dev, struct vhost_virtqueue *vq,
int ret;
/* Check it isn't doing very strange things with descriptor numbers. */
- last_avail_idx = vq->last_avail_idx;
- if (unlikely(__get_user(vq->avail_idx, &vq->avail->idx))) {
+ last_avail_idx = vq->hst.last_avail_idx;
+ if (unlikely(__get_user(vq->hst.avail_idx, &vq->hst.vr.avail->idx))) {
vq_err(vq, "Failed to access avail idx at %p\n",
- &vq->avail->idx);
+ &vq->hst.vr.avail->idx);
return -EFAULT;
}
- if (unlikely((u16)(vq->avail_idx - last_avail_idx) > vq->num)) {
+ if (unlikely((u16)(vq->hst.avail_idx -
+ last_avail_idx) > vq->hst.vr.num)) {
vq_err(vq, "Guest moved used index from %u to %u",
- last_avail_idx, vq->avail_idx);
+ last_avail_idx, vq->hst.avail_idx);
return -EFAULT;
}
/* If there's nothing new since last we looked, return invalid. */
- if (vq->avail_idx == last_avail_idx)
- return vq->num;
+ if (vq->hst.avail_idx == last_avail_idx)
+ return vq->hst.vr.num;
/* Only get avail ring entries after they have been exposed by guest. */
smp_rmb();
@@ -1239,17 +1249,19 @@ int vhost_get_vq_desc(struct vhost_dev *dev, struct vhost_virtqueue *vq,
/* Grab the next descriptor number they're advertising, and increment
* the index we've seen. */
if (unlikely(__get_user(head,
- &vq->avail->ring[last_avail_idx % vq->num]))) {
+ &vq->hst.vr.avail->ring[last_avail_idx %
+ vq->hst.vr.num]))) {
vq_err(vq, "Failed to read head: idx %d address %p\n",
last_avail_idx,
- &vq->avail->ring[last_avail_idx % vq->num]);
+ &vq->hst.vr.avail->ring[last_avail_idx %
+ vq->hst.vr.num]);
return -EFAULT;
}
/* If their number is silly, that's an error. */
- if (unlikely(head >= vq->num)) {
+ if (unlikely(head >= vq->hst.vr.num)) {
vq_err(vq, "Guest says index %u > %u is available",
- head, vq->num);
+ head, vq->hst.vr.num);
return -EINVAL;
}
@@ -1261,21 +1273,21 @@ int vhost_get_vq_desc(struct vhost_dev *dev, struct vhost_virtqueue *vq,
i = head;
do {
unsigned iov_count = *in_num + *out_num;
- if (unlikely(i >= vq->num)) {
+ if (unlikely(i >= vq->hst.vr.num)) {
vq_err(vq, "Desc index is %u > %u, head = %u",
- i, vq->num, head);
+ i, vq->hst.vr.num, head);
return -EINVAL;
}
- if (unlikely(++found > vq->num)) {
+ if (unlikely(++found > vq->hst.vr.num)) {
vq_err(vq, "Loop detected: last one at %u "
"vq size %u head %u\n",
- i, vq->num, head);
+ i, vq->hst.vr.num, head);
return -EINVAL;
}
- ret = __copy_from_user(&desc, vq->desc + i, sizeof desc);
+ ret = __copy_from_user(&desc, vq->hst.vr.desc + i, sizeof desc);
if (unlikely(ret)) {
vq_err(vq, "Failed to get descriptor: idx %d addr %p\n",
- i, vq->desc + i);
+ i, vq->hst.vr.desc + i);
return -EFAULT;
}
if (desc.flags & VRING_DESC_F_INDIRECT) {
@@ -1319,7 +1331,7 @@ int vhost_get_vq_desc(struct vhost_dev *dev, struct vhost_virtqueue *vq,
} while ((i = next_desc(&desc)) != -1);
/* On success, increment avail index. */
- vq->last_avail_idx++;
+ vq->hst.last_avail_idx++;
/* Assume notifications from guest are disabled at this point,
* if they aren't we would need to update avail_event index. */
@@ -1330,7 +1342,7 @@ int vhost_get_vq_desc(struct vhost_dev *dev, struct vhost_virtqueue *vq,
/* Reverse the effect of vhost_get_vq_desc. Useful for error handling. */
void vhost_discard_vq_desc(struct vhost_virtqueue *vq, int n)
{
- vq->last_avail_idx -= n;
+ vq->hst.last_avail_idx -= n;
}
/* After we've used one of their buffers, we tell them about it. We'll then
@@ -1341,7 +1353,7 @@ int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
/* The virtqueue contains a ring of used buffers. Get a pointer to the
* next entry in that used ring. */
- used = &vq->used->ring[vq->last_used_idx % vq->num];
+ used = &vq->hst.vr.used->ring[vq->hst.last_used_idx % vq->hst.vr.num];
if (__put_user(head, &used->id)) {
vq_err(vq, "Failed to write used id");
return -EFAULT;
@@ -1352,7 +1364,7 @@ int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
}
/* Make sure buffer is written before we update index. */
smp_wmb();
- if (__put_user(vq->last_used_idx + 1, &vq->used->idx)) {
+ if (__put_user(vq->hst.last_used_idx + 1, &vq->hst.vr.used->idx)) {
vq_err(vq, "Failed to increment used idx");
return -EFAULT;
}
@@ -1362,21 +1374,22 @@ int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
/* Log used ring entry write. */
log_write(vq->log_base,
vq->log_addr +
- ((void __user *)used - (void __user *)vq->used),
+ ((void __user *)used -
+ (void __user *)vq->hst.vr.used),
sizeof *used);
/* Log used index update. */
log_write(vq->log_base,
vq->log_addr + offsetof(struct vring_used, idx),
- sizeof vq->used->idx);
+ sizeof vq->hst.vr.used->idx);
if (vq->log_ctx)
eventfd_signal(vq->log_ctx, 1);
}
- vq->last_used_idx++;
+ vq->hst.last_used_idx++;
/* If the driver never bothers to signal in a very long while,
* used index might wrap around. If that happens, invalidate
* signalled_used index we stored. TODO: make sure driver
* signals at least once in 2^16 and remove this. */
- if (unlikely(vq->last_used_idx == vq->signalled_used))
+ if (unlikely(vq->hst.last_used_idx == vq->signalled_used))
vq->signalled_used_valid = false;
return 0;
}
@@ -1389,8 +1402,8 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
u16 old, new;
int start;
- start = vq->last_used_idx % vq->num;
- used = vq->used->ring + start;
+ start = vq->hst.last_used_idx % vq->hst.vr.num;
+ used = vq->hst.vr.used->ring + start;
if (__copy_to_user(used, heads, count * sizeof *used)) {
vq_err(vq, "Failed to write used");
return -EFAULT;
@@ -1401,11 +1414,12 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
/* Log used ring entry write. */
log_write(vq->log_base,
vq->log_addr +
- ((void __user *)used - (void __user *)vq->used),
+ ((void __user *)used -
+ (void __user *)vq->hst.vr.used),
count * sizeof *used);
}
- old = vq->last_used_idx;
- new = (vq->last_used_idx += count);
+ old = vq->hst.last_used_idx;
+ new = (vq->hst.last_used_idx += count);
/* If the driver never bothers to signal in a very long while,
* used index might wrap around. If that happens, invalidate
* signalled_used index we stored. TODO: make sure driver
@@ -1422,8 +1436,8 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
{
int start, n, r;
- start = vq->last_used_idx % vq->num;
- n = vq->num - start;
+ start = vq->hst.last_used_idx % vq->hst.vr.num;
+ n = vq->hst.vr.num - start;
if (n < count) {
r = __vhost_add_used_n(vq, heads, n);
if (r < 0)
@@ -1435,7 +1449,7 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
/* Make sure buffer is written before we update index. */
smp_wmb();
- if (put_user(vq->last_used_idx, &vq->used->idx)) {
+ if (put_user(vq->hst.last_used_idx, &vq->hst.vr.used->idx)) {
vq_err(vq, "Failed to increment used idx");
return -EFAULT;
}
@@ -1443,7 +1457,7 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
/* Log used index update. */
log_write(vq->log_base,
vq->log_addr + offsetof(struct vring_used, idx),
- sizeof vq->used->idx);
+ sizeof vq->hst.vr.used->idx);
if (vq->log_ctx)
eventfd_signal(vq->log_ctx, 1);
}
@@ -1460,12 +1474,12 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
smp_mb();
if (vhost_has_feature(dev, VIRTIO_F_NOTIFY_ON_EMPTY) &&
- unlikely(vq->avail_idx == vq->last_avail_idx))
+ unlikely(vq->hst.avail_idx == vq->hst.last_avail_idx))
return true;
if (!vhost_has_feature(dev, VIRTIO_RING_F_EVENT_IDX)) {
__u16 flags;
- if (__get_user(flags, &vq->avail->flags)) {
+ if (__get_user(flags, &vq->hst.vr.avail->flags)) {
vq_err(vq, "Failed to get flags");
return true;
}
@@ -1473,7 +1487,7 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
}
old = vq->signalled_used;
v = vq->signalled_used_valid;
- new = vq->signalled_used = vq->last_used_idx;
+ new = vq->signalled_used = vq->hst.last_used_idx;
vq->signalled_used_valid = true;
if (unlikely(!v))
@@ -1525,13 +1539,14 @@ bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
r = vhost_update_used_flags(vq);
if (r) {
vq_err(vq, "Failed to enable notification at %p: %d\n",
- &vq->used->flags, r);
+ &vq->hst.vr.used->flags, r);
return false;
}
} else {
- r = vhost_update_avail_event(vq, vq->avail_idx);
+ r = vhost_update_avail_event(vq, vq->hst.avail_idx);
if (r) {
- vq_err(vq, "Failed to update avail event index at %p: %d\n",
+ vq_err(vq,
+ "Failed to update avail event index at %p: %d\n",
vhost_avail_event(vq), r);
return false;
}
@@ -1539,14 +1554,14 @@ bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
/* They could have slipped one in as we were doing that: make
* sure it's written, then check again. */
smp_mb();
- r = __get_user(avail_idx, &vq->avail->idx);
+ r = __get_user(avail_idx, &vq->hst.vr.avail->idx);
if (r) {
vq_err(vq, "Failed to check avail idx at %p: %d\n",
- &vq->avail->idx, r);
+ &vq->hst.vr.avail->idx, r);
return false;
}
- return avail_idx != vq->avail_idx;
+ return avail_idx != vq->hst.avail_idx;
}
/* We don't need to be notified again. */
@@ -1561,7 +1576,7 @@ void vhost_disable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
r = vhost_update_used_flags(vq);
if (r)
vq_err(vq, "Failed to enable notification at %p: %d\n",
- &vq->used->flags, r);
+ &vq->hst.vr.used->flags, r);
}
}
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 1125af3..4ab8c8f 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -76,10 +76,7 @@ struct vhost_virtqueue {
/* The actual ring of buffers. */
struct mutex mutex;
- unsigned int num;
- struct vring_desc __user *desc;
- struct vring_avail __user *avail;
- struct vring_used __user *used;
+ struct vring_host hst;
struct file *kick;
struct file *call;
struct file *error;
@@ -92,15 +89,6 @@ struct vhost_virtqueue {
/* The routine to call when the Guest pings us, or timeout. */
vhost_work_fn_t handle_kick;
- /* Last available index we saw. */
- u16 last_avail_idx;
-
- /* Caches available index value from user. */
- u16 avail_idx;
-
- /* Last index we used. */
- u16 last_used_idx;
-
/* Used flags */
u16 used_flags;
diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
index 63c6ea1..7917dac 100644
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -7,6 +7,19 @@
struct virtio_device;
struct virtqueue;
+struct vring_host {
+ struct vring vr;
+
+ /* Last available index we saw. */
+ u16 last_avail_idx;
+
+ /* Caches available index value from user. */
+ u16 avail_idx;
+
+ /* Last index we used. */
+ u16 last_used_idx;
+};
+
struct virtqueue *vring_new_virtqueue(unsigned int index,
unsigned int num,
unsigned int vring_align,
--
1.7.5.4
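For reference, the struct vring embedded in the new struct vring_host above is
the standard layout from <linux/virtio_ring.h>; it carries exactly the four
fields this patch removes from vhost_virtqueue:

struct vring {
        unsigned int num;
        struct vring_desc *desc;
        struct vring_avail *avail;
        struct vring_used *used;
};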
* [RFCv2 02/12] vhost: Isolate reusable vring related functions
2012-12-05 14:36 ` [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio Sjur Brændeland
2012-12-05 14:36 ` [RFCv2 01/12] vhost: Use struct vring in vhost_virtqueue Sjur Brændeland
@ 2012-12-05 14:37 ` Sjur Brændeland
2012-12-05 14:37 ` [RFCv2 03/12] virtio-ring: Introduce file virtio_ring_host Sjur Brændeland
` (10 subsequent siblings)
12 siblings, 0 replies; 51+ messages in thread
From: Sjur Brændeland @ 2012-12-05 14:37 UTC (permalink / raw)
To: Rusty Russell
Cc: Michael S. Tsirkin, Sjur Brændeland, Linus Walleij,
virtualization, Sjur Brændeland
Prepare for moving the virtio ring code out to a separate
file by isolating the vring-related functions. The functions
vring_add_used_user() and vring_avail_desc_user(), which
handle virtio rings mapped in user space, are prepared to
be moved out.
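For illustration, a minimal sketch of how the two helpers are meant to pair
up (consume_one() is hypothetical, and in this patch the helpers are still
local to vhost.c):

static int consume_one(struct vring_host *vh)
{
        int head = vring_avail_desc_user(vh);

        if (head < 0)
                return head;            /* -EFAULT/-EINVAL from the ring */
        if (head == vh->vr.num)
                return -EAGAIN;         /* nothing available right now */

        /* ... translate and consume the descriptor chain at 'head' ... */

        if (!vring_add_used_user(vh, head, 0))
                return -EFAULT;         /* could not publish the used entry */
        return 0;
}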
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
---
drivers/vhost/vhost.c | 119 ++++++++++++++++++++++++++++--------------------
1 files changed, 69 insertions(+), 50 deletions(-)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 0a676f1..5e91048 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1100,7 +1100,7 @@ static int translate_desc(struct vhost_dev *dev, u64 addr, u32 len,
/* Each buffer in the virtqueues is actually a chain of descriptors. This
* function returns the next descriptor in the chain,
* or -1U if we're at the end. */
-static unsigned next_desc(struct vring_desc *desc)
+unsigned vring_next_desc(struct vring_desc *desc)
{
unsigned int next;
@@ -1202,46 +1202,29 @@ static int get_indirect(struct vhost_dev *dev, struct vhost_virtqueue *vq,
}
*out_num += ret;
}
- } while ((i = next_desc(&desc)) != -1);
+ } while ((i = vring_next_desc(&desc)) != -1);
return 0;
}
-/* This looks in the virtqueue and for the first available buffer, and converts
- * it to an iovec for convenient access. Since descriptors consist of some
- * number of output then some number of input descriptors, it's actually two
- * iovecs, but we pack them into one and note how many of each there were.
- *
- * This function returns the descriptor number found, or vq->num (which is
- * never a valid descriptor number) if none was found. A negative code is
- * returned on error. */
-int vhost_get_vq_desc(struct vhost_dev *dev, struct vhost_virtqueue *vq,
- struct iovec iov[], unsigned int iov_size,
- unsigned int *out_num, unsigned int *in_num,
- struct vhost_log *log, unsigned int *log_num)
+static int vring_avail_desc_user(struct vring_host *vh)
{
- struct vring_desc desc;
- unsigned int i, head, found = 0;
+ int head;
u16 last_avail_idx;
- int ret;
/* Check it isn't doing very strange things with descriptor numbers. */
- last_avail_idx = vq->hst.last_avail_idx;
- if (unlikely(__get_user(vq->hst.avail_idx, &vq->hst.vr.avail->idx))) {
- vq_err(vq, "Failed to access avail idx at %p\n",
- &vq->hst.vr.avail->idx);
+ last_avail_idx = vh->last_avail_idx;
+ if (unlikely(__get_user(vh->avail_idx, &vh->vr.avail->idx))) {
+ pr_debug("Failed to access avail idx at %p\n",
+ &vh->vr.avail->idx);
return -EFAULT;
}
- if (unlikely((u16)(vq->hst.avail_idx -
- last_avail_idx) > vq->hst.vr.num)) {
- vq_err(vq, "Guest moved used index from %u to %u",
- last_avail_idx, vq->hst.avail_idx);
+ if (unlikely((u16)(vh->avail_idx - last_avail_idx) > vh->vr.num))
return -EFAULT;
- }
/* If there's nothing new since last we looked, return invalid. */
- if (vq->hst.avail_idx == last_avail_idx)
- return vq->hst.vr.num;
+ if (vh->avail_idx == last_avail_idx)
+ return vh->vr.num;
/* Only get avail ring entries after they have been exposed by guest. */
smp_rmb();
@@ -1249,22 +1232,46 @@ int vhost_get_vq_desc(struct vhost_dev *dev, struct vhost_virtqueue *vq,
/* Grab the next descriptor number they're advertising, and increment
* the index we've seen. */
if (unlikely(__get_user(head,
- &vq->hst.vr.avail->ring[last_avail_idx %
- vq->hst.vr.num]))) {
- vq_err(vq, "Failed to read head: idx %d address %p\n",
- last_avail_idx,
- &vq->hst.vr.avail->ring[last_avail_idx %
- vq->hst.vr.num]);
+ &vh->vr.avail->ring[last_avail_idx %
+ vh->vr.num]))) {
+ pr_debug("Failed to read head: idx %d address %p\n",
+ last_avail_idx,
+ &vh->vr.avail->ring[last_avail_idx %
+ vh->vr.num]);
return -EFAULT;
}
/* If their number is silly, that's an error. */
- if (unlikely(head >= vq->hst.vr.num)) {
- vq_err(vq, "Guest says index %u > %u is available",
- head, vq->hst.vr.num);
+ if (unlikely(head >= vh->vr.num)) {
+ pr_debug("Guest says index %u > %u is available",
+ head, vh->vr.num);
return -EINVAL;
}
+ return head;
+}
+
+/* This looks in the virtqueue and for the first available buffer, and converts
+ * it to an iovec for convenient access. Since descriptors consist of some
+ * number of output then some number of input descriptors, it's actually two
+ * iovecs, but we pack them into one and note how many of each there were.
+ *
+ * This function returns the descriptor number found, or vq->num (which is
+ * never a valid descriptor number) if none was found. A negative code is
+ * returned on error. */
+int vhost_get_vq_desc(struct vhost_dev *dev, struct vhost_virtqueue *vq,
+ struct iovec iov[], unsigned int iov_size,
+ unsigned int *out_num, unsigned int *in_num,
+ struct vhost_log *log, unsigned int *log_num)
+{
+ struct vring_desc desc;
+ unsigned int i, found = 0;
+ int ret, head;
+
+ head = vring_avail_desc_user(&vq->hst);
+ if (head < 0) {
+ vq_err(vq, "vring_avail_desc_user failed\n");
+ return head;
+ }
+ /* Nothing new? Return vq->num, which is never a valid head. */
+ if (head == vq->hst.vr.num)
+ return head;
+
/* When we start there are none of either input nor output. */
*out_num = *in_num = 0;
if (unlikely(log))
@@ -1328,7 +1335,7 @@ int vhost_get_vq_desc(struct vhost_dev *dev, struct vhost_virtqueue *vq,
}
*out_num += ret;
}
- } while ((i = next_desc(&desc)) != -1);
+ } while ((i = vring_next_desc(&desc)) != -1);
/* On success, increment avail index. */
vq->hst.last_avail_idx++;
@@ -1345,29 +1352,41 @@ void vhost_discard_vq_desc(struct vhost_virtqueue *vq, int n)
vq->hst.last_avail_idx -= n;
}
-/* After we've used one of their buffers, we tell them about it. We'll then
- * want to notify the guest, using eventfd. */
-int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
+struct vring_used_elem *vring_add_used_user(struct vring_host *vh,
+ unsigned int head, int len)
{
- struct vring_used_elem __user *used;
+ struct vring_used_elem *used;
/* The virtqueue contains a ring of used buffers. Get a pointer to the
* next entry in that used ring. */
- used = &vq->hst.vr.used->ring[vq->hst.last_used_idx % vq->hst.vr.num];
+ used = &vh->vr.used->ring[vh->last_used_idx % vh->vr.num];
if (__put_user(head, &used->id)) {
- vq_err(vq, "Failed to write used id");
- return -EFAULT;
+ pr_debug("Failed to write used id");
+ return NULL;
}
if (__put_user(len, &used->len)) {
- vq_err(vq, "Failed to write used len");
- return -EFAULT;
+ pr_debug("Failed to write used len");
+ return NULL;
}
/* Make sure buffer is written before we update index. */
smp_wmb();
- if (__put_user(vq->hst.last_used_idx + 1, &vq->hst.vr.used->idx)) {
- vq_err(vq, "Failed to increment used idx");
- return -EFAULT;
+ if (__put_user(vh->last_used_idx + 1, &vh->vr.used->idx)) {
+ pr_debug("Failed to increment used idx");
+ return NULL;
}
+ vh->last_used_idx++;
+ return used;
+}
+
+/* After we've used one of their buffers, we tell them about it. We'll then
+ * want to notify the guest, using eventfd. */
+int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
+{
+ struct vring_used_elem *used;
+ used = vring_add_used_user(&vq->hst, head, len);
+ if (!used) {
+ vq_err(vq, "Failed to write to vring");
+ return -EFAULT;
+ }
+
if (unlikely(vq->log_used)) {
/* Make sure data is seen before log. */
smp_wmb();
--
1.7.5.4
* [RFCv2 03/12] virtio-ring: Introduce file virtio_ring_host
2012-12-05 14:36 ` [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio Sjur Brændeland
2012-12-05 14:36 ` [RFCv2 01/12] vhost: Use struct vring in vhost_virtqueue Sjur Brændeland
2012-12-05 14:37 ` [RFCv2 02/12] vhost: Isolate reusable vring related functions Sjur Brændeland
@ 2012-12-05 14:37 ` Sjur Brændeland
2012-12-05 14:37 ` [RFCv2 04/12] virtio-ring: Refactor out the functions accessing user memory Sjur Brændeland
` (9 subsequent siblings)
12 siblings, 0 replies; 51+ messages in thread
From: Sjur Brændeland @ 2012-12-05 14:37 UTC (permalink / raw)
To: Rusty Russell
Cc: Michael S. Tsirkin, Sjur Brændeland, Linus Walleij,
virtualization, Sjur Brændeland
Move the host-side virtio ring functions to the new file
virtio_ring_host.c. The functions vring_avail_desc_user(),
vring_add_used_user() and vring_next_desc() are moved from
vhost.c as-is, without any changes.
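With the three functions exported, other modules can walk descriptor chains
directly; a hedged sketch (chain_length() is hypothetical and assumes 'head'
came from vring_avail_desc_user()):

#include <linux/uaccess.h>
#include <linux/virtio_ring.h>

/* Count the descriptors in the chain starting at 'head'. */
static unsigned int chain_length(struct vring_host *vh, unsigned int head)
{
        struct vring_desc desc;
        unsigned int i = head, len = 0;

        do {
                /* Descriptors live in user memory: copy each one in. */
                if (__copy_from_user(&desc, vh->vr.desc + i, sizeof(desc)))
                        return 0;
                len++;
        } while ((i = vring_next_desc(&desc)) != -1U);

        return len;
}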
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
---
drivers/vhost/Kconfig | 2 +
drivers/vhost/vhost.c | 92 -----------------------------
drivers/virtio/Kconfig | 3 +
drivers/virtio/Makefile | 1 +
drivers/virtio/virtio_ring_host.c | 117 +++++++++++++++++++++++++++++++++++++
include/linux/virtio_ring.h | 8 +++
6 files changed, 131 insertions(+), 92 deletions(-)
create mode 100644 drivers/virtio/virtio_ring_host.c
diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 202bba6..5bfdaa9 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -1,4 +1,6 @@
config VHOST_NET
+ select VIRTIO_RING_HOST
+ select VIRTIO
tristate "Host kernel accelerator for virtio net (EXPERIMENTAL)"
depends on NET && EVENTFD && (TUN || !TUN) && (MACVTAP || !MACVTAP) && EXPERIMENTAL
---help---
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 5e91048..6634f0a 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1097,27 +1097,6 @@ static int translate_desc(struct vhost_dev *dev, u64 addr, u32 len,
return ret;
}
-/* Each buffer in the virtqueues is actually a chain of descriptors. This
- * function returns the next descriptor in the chain,
- * or -1U if we're at the end. */
-unsigned vring_next_desc(struct vring_desc *desc)
-{
- unsigned int next;
-
- /* If this descriptor says it doesn't chain, we're done. */
- if (!(desc->flags & VRING_DESC_F_NEXT))
- return -1U;
-
- /* Check they're not leading us off end of descriptors. */
- next = desc->next;
- /* Make sure compiler knows to grab that: we don't want it changing! */
- /* We will use the result as an index in an array, so most
- * architectures only need a compiler barrier here. */
- read_barrier_depends();
-
- return next;
-}
-
static int get_indirect(struct vhost_dev *dev, struct vhost_virtqueue *vq,
struct iovec iov[], unsigned int iov_size,
unsigned int *out_num, unsigned int *in_num,
@@ -1206,51 +1185,6 @@ static int get_indirect(struct vhost_dev *dev, struct vhost_virtqueue *vq,
return 0;
}
-static int vring_avail_desc_user(struct vring_host *vh)
-{
- int head;
- u16 last_avail_idx;
-
- /* Check it isn't doing very strange things with descriptor numbers. */
- last_avail_idx = vh->last_avail_idx;
- if (unlikely(__get_user(vh->avail_idx, &vh->vr.avail->idx))) {
- pr_debug("Failed to access avail idx at %p\n",
- &vh->vr.avail->idx);
- return -EFAULT;
- }
-
- if (unlikely((u16)(vh->avail_idx - last_avail_idx) > vh->vr.num))
- return -EFAULT;
-
- /* If there's nothing new since last we looked, return invalid. */
- if (vh->avail_idx == last_avail_idx)
- return vh->vr.num;
-
- /* Only get avail ring entries after they have been exposed by guest. */
- smp_rmb();
-
- /* Grab the next descriptor number they're advertising, and increment
- * the index we've seen. */
- if (unlikely(__get_user(head,
- &vh->vr.avail->ring[last_avail_idx %
- vh->vr.num]))) {
- pr_debug("Failed to read head: idx %d address %p\n",
- last_avail_idx,
- &vh->vr.avail->ring[last_avail_idx %
- vh->vr.num]);
- return -EFAULT;
- }
-
- /* If their number is silly, that's an error. */
- if (unlikely(head >= vh->vr.num)) {
- pr_debug("Guest says index %u > %u is available",
- head, vh->vr.num);
- return -EINVAL;
- }
-
- return head;
-}
-
/* This looks in the virtqueue and for the first available buffer, and converts
* it to an iovec for convenient access. Since descriptors consist of some
* number of output then some number of input descriptors, it's actually two
@@ -1352,32 +1286,6 @@ void vhost_discard_vq_desc(struct vhost_virtqueue *vq, int n)
vq->hst.last_avail_idx -= n;
}
-struct vring_used_elem *vring_add_used_user(struct vring_host *vh,
- unsigned int head, int len)
-{
- struct vring_used_elem *used;
-
- /* The virtqueue contains a ring of used buffers. Get a pointer to the
- * next entry in that used ring. */
- used = &vh->vr.used->ring[vh->last_used_idx % vh->vr.num];
- if (__put_user(head, &used->id)) {
- pr_debug("Failed to write used id");
- return NULL;
- }
- if (__put_user(len, &used->len)) {
- pr_debug("Failed to write used len");
- return NULL;
- }
- /* Make sure buffer is written before we update index. */
- smp_wmb();
- if (__put_user(vh->last_used_idx + 1, &vh->vr.used->idx)) {
- pr_debug("Failed to increment used idx");
- return NULL;
- }
- vh->last_used_idx++;
- return used;
-}
-
/* After we've used one of their buffers, we tell them about it. We'll then
* want to notify the guest, using eventfd. */
int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 8d5bddb..4e72892 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -5,6 +5,9 @@ config VIRTIO
bus, such as CONFIG_VIRTIO_PCI, CONFIG_VIRTIO_MMIO, CONFIG_LGUEST,
CONFIG_RPMSG or CONFIG_S390_GUEST.
+config VIRTIO_RING_HOST
+ tristate
+
menu "Virtio drivers"
config VIRTIO_PCI
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 9076635..54831f4 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -1,4 +1,5 @@
obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o
+obj-$(CONFIG_VIRTIO_RING_HOST) += virtio_ring_host.o
obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
diff --git a/drivers/virtio/virtio_ring_host.c b/drivers/virtio/virtio_ring_host.c
new file mode 100644
index 0000000..192b838
--- /dev/null
+++ b/drivers/virtio/virtio_ring_host.c
@@ -0,0 +1,117 @@
+/*
+ * Copyright (C) ST-Ericsson AB 2012
+ * Copyright (C) 2009 Red Hat, Inc.
+ * Copyright (C) 2006 Rusty Russell IBM Corporation
+ *
+ * Author: Sjur Brendeland / sjur.brandeland@stericsson.com
+ * Copied from vhost.c, author Michael S. Tsirkin <mst@redhat.com>
+ *
+ * License terms: GNU General Public License (GPL) version 2.
+ *
+ * Inspiration, some code, and most witty comments come from
+ * Documentation/virtual/lguest/lguest.c, by Rusty Russell
+ *
+ * Generic code for virtio server in host kernel.
+ */
+#include <linux/virtio.h>
+#include <linux/virtio_ring.h>
+#include <linux/module.h>
+#include <linux/uaccess.h>
+
+MODULE_LICENSE("GPL");
+
+struct vring_used_elem *vring_add_used_user(struct vring_host *vh,
+ unsigned int head, int len)
+{
+ struct vring_used_elem *used;
+
+ /* The virtqueue contains a ring of used buffers. Get a pointer to the
+ * next entry in that used ring. */
+ used = &vh->vr.used->ring[vh->last_used_idx % vh->vr.num];
+ if (__put_user(head, &used->id)) {
+ pr_debug("Failed to write used id");
+ return NULL;
+ }
+ if (__put_user(len, &used->len)) {
+ pr_debug("Failed to write used len");
+ return NULL;
+ }
+ /* Make sure buffer is written before we update index. */
+ smp_wmb();
+ if (__put_user(vh->last_used_idx + 1, &vh->vr.used->idx)) {
+ pr_debug("Failed to increment used idx");
+ return NULL;
+ }
+ vh->last_used_idx++;
+ return used;
+}
+EXPORT_SYMBOL(vring_add_used_user);
+
+int vring_avail_desc_user(struct vring_host *vh)
+{
+ int head;
+ u16 last_avail_idx;
+
+ /* Check it isn't doing very strange things with descriptor numbers. */
+ last_avail_idx = vh->last_avail_idx;
+ if (unlikely(__get_user(vh->avail_idx, &vh->vr.avail->idx))) {
+ pr_debug("Failed to access avail idx at %p\n",
+ &vh->vr.avail->idx);
+ return -EFAULT;
+ }
+
+ if (unlikely((u16)(vh->avail_idx - last_avail_idx) > vh->vr.num))
+ return -EFAULT;
+
+ /* If there's nothing new since last we looked, return invalid. */
+ if (vh->avail_idx == last_avail_idx)
+ return vh->vr.num;
+
+ /* Only get avail ring entries after they have been exposed by guest. */
+ smp_rmb();
+
+ /* Grab the next descriptor number they're advertising, and increment
+ * the index we've seen. */
+ if (unlikely(__get_user(head,
+ &vh->vr.avail->ring[last_avail_idx %
+ vh->vr.num]))) {
+ pr_debug("Failed to read head: idx %d address %p\n",
+ last_avail_idx,
+ &vh->vr.avail->ring[last_avail_idx %
+ vh->vr.num]);
+ return -EFAULT;
+ }
+
+ /* If their number is silly, that's an error. */
+ if (unlikely(head >= vh->vr.num)) {
+ pr_debug("Guest says index %u > %u is available",
+ head, vh->vr.num);
+ return -EINVAL;
+ }
+
+ return head;
+}
+EXPORT_SYMBOL(vring_avail_desc_user);
+
+/* Each buffer in the virtqueues is actually a chain of descriptors. This
+ * function returns the next descriptor in the chain,
+ * or -1U if we're at the end. */
+unsigned vring_next_desc(struct vring_desc *desc)
+{
+ unsigned int next;
+
+ /* If this descriptor says it doesn't chain, we're done. */
+ if (!(desc->flags & VRING_DESC_F_NEXT))
+ return -1U;
+
+ /* Check they're not leading us off end of descriptors. */
+ next = desc->next;
+ /* Make sure compiler knows to grab that: we don't want it changing! */
+ /* We will use the result as an index in an array, so most
+ * architectures only need a compiler barrier here. */
+ read_barrier_depends();
+
+ return next;
+}
+EXPORT_SYMBOL(vring_next_desc);
diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
index 7917dac..6c9b871 100644
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -34,4 +34,12 @@ void vring_del_virtqueue(struct virtqueue *vq);
void vring_transport_features(struct virtio_device *vdev);
irqreturn_t vring_interrupt(int irq, void *_vq);
+
+unsigned vring_next_desc(struct vring_desc *desc);
+
+int vring_avail_desc_user(struct vring_host *vh);
+
+struct vring_used_elem *vring_add_used_user(struct vring_host *vh,
+ unsigned int head, int len);
+
#endif /* _LINUX_VIRTIO_RING_H */
--
1.7.5.4
* [RFCv2 04/12] virtio-ring: Refactor out the functions accessing user memory
2012-12-05 14:36 ` [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio Sjur Brændeland
` (2 preceding siblings ...)
2012-12-05 14:37 ` [RFCv2 03/12] virtio-ring: Introduce file virtio_ring_host Sjur Brændeland
@ 2012-12-05 14:37 ` Sjur Brændeland
2012-12-06 9:52 ` Michael S. Tsirkin
2012-12-05 14:37 ` [RFCv2 05/12] virtio-ring: Refactor move attributes to struct virtqueue Sjur Brændeland
` (8 subsequent siblings)
12 siblings, 1 reply; 51+ messages in thread
From: Sjur Brændeland @ 2012-12-05 14:37 UTC (permalink / raw)
To: Rusty Russell
Cc: Michael S. Tsirkin, Sjur Brændeland, Linus Walleij,
virtualization, Sjur Brændeland
Isolate the accesses to user memory in separate inline
functions. This opens them up for reuse by a host-side
virtqueue implementation that accesses the virtio ring in
kernel space.
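The point of parameterizing the copy and barrier operations is that the same
templates can later be instantiated against plain kernel memory. Roughly, as
a sketch of the intended reuse (the _kern names are hypothetical; patch 07
adds the real kernel-side accessors):

static inline bool kernel_cpy_to(void *dst, void *src, size_t s)
{
        memcpy(dst, src, s);
        return true;
}

static inline bool kernel_get(u16 *dst, u16 *src)
{
        *dst = *src;
        return true;
}

static inline void kernel_read_barrier(void)
{
        rmb();  /* the other side may be a separate processor */
}

static inline void kernel_write_barrier(void)
{
        wmb();
}

static int vring_avail_desc_kern(struct vring_host *vh)
{
        return _vring_avail_desc(vh, kernel_get, kernel_read_barrier);
}

static struct vring_used_elem *vring_add_used_kern(struct vring_host *vh,
                                                   u32 head, u32 len)
{
        return _vring_add_used(vh, head, len, kernel_cpy_to,
                               kernel_write_barrier);
}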
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
---
drivers/virtio/virtio_ring_host.c | 81 ++++++++++++++++++++++++++-----------
1 files changed, 57 insertions(+), 24 deletions(-)
diff --git a/drivers/virtio/virtio_ring_host.c b/drivers/virtio/virtio_ring_host.c
index 192b838..0750099 100644
--- a/drivers/virtio/virtio_ring_host.c
+++ b/drivers/virtio/virtio_ring_host.c
@@ -18,44 +18,45 @@
#include <linux/virtio_ring.h>
#include <linux/module.h>
#include <linux/uaccess.h>
+#include <linux/kconfig.h>
MODULE_LICENSE("GPL");
-struct vring_used_elem *vring_add_used_user(struct vring_host *vh,
- unsigned int head, int len)
+
+static inline struct vring_used_elem *_vring_add_used(struct vring_host *vh,
+ u32 head, u32 len,
+ bool (*cpy)(void *dst,
+ void *src,
+ size_t s),
+ void (*wbarrier)(void))
{
struct vring_used_elem *used;
+ u16 last_used;
/* The virtqueue contains a ring of used buffers. Get a pointer to the
* next entry in that used ring. */
- used = &vh->vr.used->ring[vh->last_used_idx % vh->vr.num];
- if (__put_user(head, &used->id)) {
- pr_debug("Failed to write used id");
+ used = &vh->vr.used->ring[vh->last_used_idx & (vh->vr.num - 1)];
+ if (!cpy(&used->id, &head, sizeof(used->id)) ||
+ !cpy(&used->len, &len, sizeof(used->len)))
return NULL;
- }
- if (__put_user(len, &used->len)) {
- pr_debug("Failed to write used len");
+ wbarrier();
+ last_used = vh->last_used_idx + 1;
+ if (!cpy(&vh->vr.used->idx, &last_used, sizeof(vh->vr.used->idx)))
return NULL;
- }
- /* Make sure buffer is written before we update index. */
- smp_wmb();
- if (__put_user(vh->last_used_idx + 1, &vh->vr.used->idx)) {
- pr_debug("Failed to increment used idx");
- return NULL;
- }
- vh->last_used_idx++;
+ vh->last_used_idx = last_used;
return used;
}
-EXPORT_SYMBOL(vring_add_used_user);
-int vring_avail_desc_user(struct vring_host *vh)
+static inline int _vring_avail_desc(struct vring_host *vh,
+ bool (*get)(u16 *dst, u16 *src),
+ void (*read_barrier)(void))
{
- int head;
+ u16 head;
u16 last_avail_idx;
/* Check it isn't doing very strange things with descriptor numbers. */
last_avail_idx = vh->last_avail_idx;
- if (unlikely(__get_user(vh->avail_idx, &vh->vr.avail->idx))) {
+ if (unlikely(!get(&vh->avail_idx, &vh->vr.avail->idx))) {
pr_debug("Failed to access avail idx at %p\n",
&vh->vr.avail->idx);
return -EFAULT;
@@ -69,13 +70,12 @@ int vring_avail_desc_user(struct vring_host *vh)
return vh->vr.num;
/* Only get avail ring entries after they have been exposed by guest. */
- smp_rmb();
+ read_barrier();
/* Grab the next descriptor number they're advertising, and increment
* the index we've seen. */
- if (unlikely(__get_user(head,
- &vh->vr.avail->ring[last_avail_idx %
- vh->vr.num]))) {
+ if (unlikely(!get(&head, &vh->vr.avail->ring[last_avail_idx &
+ (vh->vr.num - 1)]))) {
pr_debug("Failed to read head: idx %d address %p\n",
last_avail_idx,
&vh->vr.avail->ring[last_avail_idx %
@@ -92,6 +92,39 @@ int vring_avail_desc_user(struct vring_host *vh)
return head;
}
+
+static inline void smp_write_barrier(void)
+{
+ smp_wmb();
+}
+
+static inline void smp_read_barrier(void)
+{
+ smp_rmb();
+}
+
+static inline bool userspace_cpy_to(void *dst, void *src, size_t s)
+{
+ return __copy_to_user(dst, src, s) == 0;
+}
+
+static inline bool userspace_get(u16 *dst, u16 *src)
+{
+ /* __get_user() returns 0 on success. */
+ return __get_user(*dst, src) == 0;
+}
+
+struct vring_used_elem *vring_add_used_user(struct vring_host *vh,
+ unsigned int head, int len)
+{
+ return _vring_add_used(vh, head, len, userspace_cpy_to,
+ smp_write_barrier);
+}
+EXPORT_SYMBOL(vring_add_used_user);
+
+int vring_avail_desc_user(struct vring_host *vh)
+{
+ return _vring_avail_desc(vh, userspace_get, smp_read_barrier);
+}
EXPORT_SYMBOL(vring_avail_desc_user);
/* Each buffer in the virtqueues is actually a chain of descriptors. This
--
1.7.5.4
* [RFCv2 05/12] virtio-ring: Refactor move attributes to struct virtqueue
2012-12-05 14:36 ` [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio Sjur Brændeland
` (3 preceding siblings ...)
2012-12-05 14:37 ` [RFCv2 04/12] virtio-ring: Refactor out the functions accessing user memory Sjur Brændeland
@ 2012-12-05 14:37 ` Sjur Brændeland
2012-12-05 14:37 ` [RFCv2 06/12] virtio_ring: Move SMP macros to virtio_ring.h Sjur Brændeland
` (7 subsequent siblings)
12 siblings, 0 replies; 51+ messages in thread
From: Sjur Brændeland @ 2012-12-05 14:37 UTC (permalink / raw)
To: Rusty Russell
Cc: Michael S. Tsirkin, Sjur Brændeland, Linus Walleij,
virtualization, Sjur Brændeland
Attributes 'weak_barriers' and 'notify' are moved
from struct vring_virtqueue to struct virtqueue.
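With both attributes on struct virtqueue, code holding only the generic
handle can pick the right barrier and kick the other side without knowing
the ring internals; a small sketch (example_kick() is hypothetical):

static void example_kick(struct virtqueue *vq)
{
        /* Order our ring updates before notifying the other side. */
        if (vq->weak_barriers)
                smp_wmb();
        else
                wmb();
        if (vq->notify)
                vq->notify(vq);
}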
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
---
drivers/virtio/virtio_ring.c | 20 +++++++-------------
include/linux/virtio.h | 4 ++++
2 files changed, 11 insertions(+), 13 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index ffd7e7d..6aa76b4 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -31,11 +31,11 @@
* barriers, because mandatory barriers control MMIO effects on accesses
* through relaxed memory I/O windows (which virtio-pci does not use). */
#define virtio_mb(vq) \
- do { if ((vq)->weak_barriers) smp_mb(); else mb(); } while(0)
+ do { if ((vq)->vq.weak_barriers) smp_mb(); else mb(); } while (0)
#define virtio_rmb(vq) \
- do { if ((vq)->weak_barriers) smp_rmb(); else rmb(); } while(0)
+ do { if ((vq)->vq.weak_barriers) smp_rmb(); else rmb(); } while (0)
#define virtio_wmb(vq) \
- do { if ((vq)->weak_barriers) smp_wmb(); else wmb(); } while(0)
+ do { if ((vq)->vq.weak_barriers) smp_wmb(); else wmb(); } while (0)
#else
/* We must force memory ordering even if guest is UP since host could be
* running on another CPU, but SMP barriers are defined to barrier() in that
@@ -81,9 +81,6 @@ struct vring_virtqueue
/* Actual memory layout for this queue */
struct vring vring;
- /* Can we use weak barriers? */
- bool weak_barriers;
-
/* Other side has made a mess, don't try any more. */
bool broken;
@@ -101,9 +98,6 @@ struct vring_virtqueue
/* Last used index we've seen. */
u16 last_used_idx;
- /* How to notify other side. FIXME: commonalize hcalls! */
- void (*notify)(struct virtqueue *vq);
-
#ifdef DEBUG
/* They're supposed to lock for us. */
unsigned int in_use;
@@ -236,7 +230,7 @@ int virtqueue_add_buf(struct virtqueue *_vq,
* there are outgoing parts to the buffer. Presumably the
* host should service the ring ASAP. */
if (out)
- vq->notify(&vq->vq);
+ vq->vq.notify(&vq->vq);
END_USE(vq);
return -ENOSPC;
}
@@ -348,7 +342,7 @@ void virtqueue_notify(struct virtqueue *_vq)
struct vring_virtqueue *vq = to_vvq(_vq);
/* Prod other side to tell it about changes. */
- vq->notify(_vq);
+ vq->vq.notify(_vq);
}
EXPORT_SYMBOL_GPL(virtqueue_notify);
@@ -647,8 +641,8 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
vq->vq.name = name;
vq->vq.num_free = num;
vq->vq.index = index;
- vq->notify = notify;
- vq->weak_barriers = weak_barriers;
+ vq->vq.notify = notify;
+ vq->vq.weak_barriers = weak_barriers;
vq->broken = false;
vq->last_used_idx = 0;
vq->num_added = 0;
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 25fa1a6..f513ba8 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -13,11 +13,13 @@
* virtqueue - a queue to register buffers for sending or receiving.
* @list: the chain of virtqueues for this device
* @callback: the function to call when buffers are consumed (can be NULL).
+ * @notify: the function to notify the other side (can be NULL)
* @name: the name of this virtqueue (mainly for debugging)
* @vdev: the virtio device this queue was created for.
* @priv: a pointer for the virtqueue implementation to use.
* @index: the zero-based ordinal number for this queue.
* @num_free: number of elements we expect to be able to fit.
+ * @weak_barriers: indicate if we can use weak memory barriers.
*
* A note on @num_free: with indirect buffers, each buffer needs one
* element in the queue, otherwise a buffer will need one element per
@@ -26,11 +28,13 @@
struct virtqueue {
struct list_head list;
void (*callback)(struct virtqueue *vq);
+ void (*notify)(struct virtqueue *vq);
const char *name;
struct virtio_device *vdev;
unsigned int index;
unsigned int num_free;
void *priv;
+ bool weak_barriers;
};
int virtqueue_add_buf(struct virtqueue *vq,
--
1.7.5.4
* [RFCv2 06/12] virtio_ring: Move SMP macros to virtio_ring.h
2012-12-05 14:36 ` [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio Sjur Brændeland
` (4 preceding siblings ...)
2012-12-05 14:37 ` [RFCv2 05/12] virtio-ring: Refactor move attributes to struct virtqueue Sjur Brændeland
@ 2012-12-05 14:37 ` Sjur Brændeland
2012-12-05 14:37 ` [RFCv2 07/12] virtio-ring: Add Host side virtio-ring implementation Sjur Brændeland
` (6 subsequent siblings)
12 siblings, 0 replies; 51+ messages in thread
From: Sjur Brændeland @ 2012-12-05 14:37 UTC (permalink / raw)
To: Rusty Russell
Cc: Michael S. Tsirkin, Sjur Brændeland, Linus Walleij,
virtualization, Sjur Brændeland
Move macros from virtio_ring.c to virtio_ring.h so that
they can be used outside the file.
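Note that the moved macros expand to (vq)->vq.weak_barriers, so a user
outside virtio_ring.c must embed its struct virtqueue as a member named
'vq'; a sketch of such a wrapper (example_host_vq is hypothetical):

struct example_host_vq {
        struct virtqueue vq;    /* must be named 'vq' for the macros */
        /* ... ring bookkeeping ... */
};

static void publish_used(struct example_host_vq *hvq)
{
        /* Make used-ring entries visible before updating the index. */
        virtio_wmb(hvq);
}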
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
---
drivers/virtio/virtio_ring.c | 21 ---------------------
include/linux/virtio_ring.h | 20 ++++++++++++++++++++
2 files changed, 20 insertions(+), 21 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 6aa76b4..ead47d7 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -24,27 +24,6 @@
#include <linux/module.h>
#include <linux/hrtimer.h>
-/* virtio guest is communicating with a virtual "device" that actually runs on
- * a host processor. Memory barriers are used to control SMP effects. */
-#ifdef CONFIG_SMP
-/* Where possible, use SMP barriers which are more lightweight than mandatory
- * barriers, because mandatory barriers control MMIO effects on accesses
- * through relaxed memory I/O windows (which virtio-pci does not use). */
-#define virtio_mb(vq) \
- do { if ((vq)->vq.weak_barriers) smp_mb(); else mb(); } while (0)
-#define virtio_rmb(vq) \
- do { if ((vq)->vq.weak_barriers) smp_rmb(); else rmb(); } while (0)
-#define virtio_wmb(vq) \
- do { if ((vq)->vq.weak_barriers) smp_wmb(); else wmb(); } while (0)
-#else
-/* We must force memory ordering even if guest is UP since host could be
- * running on another CPU, but SMP barriers are defined to barrier() in that
- * configuration. So fall back to mandatory barriers instead. */
-#define virtio_mb(vq) mb()
-#define virtio_rmb(vq) rmb()
-#define virtio_wmb(vq) wmb()
-#endif
-
#ifdef DEBUG
/* For development, we want to crash whenever the ring is screwed. */
#define BAD_RING(_vq, fmt, args...) \
diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
index 6c9b871..1a4023b 100644
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -42,4 +42,24 @@ int vring_avail_desc_user(struct vring_host *vh);
struct vring_used_elem *vring_add_used_user(struct vring_host *vh,
unsigned int head, int len);
+/* virtio guest is communicating with a virtual "device" that actually runs on
+ * a host processor. Memory barriers are used to control SMP effects. */
+#ifdef CONFIG_SMP
+/* Where possible, use SMP barriers which are more lightweight than mandatory
+ * barriers, because mandatory barriers control MMIO effects on accesses
+ * through relaxed memory I/O windows (which virtio-pci does not use). */
+#define virtio_mb(vq) \
+ do { if ((vq)->vq.weak_barriers) smp_mb(); else mb(); } while (0)
+#define virtio_rmb(vq) \
+ do { if ((vq)->vq.weak_barriers) smp_rmb(); else rmb(); } while (0)
+#define virtio_wmb(vq) \
+ do { if ((vq)->vq.weak_barriers) smp_wmb(); else wmb(); } while (0)
+#else
+/* We must force memory ordering even if guest is UP since host could be
+ * running on another CPU, but SMP barriers are defined to barrier() in that
+ * configuration. So fall back to mandatory barriers instead. */
+#define virtio_mb(vq) mb()
+#define virtio_rmb(vq) rmb()
+#define virtio_wmb(vq) wmb()
+#endif
#endif /* _LINUX_VIRTIO_RING_H */
--
1.7.5.4
* [RFCv2 07/12] virtio-ring: Add Host side virtio-ring implementation
2012-12-05 14:36 ` [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio Sjur Brændeland
` (5 preceding siblings ...)
2012-12-05 14:37 ` [RFCv2 06/12] virtio_ring: Move SMP macros to virtio_ring.h Sjur Brændeland
@ 2012-12-05 14:37 ` Sjur Brændeland
2012-12-05 14:37 ` [RFCv2 08/12] virtio: Update vring_interrupt for host-side virtio queues Sjur Brændeland
` (5 subsequent siblings)
12 siblings, 0 replies; 51+ messages in thread
From: Sjur Brændeland @ 2012-12-05 14:37 UTC (permalink / raw)
To: Rusty Russell
Cc: Michael S. Tsirkin, Sjur Brændeland, Linus Walleij,
virtualization, Sjur Brændeland
Introduce a host-side virtio queue implementation. The functions
vring_new_host_virtqueue(), virtqueue_add_buf_to_used() and
virtqueue_next_avail_desc() are added to virtio_ring_host.c.
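A sketch of how a host-side driver might drain a reversed ring with the new
API (example_host_rx() and its handler are hypothetical; descriptor payloads
are assumed to be directly addressable, as the CAIF use case arranges):

static void example_host_rx(struct virtqueue *vq,
                            void (*handle)(struct vring_desc *))
{
        struct vring_desc *desc;
        int head, chained;

        for (;;) {
                desc = virtqueue_next_avail_desc(vq, &head);
                if (!desc)
                        break;          /* ring empty, or broken */

                chained = 0;
                do {
                        handle(desc);   /* payload is directly mapped */
                        chained++;
                        desc = virtqueue_next_linked_desc(vq, desc);
                } while (desc);

                /* Hand the whole chain back on the used ring. */
                if (virtqueue_add_buf_to_used(vq, head, chained))
                        break;
        }

        if (vq->notify)
                vq->notify(vq); /* tell the other side we consumed buffers */
}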
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
---
drivers/virtio/virtio_ring_host.c | 195 +++++++++++++++++++++++++++++++++++++
include/linux/virtio.h | 2 +
include/linux/virtio_ring.h | 23 +++++
3 files changed, 220 insertions(+), 0 deletions(-)
diff --git a/drivers/virtio/virtio_ring_host.c b/drivers/virtio/virtio_ring_host.c
index 0750099..570e11e 100644
--- a/drivers/virtio/virtio_ring_host.c
+++ b/drivers/virtio/virtio_ring_host.c
@@ -19,10 +19,32 @@
#include <linux/module.h>
#include <linux/uaccess.h>
#include <linux/kconfig.h>
+#include <linux/slab.h>
MODULE_LICENSE("GPL");
+struct vring_host_virtqueue {
+ struct virtqueue vq;
+
+ /* Actual memory layout for this queue */
+ struct vring_host vring;
+
+ /* Other side has made a mess, don't try any more. */
+ bool broken;
+};
+
+#define to_vvq(_vq) container_of(_vq, struct vring_host_virtqueue, vq)
+
+#define BAD_RING(_vq, fmt, args...) \
+ do { \
+ dev_err(&_vq->vq.vdev->dev, \
+ "%s:"fmt, (_vq)->vq.name, ##args); \
+ (_vq)->broken = true; \
+ } while (0)
+#define START_USE(vq)
+#define END_USE(vq)
+
static inline struct vring_used_elem *_vring_add_used(struct vring_host *vh,
u32 head, u32 len,
bool (*cpy)(void *dst,
@@ -148,3 +170,176 @@ unsigned vring_next_desc(struct vring_desc *desc)
return next;
}
EXPORT_SYMBOL(vring_next_desc);
+
+struct virtqueue *vring_new_host_virtqueue(unsigned int index,
+ unsigned int num,
+ unsigned int vring_align,
+ struct virtio_device *vdev,
+ bool weak_barriers,
+ void *pages,
+ void (*notify)(struct virtqueue *),
+ void (*callback)(struct virtqueue *),
+ const char *name)
+{
+ struct vring_host_virtqueue *vq;
+
+ /* We assume num is a power of 2. */
+ if (num & (num - 1)) {
+ dev_warn(&vdev->dev, "Bad virtqueue length %u\n", num);
+ return NULL;
+ }
+
+ vq = kmalloc(sizeof(*vq), GFP_KERNEL);
+ if (!vq)
+ return NULL;
+
+ vring_init(&vq->vring.vr, num, pages, vring_align);
+ vq->vq.callback = callback;
+ vq->vq.vdev = vdev;
+ vq->vq.name = name;
+ vq->vq.num_free = num;
+ vq->vq.index = index;
+ vq->vq.weak_barriers = weak_barriers;
+ vq->vq.notify = notify;
+ vq->broken = false;
+ vq->vq.reversed = true;
+ list_add_tail(&vq->vq.list, &vdev->vqs);
+ /* FIX: What about no callback, should we tell pair not to bother us? */
+ return &vq->vq;
+}
+EXPORT_SYMBOL_GPL(vring_new_host_virtqueue);
+
+static inline bool _kernel_cpy_to(void *dst, void *src, size_t s)
+{
+ memcpy(dst, src, s);
+ return true;
+}
+
+static inline bool _kernel_get(u16 *dst, u16 *src)
+{
+ *dst = *src;
+ return true;
+}
+
+static inline void _read_barrier(void)
+{
+ rmb();
+}
+
+/**
+ * virtqueue_next_avail_desc - get the next available descriptor
+ * @_vq: the struct virtqueue we're talking about
+ * @head: index of the descriptor in the ring
+ *
+ * Look for the next available descriptor in the available ring.
+ * Return NULL if nothing new in the available.
+ */
+struct vring_desc *virtqueue_next_avail_desc(struct virtqueue *_vq,
+ int *head)
+{
+ struct vring_host_virtqueue *vq = to_vvq(_vq);
+ struct vring_desc *desc = NULL;
+ int hd = -1;
+
+ BUG_ON(!vq->vq.reversed);
+ if (unlikely(vq->broken))
+ goto out;
+
+ START_USE(vq);
+ virtio_rmb(vq);
+
+ hd = _vring_avail_desc(&vq->vring, _kernel_get, _read_barrier);
+ if (unlikely(hd < 0)) {
+ BAD_RING(vq, "Bad available descriptor avail:%d last:%d\n",
+ vq->vring.avail_idx, vq->vring.last_avail_idx);
+ goto out;
+ }
+ if (likely(hd >= vq->vring.vr.num))
+ goto out;
+
+ desc = &vq->vring.vr.desc[hd];
+ vq->vring.last_avail_idx++;
+out:
+ *head = hd;
+ END_USE(vq);
+ return desc;
+}
+EXPORT_SYMBOL(virtqueue_next_avail_desc);
+
+/*
+ * virtqueue_next_linked_desc - get next linked descriptor from the ring
+ * @_vq: the struct virtqueue we're talking about
+ * @desc: "current" descriptor
+ *
+ * Each buffer in the virtqueue is a chain of descriptors. This
+ * function returns the next descriptor in the chain, or NULL if we're at
+ * the end.
+ *
+ * Note: this function only follows links; vq->last_avail_idx is
+ * incremented when the head of the chain is fetched by
+ * virtqueue_next_avail_desc().
+ */
+struct vring_desc *virtqueue_next_linked_desc(struct virtqueue *_vq,
+ struct vring_desc *desc)
+{
+ struct vring_host_virtqueue *vq = to_vvq(_vq);
+ unsigned int next;
+
+ BUG_ON(!vq->vq.reversed);
+ START_USE(vq);
+ next = vring_next_desc(desc);
+
+ if (next >= vq->vring.vr.num)
+ desc = NULL;
+ else
+ desc = &vq->vring.vr.desc[next];
+ END_USE(vq);
+ return desc;
+}
+EXPORT_SYMBOL(virtqueue_next_linked_desc);
+
+/*
+ * virtqueue_add_buf_to_used - release a used descriptor
+ * @_vq: the struct virtqueue we're talking about
+ * @head: index of the descriptor to be released
+ * @len: number of linked descriptors in a chain
+ *
+ * The function releases a used descriptor in a reversed ring
+ */
+int virtqueue_add_buf_to_used(struct virtqueue *_vq,
+ unsigned int head, int len)
+{
+ struct vring_host_virtqueue *vq = to_vvq(_vq);
+ struct vring_used_elem *used;
+ int used_idx, err = -EINVAL;
+
+ BUG_ON(!vq->vq.reversed);
+ START_USE(vq);
+
+ if (unlikely(vq->broken))
+ goto err;
+
+ if (unlikely(head >= vq->vring.vr.num)) {
+ BAD_RING(vq, "Invalid head index (%u) > max desc idx (%u) ",
+ head, vq->vring.vr.num - 1);
+ goto err;
+ }
+
+ /*
+ * The virtqueue contains a ring of used buffers. Get a pointer to the
+ * next entry in that used ring.
+ */
+ used_idx = (vq->vring.vr.used->idx & (vq->vring.vr.num - 1));
+ used = &vq->vring.vr.used->ring[used_idx];
+ used->id = head;
+ used->len = len;
+
+ /* Make sure buffer is written before we update index. */
+ virtio_wmb(vq);
+ ++vq->vring.vr.used->idx;
+ err = 0;
+err:
+ END_USE(vq);
+	return err;
+}
+EXPORT_SYMBOL(virtqueue_add_buf_to_used);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index f513ba8..3ec2132 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -20,6 +20,7 @@
* @index: the zero-based ordinal number for this queue.
* @num_free: number of elements we expect to be able to fit.
* @weak_barriers: indicate if we can use weak memory barriers.
+ * @reversed: indicate a reversed direction, i.e. a host-side virtio-ring
*
* A note on @num_free: with indirect buffers, each buffer needs one
* element in the queue, otherwise a buffer will need one element per
@@ -35,6 +36,7 @@ struct virtqueue {
unsigned int num_free;
void *priv;
bool weak_barriers;
+ bool reversed;
};
int virtqueue_add_buf(struct virtqueue *vq,
diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
index 1a4023b..01c0f59 100644
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -42,6 +42,29 @@ int vring_avail_desc_user(struct vring_host *vh);
struct vring_used_elem *vring_add_used_user(struct vring_host *vh,
unsigned int head, int len);
+unsigned vring_next_desc(struct vring_desc *desc);
+struct vring_desc *virtqueue_next_linked_desc(struct virtqueue *_vq,
+ struct vring_desc *desc);
+
+struct vring_desc *virtqueue_next_avail_desc(struct virtqueue *_vq,
+ int *head);
+
+struct virtqueue *vring_new_host_virtqueue(unsigned int index,
+ unsigned int num,
+ unsigned int vring_align,
+ struct virtio_device *vdev,
+ bool weak_barriers,
+ void *pages,
+ void (*notify)(struct virtqueue *),
+ void (*callback)(struct virtqueue *),
+ const char *name);
+
+
+int virtqueue_add_buf_to_used(struct virtqueue *_vq,
+ unsigned int head, int len);
+
+
+
/* virtio guest is communicating with a virtual "device" that actually runs on
* a host processor. Memory barriers are used to control SMP effects. */
#ifdef CONFIG_SMP
--
1.7.5.4
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply related [flat|nested] 51+ messages in thread
* [RFCv2 08/12] virtio: Update vring_interrupt for host-side virtio queues
2012-12-05 14:36 ` [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio Sjur Brændeland
` (6 preceding siblings ...)
2012-12-05 14:37 ` [RFCv2 07/12] virtio-ring: Add Host side virtio-ring implementation Sjur Brændeland
@ 2012-12-05 14:37 ` Sjur Brændeland
2012-12-05 14:37 ` [RFCv2 09/12] virtio-ring: Add BUG_ON checking on host/guest ring type Sjur Brændeland
` (4 subsequent siblings)
12 siblings, 0 replies; 51+ messages in thread
From: Sjur Brændeland @ 2012-12-05 14:37 UTC (permalink / raw)
To: Rusty Russell
Cc: Michael S. Tsirkin, Linus Walleij, virtualization,
Sjur Brændeland
From: Sjur Brændeland <sjur.brandeland@stericsson.com>
Add an inline function for vring_interrupt that can handle
host-side virtio queues.
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
---
drivers/virtio/virtio_ring.c | 4 ++--
include/linux/virtio_ring.h | 13 ++++++++++++-
2 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index ead47d7..67f7bcd 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -571,7 +571,7 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
}
EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf);
-irqreturn_t vring_interrupt(int irq, void *_vq)
+irqreturn_t __vring_interrupt(int irq, void *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);
@@ -589,7 +589,7 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
return IRQ_HANDLED;
}
-EXPORT_SYMBOL_GPL(vring_interrupt);
+EXPORT_SYMBOL_GPL(__vring_interrupt);
struct virtqueue *vring_new_virtqueue(unsigned int index,
unsigned int num,
diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
index 01c0f59..d0aa046 100644
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -33,7 +33,18 @@ void vring_del_virtqueue(struct virtqueue *vq);
/* Filter out transport-specific feature bits. */
void vring_transport_features(struct virtio_device *vdev);
-irqreturn_t vring_interrupt(int irq, void *_vq);
+irqreturn_t __vring_interrupt(int irq, void *_vq);
+static inline irqreturn_t vring_interrupt(int irq, void *_vq)
+{
+ struct virtqueue *vq = _vq;
+ if (!vq->callback)
+ return IRQ_HANDLED;
+ if (vq->reversed) {
+ vq->callback(vq);
+ return IRQ_HANDLED;
+ }
+ return __vring_interrupt(irq, vq);
+}
unsigned vring_next_desc(struct vring_desc *desc);
--
1.7.5.4
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply related [flat|nested] 51+ messages in thread
* [RFCv2 09/12] virtio-ring: Add BUG_ON checking on host/guest ring type
2012-12-05 14:36 ` [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio Sjur Brændeland
` (7 preceding siblings ...)
2012-12-05 14:37 ` [RFCv2 08/12] virtio: Update vring_interrupt for host-side virtio queues Sjur Brændeland
@ 2012-12-05 14:37 ` Sjur Brændeland
2012-12-05 14:37 ` [RFCv2 10/12] virtio: Add argument reversed to function find_vqs() Sjur Brændeland
` (3 subsequent siblings)
12 siblings, 0 replies; 51+ messages in thread
From: Sjur Brændeland @ 2012-12-05 14:37 UTC (permalink / raw)
To: Rusty Russell
Cc: Michael S. Tsirkin, Linus Walleij, virtualization,
Sjur Brændeland
From: Sjur Brændeland <sjur.brandeland@stericsson.com>
Add BUG_ON to ensure that the correct virtio queue type is
used for the virtqueue functions.
In addition the function virtqueue_kick_prepare() is changed so
that it always returns true if the virtio-ring is reversed.
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
---
drivers/virtio/virtio_ring.c | 11 +++++++++++
1 files changed, 11 insertions(+), 0 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 67f7bcd..a6f38c1 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -176,6 +176,7 @@ int virtqueue_add_buf(struct virtqueue *_vq,
START_USE(vq);
+ BUG_ON(vq->vq.reversed);
BUG_ON(data == NULL);
#ifdef DEBUG
@@ -282,6 +283,9 @@ bool virtqueue_kick_prepare(struct virtqueue *_vq)
u16 new, old;
bool needs_kick;
+ if (_vq->reversed)
+ return true;
+
START_USE(vq);
/* We need to expose available array entries before checking avail
* event. */
@@ -346,6 +350,7 @@ static void detach_buf(struct vring_virtqueue *vq, unsigned int head)
{
unsigned int i;
+ BUG_ON(vq->vq.reversed);
/* Clear data ptr. */
vq->data[head] = NULL;
@@ -395,6 +400,7 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
unsigned int i;
u16 last_used;
+ BUG_ON(vq->vq.reversed);
START_USE(vq);
if (unlikely(vq->broken)) {
@@ -457,6 +463,7 @@ EXPORT_SYMBOL_GPL(virtqueue_get_buf);
void virtqueue_disable_cb(struct virtqueue *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);
+ BUG_ON(vq->vq.reversed);
vq->vring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
}
@@ -477,6 +484,7 @@ bool virtqueue_enable_cb(struct virtqueue *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);
+ BUG_ON(vq->vq.reversed);
START_USE(vq);
/* We optimistically turn back on interrupts, then check if there was
@@ -515,6 +523,7 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
struct vring_virtqueue *vq = to_vvq(_vq);
u16 bufs;
+ BUG_ON(vq->vq.reversed);
START_USE(vq);
/* We optimistically turn back on interrupts, then check if there was
@@ -551,6 +560,7 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
unsigned int i;
void *buf;
+ BUG_ON(vq->vq.reversed);
START_USE(vq);
for (i = 0; i < vq->vring.num; i++) {
@@ -575,6 +585,7 @@ irqreturn_t __vring_interrupt(int irq, void *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);
+ BUG_ON(vq->vq.reversed);
if (!more_used(vq)) {
pr_debug("virtqueue interrupt with no work for %p\n", vq);
return IRQ_NONE;
--
1.7.5.4
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply related [flat|nested] 51+ messages in thread
* [RFCv2 10/12] virtio: Add argument reversed to function find_vqs()
2012-12-05 14:36 ` [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio Sjur Brændeland
` (8 preceding siblings ...)
2012-12-05 14:37 ` [RFCv2 09/12] virtio-ring: Add BUG_ON checking on host/guest ring type Sjur Brændeland
@ 2012-12-05 14:37 ` Sjur Brændeland
2012-12-05 14:37 ` [RFCv2 11/12] remoteproc: Add support for host-virtqueues Sjur Brændeland
` (2 subsequent siblings)
12 siblings, 0 replies; 51+ messages in thread
From: Sjur Brændeland @ 2012-12-05 14:37 UTC (permalink / raw)
To: Rusty Russell
Cc: Michael S. Tsirkin, Sjur Brændeland, Linus Walleij,
virtualization, Sjur Brændeland
Add argument 'reversed' to function find_vqs() in order to
allow virtio devices to request host-side virtio-queues.
The argument 'reversed' may be NULL if reversed
virtio-queues are not requested.
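For illustration, a driver wanting a host-side RX ring and a regular
TX ring could request them like this (a minimal sketch; the callback
and name arrays are placeholders, not part of this patch):

static int example_init_vqs(struct virtio_device *vdev,
			    struct virtqueue *vqs[2])
{
	vq_callback_t *callbacks[] = { example_rx_cb, example_tx_cb };
	const char *names[] = { "input", "output" };
	/* Request a reversed (host-side) ring for RX only. */
	bool reversed[2] = { true, false };

	return vdev->config->find_vqs(vdev, 2, vqs, callbacks, names,
				      reversed);
}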
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
---
drivers/char/virtio_console.c | 3 ++-
drivers/lguest/lguest_device.c | 5 ++++-
drivers/net/virtio_net.c | 3 ++-
drivers/remoteproc/remoteproc_virtio.c | 3 ++-
drivers/rpmsg/virtio_rpmsg_bus.c | 2 +-
drivers/s390/kvm/kvm_virtio.c | 5 ++++-
drivers/scsi/virtio_scsi.c | 2 +-
drivers/virtio/virtio_balloon.c | 3 ++-
drivers/virtio/virtio_mmio.c | 5 ++++-
drivers/virtio/virtio_pci.c | 3 ++-
include/linux/virtio_config.h | 7 +++++--
11 files changed, 29 insertions(+), 12 deletions(-)
diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 4ad8aca..6c96ec7 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -1780,7 +1780,8 @@ static int init_vqs(struct ports_device *portdev)
/* Find the queues. */
err = portdev->vdev->config->find_vqs(portdev->vdev, nr_queues, vqs,
io_callbacks,
- (const char **)io_names);
+ (const char **)io_names,
+ NULL);
if (err)
goto free;
diff --git a/drivers/lguest/lguest_device.c b/drivers/lguest/lguest_device.c
index fc92ccb..724e084 100644
--- a/drivers/lguest/lguest_device.c
+++ b/drivers/lguest/lguest_device.c
@@ -369,7 +369,8 @@ static void lg_del_vqs(struct virtio_device *vdev)
static int lg_find_vqs(struct virtio_device *vdev, unsigned nvqs,
struct virtqueue *vqs[],
vq_callback_t *callbacks[],
- const char *names[])
+ const char *names[],
+ const bool reversed[])
{
struct lguest_device *ldev = to_lgdev(vdev);
int i;
@@ -379,6 +380,8 @@ static int lg_find_vqs(struct virtio_device *vdev, unsigned nvqs,
return -ENOENT;
for (i = 0; i < nvqs; ++i) {
+		if (reversed && reversed[i])
+ goto error;
vqs[i] = lg_find_vq(vdev, i, callbacks[i], names[i]);
if (IS_ERR(vqs[i]))
goto error;
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 6289891..0eb5b2b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1026,7 +1026,8 @@ static int init_vqs(struct virtnet_info *vi)
* and optionally control. */
nvqs = virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ) ? 3 : 2;
- err = vi->vdev->config->find_vqs(vi->vdev, nvqs, vqs, callbacks, names);
+ err = vi->vdev->config->find_vqs(vi->vdev, nvqs, vqs, callbacks, names,
+ NULL);
if (err)
return err;
diff --git a/drivers/remoteproc/remoteproc_virtio.c b/drivers/remoteproc/remoteproc_virtio.c
index e7a4780..a825f67 100644
--- a/drivers/remoteproc/remoteproc_virtio.c
+++ b/drivers/remoteproc/remoteproc_virtio.c
@@ -140,7 +140,8 @@ static void rproc_virtio_del_vqs(struct virtio_device *vdev)
static int rproc_virtio_find_vqs(struct virtio_device *vdev, unsigned nvqs,
struct virtqueue *vqs[],
vq_callback_t *callbacks[],
- const char *names[])
+ const char *names[],
+				    const bool reversed[])
{
struct rproc *rproc = vdev_to_rproc(vdev);
int i, ret;
diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c
index 027096f..3b2f727 100644
--- a/drivers/rpmsg/virtio_rpmsg_bus.c
+++ b/drivers/rpmsg/virtio_rpmsg_bus.c
@@ -946,7 +946,7 @@ static int rpmsg_probe(struct virtio_device *vdev)
init_waitqueue_head(&vrp->sendq);
/* We expect two virtqueues, rx and tx (and in this order) */
- err = vdev->config->find_vqs(vdev, 2, vqs, vq_cbs, names);
+ err = vdev->config->find_vqs(vdev, 2, vqs, vq_cbs, names, NULL);
if (err)
goto free_vrp;
diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c
index 7dabef6..74cc7e8 100644
--- a/drivers/s390/kvm/kvm_virtio.c
+++ b/drivers/s390/kvm/kvm_virtio.c
@@ -246,7 +246,8 @@ static void kvm_del_vqs(struct virtio_device *vdev)
static int kvm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
struct virtqueue *vqs[],
vq_callback_t *callbacks[],
- const char *names[])
+ const char *names[],
+ const bool reversed[])
{
struct kvm_device *kdev = to_kvmdev(vdev);
int i;
@@ -256,6 +257,8 @@ static int kvm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
return -ENOENT;
for (i = 0; i < nvqs; ++i) {
+		if (reversed && reversed[i])
+ goto error;
vqs[i] = kvm_find_vq(vdev, i, callbacks[i], names[i]);
if (IS_ERR(vqs[i]))
goto error;
diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index d5f9f45..e21660c 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -645,7 +645,7 @@ static int virtscsi_init(struct virtio_device *vdev,
};
/* Discover virtqueues and write information to configuration. */
- err = vdev->config->find_vqs(vdev, 3, vqs, callbacks, names);
+ err = vdev->config->find_vqs(vdev, 3, vqs, callbacks, names, NULL);
if (err)
return err;
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 586395c..3fa01e0 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -315,7 +315,8 @@ static int init_vqs(struct virtio_balloon *vb)
* optionally stat.
*/
nvqs = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;
- err = vb->vdev->config->find_vqs(vb->vdev, nvqs, vqs, callbacks, names);
+ err = vb->vdev->config->find_vqs(vb->vdev, nvqs, vqs, callbacks, names,
+ NULL);
if (err)
return err;
diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index 5a0e1d3..a3d1f03 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -394,7 +394,8 @@ error_available:
static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
struct virtqueue *vqs[],
vq_callback_t *callbacks[],
- const char *names[])
+ const char *names[],
+ const bool reversed[])
{
struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
unsigned int irq = platform_get_irq(vm_dev->pdev, 0);
@@ -406,6 +407,8 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned nvqs,
return err;
for (i = 0; i < nvqs; ++i) {
+		if (reversed && reversed[i])
+ return -EINVAL;
vqs[i] = vm_setup_vq(vdev, i, callbacks[i], names[i]);
if (IS_ERR(vqs[i])) {
vm_del_vqs(vdev);
diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index e3ecc94..0ec1be6 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -598,7 +598,8 @@ error_request:
static int vp_find_vqs(struct virtio_device *vdev, unsigned nvqs,
struct virtqueue *vqs[],
vq_callback_t *callbacks[],
- const char *names[])
+ const char *names[],
+ const bool reversed[])
{
int err;
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 29b9104..a94b94e 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -37,6 +37,8 @@
* include a NULL entry for vqs that do not need a callback
* names: array of virtqueue names (mainly for debugging)
* include a NULL entry for vqs unused by driver
+ * reversed: array of bool indicating reversed host-side rings.
+ * NULL is legal and indicates no reversed rings are used.
* Returns 0 on success or error status
* @del_vqs: free virtqueues found by find_vqs().
* @get_features: get the array of feature bits for this device.
@@ -64,7 +66,8 @@ struct virtio_config_ops {
int (*find_vqs)(struct virtio_device *, unsigned nvqs,
struct virtqueue *vqs[],
vq_callback_t *callbacks[],
- const char *names[]);
+ const char *names[],
+ const bool reversed[]);
void (*del_vqs)(struct virtio_device *);
u32 (*get_features)(struct virtio_device *vdev);
void (*finalize_features)(struct virtio_device *vdev);
@@ -130,7 +133,7 @@ struct virtqueue *virtio_find_single_vq(struct virtio_device *vdev,
vq_callback_t *callbacks[] = { c };
const char *names[] = { n };
struct virtqueue *vq;
- int err = vdev->config->find_vqs(vdev, 1, &vq, callbacks, names);
+ int err = vdev->config->find_vqs(vdev, 1, &vq, callbacks, names, NULL);
if (err < 0)
return ERR_PTR(err);
return vq;
--
1.7.5.4
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply related [flat|nested] 51+ messages in thread
* [RFCv2 11/12] remoteproc: Add support for host-virtqueues
2012-12-05 14:36 ` [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio Sjur Brændeland
` (9 preceding siblings ...)
2012-12-05 14:37 ` [RFCv2 10/12] virtio: Add argument reversed to function find_vqs() Sjur Brændeland
@ 2012-12-05 14:37 ` Sjur Brændeland
2012-12-05 14:37 ` [RFCv2 12/12] caif_virtio: Introduce caif over virtio Sjur Brændeland
2012-12-06 10:27 ` [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio Michael S. Tsirkin
12 siblings, 0 replies; 51+ messages in thread
From: Sjur Brændeland @ 2012-12-05 14:37 UTC (permalink / raw)
To: Rusty Russell
Cc: Michael S. Tsirkin, Sjur Brændeland, Linus Walleij,
virtualization, Sjur Brændeland
Create a virtio-queue in the reversed direction if requested
by the virtio device. This is done by calling the function
vring_new_host_virtqueue().
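For reference, creating such a host-side queue directly looks roughly
like this (a sketch only; the ring parameters and the notify/callback
handlers are placeholders):

	struct virtqueue *vq;

	vq = vring_new_host_virtqueue(id, num, vring_align, vdev,
				      false /* weak_barriers */, addr,
				      my_notify, my_callback, "host-rx");
	if (!vq)
		return -ENOMEM;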
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
---
drivers/remoteproc/remoteproc_virtio.c | 15 ++++++++++++---
1 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/drivers/remoteproc/remoteproc_virtio.c b/drivers/remoteproc/remoteproc_virtio.c
index a825f67..5866b03 100644
--- a/drivers/remoteproc/remoteproc_virtio.c
+++ b/drivers/remoteproc/remoteproc_virtio.c
@@ -69,6 +69,7 @@ EXPORT_SYMBOL(rproc_vq_interrupt);
static struct virtqueue *rp_find_vq(struct virtio_device *vdev,
unsigned id,
+ bool revers,
void (*callback)(struct virtqueue *vq),
const char *name)
{
@@ -106,8 +107,14 @@ static struct virtqueue *rp_find_vq(struct virtio_device *vdev,
* Create the new vq, and tell virtio we're not interested in
* the 'weak' smp barriers, since we're talking with a real device.
*/
- vq = vring_new_virtqueue(id, len, rvring->align, vdev, false, addr,
- rproc_virtio_notify, callback, name);
+ if (revers)
+ vq = vring_new_host_virtqueue(id, len, rvring->align, vdev,
+ false, addr, rproc_virtio_notify,
+ callback, name);
+ else
+ vq = vring_new_virtqueue(id, len, rvring->align, vdev, false,
+ addr, rproc_virtio_notify, callback,
+ name);
if (!vq) {
dev_err(dev, "vring_new_virtqueue %s failed\n", name);
rproc_free_vring(rvring);
@@ -145,9 +152,11 @@ static int rproc_virtio_find_vqs(struct virtio_device *vdev, unsigned nvqs,
{
struct rproc *rproc = vdev_to_rproc(vdev);
int i, ret;
+ bool revers;
for (i = 0; i < nvqs; ++i) {
- vqs[i] = rp_find_vq(vdev, i, callbacks[i], names[i]);
+ revers = reversed ? reversed[i] : false;
+ vqs[i] = rp_find_vq(vdev, i, revers, callbacks[i], names[i]);
if (IS_ERR(vqs[i])) {
ret = PTR_ERR(vqs[i]);
goto error;
--
1.7.5.4
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply related [flat|nested] 51+ messages in thread
* [RFCv2 12/12] caif_virtio: Introduce caif over virtio
2012-12-05 14:36 ` [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio Sjur Brændeland
` (10 preceding siblings ...)
2012-12-05 14:37 ` [RFCv2 11/12] remoteproc: Add support for host-virtqueues Sjur Brændeland
@ 2012-12-05 14:37 ` Sjur Brændeland
2012-12-06 10:27 ` [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio Michael S. Tsirkin
12 siblings, 0 replies; 51+ messages in thread
From: Sjur Brændeland @ 2012-12-05 14:37 UTC (permalink / raw)
To: Rusty Russell
Cc: Vikram ARV, Michael S. Tsirkin, Linus Walleij, virtualization,
Sjur Brændeland
From: Vikram ARV <vikram.arv@stericsson.com>
caif_virtio uses the remoteproc and virtio frameworks
for communicating with the modem. The CAIF link layer
device is registered as a network device.
CAIF over virtio uses the virtio rings in both directions,
and requests a reversed virtio queue in the RX direction.
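The RX side then drains the reversed ring with the host-side
iterators introduced earlier in this series; roughly (a sketch with
error handling omitted; consume() is a placeholder):

	struct vring_desc *desc;
	int head;

	/* Walk one chain of linked descriptors from the avail ring. */
	for (desc = virtqueue_next_avail_desc(vq_rx, &head);
	     desc != NULL;
	     desc = virtqueue_next_linked_desc(vq_rx, desc))
		consume(phys_to_virt(desc->addr), desc->len);

	/* Hand the chain back so the modem can recycle the buffer. */
	virtqueue_add_buf_to_used(vq_rx, head, 0);
	virtqueue_kick(vq_rx);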
Signed-off-by: Vikram ARV <vikram.arv@stericsson.com>
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
---
drivers/net/caif/Kconfig | 8 +
drivers/net/caif/Makefile | 3 +
drivers/net/caif/caif_virtio.c | 482 +++++++++++++++++++++++++++++++++++++++
include/linux/virtio_caif.h | 24 ++
include/uapi/linux/virtio_ids.h | 1 +
5 files changed, 518 insertions(+), 0 deletions(-)
create mode 100644 drivers/net/caif/caif_virtio.c
create mode 100644 include/linux/virtio_caif.h
diff --git a/drivers/net/caif/Kconfig b/drivers/net/caif/Kconfig
index abf4d7a..a1279d5 100644
--- a/drivers/net/caif/Kconfig
+++ b/drivers/net/caif/Kconfig
@@ -47,3 +47,11 @@ config CAIF_HSI
The caif low level driver for CAIF over HSI.
Be aware that if you enable this then you also need to
enable a low-level HSI driver.
+
+config CAIF_VIRTIO
+ tristate "CAIF virtio transport driver"
+ depends on CAIF
+ depends on VIRTIO
+ default m
+ ---help---
+ The caif driver for CAIF over Virtio.
diff --git a/drivers/net/caif/Makefile b/drivers/net/caif/Makefile
index 91dff86..d9ee26a 100644
--- a/drivers/net/caif/Makefile
+++ b/drivers/net/caif/Makefile
@@ -13,3 +13,6 @@ obj-$(CONFIG_CAIF_SHM) += caif_shm.o
# HSI interface
obj-$(CONFIG_CAIF_HSI) += caif_hsi.o
+
+# Virtio interface
+obj-$(CONFIG_CAIF_VIRTIO) += caif_virtio.o
diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c
new file mode 100644
index 0000000..94efd21
--- /dev/null
+++ b/drivers/net/caif/caif_virtio.c
@@ -0,0 +1,482 @@
+/*
+ * Copyright (C) ST-Ericsson AB 2012
+ * Contact: Sjur Brendeland / sjur.brandeland@stericsson.com
+ * Authors: Vicram Arv / vikram.arv@stericsson.com,
+ * Dmitry Tarnyagin / dmitry.tarnyagin@stericsson.com
+ * Sjur Brendeland / sjur.brandeland@stericsson.com
+ * License terms: GNU General Public License (GPL) version 2
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ":%s:%d: " fmt, __func__, __LINE__
+#include <linux/module.h>
+#include <linux/virtio.h>
+#include <linux/virtio_ids.h>
+#include <linux/virtio_config.h>
+#include <linux/dma-mapping.h>
+#include <linux/netdevice.h>
+#include <linux/if_arp.h>
+#include <linux/spinlock.h>
+#include <linux/virtio_caif.h>
+#include <linux/virtio_ring.h>
+#include <net/caif/caif_dev.h>
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Vicram Arv <vikram.arv@stericsson.com>");
+MODULE_DESCRIPTION("Virtio CAIF Driver");
+
+/*
+ * struct cfv_info - Caif Virtio control structure
+ * @cfdev: caif common header
+ * @vdev: Associated virtio device
+ * @vq_rx: rx/downlink virtqueue
+ * @vq_tx: tx/uplink virtqueue
+ * @ndev: associated netdevice
+ * @queued_tx: number of buffers queued in the tx virtqueue
+ * @watermark_tx: number of buffers the tx queue must shrink to
+ *	before the datapath is unblocked
+ * @tx_lock: protects vq_tx to allow concurrent senders
+ * @tx_hr: transmit headroom
+ * @rx_hr: receive headroom
+ * @tx_tr: transmit tailroom
+ * @rx_tr: receive tailroom
+ * @mtu: transmit max size
+ * @mru: receive max size
+ */
+struct cfv_info {
+ struct caif_dev_common cfdev;
+ struct virtio_device *vdev;
+ struct virtqueue *vq_rx;
+ struct virtqueue *vq_tx;
+ struct net_device *ndev;
+ unsigned int queued_tx;
+ unsigned int watermark_tx;
+ /* Protect access to vq_tx */
+ spinlock_t tx_lock;
+ /* Copied from Virtio config space */
+ u16 tx_hr;
+ u16 rx_hr;
+ u16 tx_tr;
+ u16 rx_tr;
+ u32 mtu;
+ u32 mru;
+};
+
+/*
+ * struct token_info - state for an allocated transmit buffer
+ * @size: size of transmit buffer
+ * @dma_handle: handle to allocated dma device memory area
+ * @vaddr: virtual address mapping to allocated memory area
+ */
+struct token_info {
+ size_t size;
+ u8 *vaddr;
+ dma_addr_t dma_handle;
+};
+
+/* Default if virtio config space is unavailable */
+#define CFV_DEF_MTU_SIZE 4096
+#define CFV_DEF_HEADROOM 0
+#define CFV_DEF_TAILROOM 0
+
+/* Require IP header to be 4-byte aligned. */
+#define IP_HDR_ALIGN 4
+
+/*
+ * virtqueue_next_desc - get next available or linked descriptor
+ * @_vq: the struct virtqueue we're talking about
+ * @desc: "current" descriptor.
+ * @head: on return, holds the descriptor index if an available
+ *	descriptor was returned, or -1 if a linked descriptor
+ *	was returned.
+ *
+ * The function is to be used as an iterator through received descriptors.
+ */
+static struct vring_desc *virtqueue_next_desc(struct virtqueue *_vq,
+ struct vring_desc *desc,
+ int *head)
+{
+ struct vring_desc *next = virtqueue_next_linked_desc(_vq, desc);
+ BUG_ON(!_vq->reversed);
+ if (next == NULL) {
+ virtqueue_add_buf_to_used(_vq, *head, 0);
+ /* tell the remote processor to recycle buffer */
+ virtqueue_kick(_vq);
+ next = virtqueue_next_avail_desc(_vq, head);
+ }
+ return next;
+}
+
+/*
+ * This is invoked whenever the remote processor completed processing
+ * a TX msg we just sent it, and the buffer is put back to the used ring.
+ */
+static void cfv_release_used_buf(struct virtqueue *vq_tx)
+{
+ struct cfv_info *cfv = vq_tx->vdev->priv;
+
+ BUG_ON(vq_tx->reversed);
+ BUG_ON(vq_tx != cfv->vq_tx);
+
+ for (;;) {
+ unsigned int len;
+ struct token_info *buf_info;
+
+ /* Get used buffer from used ring to recycle used descriptors */
+ spin_lock_bh(&cfv->tx_lock);
+ buf_info = virtqueue_get_buf(vq_tx, &len);
+
+ if (!buf_info)
+ goto out;
+
+ BUG_ON(!cfv->queued_tx);
+ if (--cfv->queued_tx < cfv->watermark_tx) {
+ cfv->watermark_tx = 0;
+ netif_tx_wake_all_queues(cfv->ndev);
+ }
+ spin_unlock_bh(&cfv->tx_lock);
+
+ dma_free_coherent(vq_tx->vdev->dev.parent->parent,
+ buf_info->size, buf_info->vaddr,
+ buf_info->dma_handle);
+ kfree(buf_info);
+ }
+ return;
+out:
+ spin_unlock_bh(&cfv->tx_lock);
+}
+
+static int cfv_read_desc(struct vring_desc *d,
+ void **buf, size_t *size)
+{
+ if (d->flags & VRING_DESC_F_INDIRECT) {
+ pr_warn("Indirect descriptor not supported by CAIF\n");
+ return -EINVAL;
+ }
+
+ *buf = phys_to_virt(d->addr);
+ *size = d->len;
+ return 0;
+}
+
+static struct sk_buff *cfv_alloc_and_copy_skb(struct cfv_info *cfv,
+ u8 *frm, u32 frm_len)
+{
+ struct sk_buff *skb;
+ u32 cfpkt_len, pad_len;
+
+	/* Verify the frame length against the MRU and down-link head/tail room */
+ if (frm_len > cfv->mru || frm_len <= cfv->rx_hr + cfv->rx_tr) {
+ netdev_err(cfv->ndev,
+			   "Invalid frmlen:%u mru:%u hr:%d tr:%d\n",
+ frm_len, cfv->mru, cfv->rx_hr,
+ cfv->rx_tr);
+ return NULL;
+ }
+
+ cfpkt_len = frm_len - (cfv->rx_hr + cfv->rx_tr);
+
+ pad_len = (unsigned long)(frm + cfv->rx_hr) & (IP_HDR_ALIGN - 1);
+
+ skb = netdev_alloc_skb(cfv->ndev, frm_len + pad_len);
+ if (!skb)
+ return NULL;
+
+ /* Reserve space for headers. */
+ skb_reserve(skb, cfv->rx_hr + pad_len);
+
+ memcpy(skb_put(skb, cfpkt_len), frm + cfv->rx_hr, cfpkt_len);
+ return skb;
+}
+
+/*
+ * This is invoked whenever the remote processor has sent down-link data
+ * on the Rx VQ avail ring and it's time to digest a message.
+ *
+ * CAIF virtio passes a complete CAIF frame including head/tail room
+ * in each linked descriptor. So we iterate over all available buffers
+ * in the available ring and their associated linked descriptors.
+ */
+static void cfv_recv(struct virtqueue *vq_rx)
+{
+ struct cfv_info *cfv = vq_rx->vdev->priv;
+ struct vring_desc *desc;
+ struct sk_buff *skb;
+ int head = -1;
+ void *buf;
+ size_t len;
+ BUG_ON(!vq_rx->reversed);
+ for (desc = virtqueue_next_avail_desc(vq_rx, &head);
+ desc != NULL && !cfv_read_desc(desc, &buf, &len);
+ desc = virtqueue_next_desc(vq_rx, desc, &head)) {
+
+ skb = cfv_alloc_and_copy_skb(cfv, buf, len);
+ if (!skb)
+ goto err;
+ skb->protocol = htons(ETH_P_CAIF);
+ skb_reset_mac_header(skb);
+ skb->dev = cfv->ndev;
+ /* Push received packet up the stack. */
+ if (netif_receive_skb(skb))
+ goto err;
+ ++cfv->ndev->stats.rx_packets;
+ cfv->ndev->stats.rx_bytes += skb->len;
+ }
+ return;
+err:
+ ++cfv->ndev->stats.rx_dropped;
+ return;
+}
+
+static int cfv_netdev_open(struct net_device *netdev)
+{
+ netif_carrier_on(netdev);
+ return 0;
+}
+
+static int cfv_netdev_close(struct net_device *netdev)
+{
+ netif_carrier_off(netdev);
+ return 0;
+}
+
+static struct token_info *cfv_alloc_and_copy_to_dmabuf(struct cfv_info *cfv,
+ struct sk_buff *skb,
+ struct scatterlist *sg)
+{
+ struct caif_payload_info *info = (void *)&skb->cb;
+ struct token_info *buf_info = NULL;
+ u8 pad_len, hdr_ofs;
+
+ if (unlikely(cfv->tx_hr + skb->len + cfv->tx_tr > cfv->mtu)) {
+ netdev_warn(cfv->ndev, "Invalid packet len (%d > %d)\n",
+ cfv->tx_hr + skb->len + cfv->tx_tr, cfv->mtu);
+ goto err;
+ }
+
+ buf_info = kmalloc(sizeof(struct token_info), GFP_ATOMIC);
+ if (unlikely(!buf_info))
+ goto err;
+
+	/* Make the IP header aligned in the buffer */
+ hdr_ofs = cfv->tx_hr + info->hdr_len;
+ pad_len = hdr_ofs & (IP_HDR_ALIGN - 1);
+ buf_info->size = cfv->tx_hr + skb->len + cfv->tx_tr + pad_len;
+
+ if (WARN_ON_ONCE(!cfv->vdev->dev.parent))
+ goto err;
+
+ /* allocate coherent memory for the buffers */
+ buf_info->vaddr =
+ dma_alloc_coherent(cfv->vdev->dev.parent->parent,
+ buf_info->size, &buf_info->dma_handle,
+ GFP_ATOMIC);
+ if (unlikely(!buf_info->vaddr)) {
+ netdev_warn(cfv->ndev,
+ "Out of DMA memory (alloc %zu bytes)\n",
+ buf_info->size);
+ goto err;
+ }
+
+ /* copy skbuf contents to send buffer */
+ skb_copy_bits(skb, 0, buf_info->vaddr + cfv->tx_hr + pad_len, skb->len);
+ sg_init_one(sg, buf_info->vaddr + pad_len,
+		    skb->len + cfv->tx_hr + cfv->tx_tr);
+ return buf_info;
+err:
+ kfree(buf_info);
+ return NULL;
+}
+
+/*
+ * This is invoked whenever the host processor application has sent
+ * up-link data. Send it in the TX VQ avail ring.
+ *
+ * CAIF Virtio does not use linked descriptors in the TX direction.
+ */
+static int cfv_netdev_tx(struct sk_buff *skb, struct net_device *netdev)
+{
+ struct cfv_info *cfv = netdev_priv(netdev);
+ struct token_info *buf_info;
+ struct scatterlist sg;
+ bool flow_off = false;
+
+ cfv_release_used_buf(cfv->vq_tx);
+
+ buf_info = cfv_alloc_and_copy_to_dmabuf(cfv, skb, &sg);
+ if (!buf_info)
+ goto err;
+
+ spin_lock_bh(&cfv->tx_lock);
+ /*
+ * Add buffer to avail ring.
+	 * Note: even with the flow-off check below, the add_buf call
+	 * might fail in case of concurrent access on SMP systems.
+ */
+ if (WARN_ON(virtqueue_add_buf(cfv->vq_tx, &sg, 0, 1,
+ buf_info, GFP_ATOMIC) < 0)) {
+ /* It should not happen */
+ flow_off = true;
+ goto err_unlock;
+ } else {
+ /* update netdev statistics */
+ cfv->queued_tx++;
+ cfv->ndev->stats.tx_packets++;
+ cfv->ndev->stats.tx_bytes += skb->len;
+ }
+
+ /*
+	 * The flow-off check takes the number of CPUs into account to make
+	 * sure the virtqueue cannot be overfilled under any SMP condition.
+ */
+ flow_off = cfv->queued_tx + num_present_cpus() >=
+ virtqueue_get_vring_size(cfv->vq_tx);
+
+ /* tell the remote processor it has a pending message to read */
+ virtqueue_kick(cfv->vq_tx);
+ if (flow_off) {
+ cfv->watermark_tx = cfv->queued_tx >> 1;
+ netif_tx_stop_all_queues(netdev);
+ }
+
+ spin_unlock_bh(&cfv->tx_lock);
+
+ dev_kfree_skb(skb);
+
+ /* Try to speculatively free used buffers */
+ if (flow_off)
+ cfv_release_used_buf(cfv->vq_tx);
+ return NETDEV_TX_OK;
+
+err_unlock:
+	spin_unlock_bh(&cfv->tx_lock);
+err:
+ ++cfv->ndev->stats.tx_dropped;
+ dev_kfree_skb(skb);
+	return NETDEV_TX_OK;
+}
+
+static const struct net_device_ops cfv_netdev_ops = {
+ .ndo_open = cfv_netdev_open,
+ .ndo_stop = cfv_netdev_close,
+ .ndo_start_xmit = cfv_netdev_tx,
+};
+
+static void cfv_netdev_setup(struct net_device *netdev)
+{
+ netdev->netdev_ops = &cfv_netdev_ops;
+ netdev->type = ARPHRD_CAIF;
+ netdev->tx_queue_len = 100;
+ netdev->flags = IFF_POINTOPOINT | IFF_NOARP;
+ netdev->mtu = CFV_DEF_MTU_SIZE;
+ netdev->destructor = free_netdev;
+}
+
+#define GET_VIRTIO_CONFIG_OPS(_v, _var, _f) \
+ ((_v)->config->get(_v, offsetof(struct virtio_caif_transf_config, _f), \
+ &_var, \
+ FIELD_SIZEOF(struct virtio_caif_transf_config, _f)))
+
+static int __devinit cfv_probe(struct virtio_device *vdev)
+{
+ vq_callback_t *vq_cbs[] = { cfv_recv, cfv_release_used_buf };
+ const char *names[] = { "input", "output" };
+ const char *cfv_netdev_name = "cfvrt";
+ struct net_device *netdev;
+ struct virtqueue *vqs[2];
+ bool reversed[2];
+ struct cfv_info *cfv;
+ int err = 0;
+
+ netdev = alloc_netdev(sizeof(struct cfv_info), cfv_netdev_name,
+ cfv_netdev_setup);
+ if (!netdev)
+ return -ENOMEM;
+
+ cfv = netdev_priv(netdev);
+ cfv->vdev = vdev;
+ cfv->ndev = netdev;
+
+ spin_lock_init(&cfv->tx_lock);
+ reversed[0] = true;
+ reversed[1] = false;
+ /* Get two virtqueues, for tx/ul and rx/dl */
+ err = vdev->config->find_vqs(vdev, 2, vqs, vq_cbs, names, reversed);
+ if (err)
+ goto free_cfv;
+
+ cfv->vq_rx = vqs[0];
+ cfv->vq_tx = vqs[1];
+
+ BUG_ON(!cfv->vq_rx->reversed);
+ BUG_ON(cfv->vq_tx->reversed);
+
+ if (vdev->config->get) {
+ GET_VIRTIO_CONFIG_OPS(vdev, cfv->tx_hr, headroom);
+ GET_VIRTIO_CONFIG_OPS(vdev, cfv->rx_hr, headroom);
+ GET_VIRTIO_CONFIG_OPS(vdev, cfv->tx_tr, tailroom);
+ GET_VIRTIO_CONFIG_OPS(vdev, cfv->rx_tr, tailroom);
+ GET_VIRTIO_CONFIG_OPS(vdev, cfv->mtu, mtu);
+ GET_VIRTIO_CONFIG_OPS(vdev, cfv->mru, mtu);
+ } else {
+ cfv->tx_hr = CFV_DEF_HEADROOM;
+ cfv->rx_hr = CFV_DEF_HEADROOM;
+ cfv->tx_tr = CFV_DEF_TAILROOM;
+ cfv->rx_tr = CFV_DEF_TAILROOM;
+ cfv->mtu = CFV_DEF_MTU_SIZE;
+		cfv->mru = CFV_DEF_MTU_SIZE;
+	}
+
+ vdev->priv = cfv;
+
+ netif_carrier_off(netdev);
+
+ /* register Netdev */
+ err = register_netdev(netdev);
+ if (err) {
+ dev_err(&vdev->dev, "Unable to register netdev (%d)\n", err);
+ goto vqs_del;
+ }
+
+ /* tell the remote processor it can start sending messages */
+ virtqueue_kick(cfv->vq_rx);
+ return 0;
+
+vqs_del:
+ vdev->config->del_vqs(cfv->vdev);
+free_cfv:
+ free_netdev(netdev);
+ return err;
+}
+
+static void __devexit cfv_remove(struct virtio_device *vdev)
+{
+ struct cfv_info *cfv = vdev->priv;
+ vdev->config->reset(vdev);
+ vdev->config->del_vqs(cfv->vdev);
+ unregister_netdev(cfv->ndev);
+}
+
+static struct virtio_device_id id_table[] = {
+ { VIRTIO_ID_CAIF, VIRTIO_DEV_ANY_ID },
+ { 0 },
+};
+
+static unsigned int features[] = {
+};
+
+static struct virtio_driver caif_virtio_driver = {
+ .feature_table = features,
+ .feature_table_size = ARRAY_SIZE(features),
+ .driver.name = KBUILD_MODNAME,
+ .driver.owner = THIS_MODULE,
+ .id_table = id_table,
+ .probe = cfv_probe,
+ .remove = cfv_remove,
+};
+
+module_driver(caif_virtio_driver, register_virtio_driver,
+ unregister_virtio_driver);
+MODULE_DEVICE_TABLE(virtio, id_table);
diff --git a/include/linux/virtio_caif.h b/include/linux/virtio_caif.h
new file mode 100644
index 0000000..5d2d312
--- /dev/null
+++ b/include/linux/virtio_caif.h
@@ -0,0 +1,24 @@
+/*
+ * Copyright (C) ST-Ericsson AB 2012
+ * Author: Sjur Brændeland <sjur.brandeland@stericsson.com>
+ *
+ * This header is BSD licensed so
+ * anyone can use the definitions to implement compatible remote processors
+ */
+
+#ifndef VIRTIO_CAIF_H
+#define VIRTIO_CAIF_H
+
+#include <linux/types.h>
+struct virtio_caif_transf_config {
+ u16 headroom;
+ u16 tailroom;
+ u32 mtu;
+ u8 reserved[4];
+};
+
+struct virtio_caif_config {
+ struct virtio_caif_transf_config uplink, downlink;
+ u8 reserved[8];
+};
+#endif
diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
index 270fb22..c2dbedd 100644
--- a/include/uapi/linux/virtio_ids.h
+++ b/include/uapi/linux/virtio_ids.h
@@ -37,5 +37,6 @@
#define VIRTIO_ID_RPMSG 7 /* virtio remote processor messaging */
#define VIRTIO_ID_SCSI 8 /* virtio scsi */
#define VIRTIO_ID_9P 9 /* 9p virtio console */
+#define VIRTIO_ID_CAIF 12 /* Virtio caif */
#endif /* _LINUX_VIRTIO_IDS_H */
--
1.7.5.4
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply related [flat|nested] 51+ messages in thread
* Re: [RFCv2 04/12] virtio-ring: Refactor out the functions accessing user memory
2012-12-05 14:37 ` [RFCv2 04/12] virtio-ring: Refactor out the functions accessing user memory Sjur Brændeland
@ 2012-12-06 9:52 ` Michael S. Tsirkin
2012-12-06 11:03 ` Sjur BRENDELAND
0 siblings, 1 reply; 51+ messages in thread
From: Michael S. Tsirkin @ 2012-12-06 9:52 UTC (permalink / raw)
To: Sjur Brændeland; +Cc: Linus Walleij, virtualization, Sjur Brændeland
On Wed, Dec 05, 2012 at 03:37:02PM +0100, Sjur Brændeland wrote:
> Isolate the access to user-memory in separate inline
> functions. This open up for reuse from host-side virtioqueue
> implementation accessing virtio-ring in kernel space.
>
> Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
> ---
> drivers/virtio/virtio_ring_host.c | 81 ++++++++++++++++++++++++++-----------
> 1 files changed, 57 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring_host.c b/drivers/virtio/virtio_ring_host.c
> index 192b838..0750099 100644
> --- a/drivers/virtio/virtio_ring_host.c
> +++ b/drivers/virtio/virtio_ring_host.c
> @@ -18,44 +18,45 @@
> #include <linux/virtio_ring.h>
> #include <linux/module.h>
> #include <linux/uaccess.h>
> +#include <linux/kconfig.h>
>
> MODULE_LICENSE("GPL");
>
> -struct vring_used_elem *vring_add_used_user(struct vring_host *vh,
> - unsigned int head, int len)
> +
> +static inline struct vring_used_elem *_vring_add_used(struct vring_host *vh,
> + u32 head, u32 len,
> + bool (*cpy)(void *dst,
> + void *src,
> + size_t s),
> + void (*wbarrier)(void))
> {
> struct vring_used_elem *used;
> + u16 last_used;
>
> /* The virtqueue contains a ring of used buffers. Get a pointer to the
> * next entry in that used ring. */
> - used = &vh->vr.used->ring[vh->last_used_idx % vh->vr.num];
> - if (__put_user(head, &used->id)) {
> - pr_debug("Failed to write used id");
> + used = &vh->vr.used->ring[vh->last_used_idx & (vh->vr.num - 1)];
> + if (!cpy(&used->id, &head, sizeof(used->id)) ||
> + !cpy(&used->len, &len, sizeof(used->len)))
> return NULL;
> - }
> - if (__put_user(len, &used->len)) {
> - pr_debug("Failed to write used len");
> + wbarrier();
> + last_used = vh->last_used_idx + 1;
> + if (!cpy(&vh->vr.used->idx, &last_used, sizeof(vh->vr.used->idx)))
> return NULL;
I think this is broken: we need a 16 bit access, this is
doing a memcpy which is byte by byte.
> - }
> - /* Make sure buffer is written before we update index. */
> - smp_wmb();
> - if (__put_user(vh->last_used_idx + 1, &vh->vr.used->idx)) {
> - pr_debug("Failed to increment used idx");
> - return NULL;
> - }
> - vh->last_used_idx++;
> + vh->last_used_idx = last_used;
> return used;
> }
> -EXPORT_SYMBOL(vring_add_used_user);
>
> -int vring_avail_desc_user(struct vring_host *vh)
> +static inline int _vring_avail_desc(struct vring_host *vh,
> + bool (*get)(u16 *dst, u16 *src),
> + void (*read_barrier)(void))
> {
> - int head;
> + u16 head;
> u16 last_avail_idx;
>
> /* Check it isn't doing very strange things with descriptor numbers. */
> last_avail_idx = vh->last_avail_idx;
> - if (unlikely(__get_user(vh->avail_idx, &vh->vr.avail->idx))) {
> + if (unlikely(!get(&vh->avail_idx, &vh->vr.avail->idx))) {
> pr_debug("Failed to access avail idx at %p\n",
> &vh->vr.avail->idx);
> return -EFAULT;
> @@ -69,13 +70,12 @@ int vring_avail_desc_user(struct vring_host *vh)
> return vh->vr.num;
>
> /* Only get avail ring entries after they have been exposed by guest. */
> - smp_rmb();
> + read_barrier();
>
> /* Grab the next descriptor number they're advertising, and increment
> * the index we've seen. */
> - if (unlikely(__get_user(head,
> - &vh->vr.avail->ring[last_avail_idx %
> - vh->vr.num]))) {
> + if (unlikely(!get(&head, &vh->vr.avail->ring[last_avail_idx &
> + (vh->vr.num - 1)]))) {
> pr_debug("Failed to read head: idx %d address %p\n",
> last_avail_idx,
> &vh->vr.avail->ring[last_avail_idx %
> @@ -92,6 +92,39 @@ int vring_avail_desc_user(struct vring_host *vh)
>
> return head;
> }
> +
> +static inline void smp_write_barrier(void)
> +{
> + smp_wmb();
> +}
> +
> +static inline void smp_read_barrier(void)
> +{
> + smp_rmb();
> +}
> +
> +static inline bool userspace_cpy_to(void *dst, void *src, size_t s)
> +{
> + return __copy_to_user(dst, src, s) == 0;
> +}
> +
> +static inline bool userspace_get(u16 *dst, u16 *src)
> +{
> + return __get_user(*dst, src);
> +}
> +
> +struct vring_used_elem *vring_add_used_user(struct vring_host *vh,
> + unsigned int head, int len)
> +{
> + return _vring_add_used(vh, head, len, userspace_cpy_to,
> + smp_write_barrier);
> +}
> +EXPORT_SYMBOL(vring_add_used_user);
> +
> +int vring_avail_desc_user(struct vring_host *vh)
> +{
> + return _vring_avail_desc(vh, userspace_get, smp_read_barrier);
> +}
> EXPORT_SYMBOL(vring_avail_desc_user);
>
> /* Each buffer in the virtqueues is actually a chain of descriptors. This
> --
> 1.7.5.4
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
2012-12-05 14:36 ` [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio Sjur Brændeland
` (11 preceding siblings ...)
2012-12-05 14:37 ` [RFCv2 12/12] caif_virtio: Introduce caif over virtio Sjur Brændeland
@ 2012-12-06 10:27 ` Michael S. Tsirkin
2012-12-21 6:11 ` Rusty Russell
12 siblings, 1 reply; 51+ messages in thread
From: Michael S. Tsirkin @ 2012-12-06 10:27 UTC (permalink / raw)
To: Sjur Brændeland; +Cc: Linus Walleij, virtualization, Sjur Brændeland
On Wed, Dec 05, 2012 at 03:36:58PM +0100, Sjur Brændeland wrote:
> Feedback on this patch-set is appreciated, particularly on structure
> and code-reuse between vhost.c and the host-side virtio-queue.
> I'd also like some suggestions on how to handle the build configuration
> better - currently there are some unnecessary build dependencies.
Rusty seems to disagree but one of the concerns people have about vhost
is security; so I value getting as much static checking as we can. This
discards __user annotations so this doesn't work for me.
I also have concerns about how much overhead an indirect function
call on each 2 byte access would have.
After thinking about this I think there's no way without using a
preprocessor. But using a preprocessor it's not very bad: we'll have to
make everything inline in header but that's not terrible as the
functions are pretty short: it's less about code size and more about
avoiding code duplication.
Basically like this:
/* Legal values for VIRTIO_HOST_MODE */
#define VIRTIO_HOST_KERNEL 0
#define VIRTIO_HOST_USER 1
#if defined(VIRTIO_HOST_H) && VIRTIO_HOST_H != VIRTIO_HOST_MODE
#error "Header included twice with VIRTIO_HOST_MODE redefined"
#endif
#ifndef VIRTIO_HOST_H
/* Besides serving as a double inclusion guard, this
* verifies that multiple inclusions define VIRTIO_HOST_MODE
* consistently.
*/
#define VIRTIO_HOST_H VIRTIO_HOST_MODE
#ifndef VIRTIO_HOST_MODE
#error "Must define VIRTIO_HOST_MODE to VIRTIO_HOST_KERNEL or VIRTIO_HOST_USER"
#endif
#if VIRTIO_HOST_MODE == VIRTIO_HOST_KERNEL
#define __virtio_host_user
#define __virtio_host_put_user(x, ptr) ({*(ptr) = (__typeof__(*(ptr)))(x); 0;})
#else
#define __virtio_host_user __user
#define __virtio_host_put_user(x, ptr) __put_user(x, ptr)
#endif
static inline
int vhost_add_used(struct virtio_host_vq *vq, unsigned int head, int len)
{
struct vring_used_elem __virtio_host_user *used;
/* The virtqueue contains a ring of used buffers. Get a pointer to the
* next entry in that used ring. */
used = &vq->used->ring[vq->last_used_idx % vq->num];
if (__virtio_host_put_user(head, &used->id)) {
...
#endif
Users will have to
#define VIRTIO_HOST_MODE VIRTIO_HOST_USER
#include "linux/virtio_host.h"
or
#define VIRTIO_HOST_MODE VIRTIO_HOST_KERNEL
#include "linux/virtio_host.h"
--
MST
^ permalink raw reply [flat|nested] 51+ messages in thread
* RE: [RFCv2 04/12] virtio-ring: Refactor out the functions accessing user memory
2012-12-06 9:52 ` Michael S. Tsirkin
@ 2012-12-06 11:03 ` Sjur BRENDELAND
2012-12-06 11:15 ` Michael S. Tsirkin
0 siblings, 1 reply; 51+ messages in thread
From: Sjur BRENDELAND @ 2012-12-06 11:03 UTC (permalink / raw)
To: Michael S. Tsirkin, Sjur Brændeland
Cc: Linus Walleij, virtualization@lists.linux-foundation.org
Hi Michael,
> > -struct vring_used_elem *vring_add_used_user(struct vring_host *vh,
> > - unsigned int head, int len)
> > +
> > +static inline struct vring_used_elem *_vring_add_used(struct vring_host
> *vh,
> > + u32 head, u32 len,
> > + bool (*cpy)(void *dst,
> > + void *src,
> > + size_t s),
> > + void (*wbarrier)(void))
> > {
> > struct vring_used_elem *used;
> > + u16 last_used;
> >
> > /* The virtqueue contains a ring of used buffers. Get a pointer to the
> > * next entry in that used ring. */
> > - used = &vh->vr.used->ring[vh->last_used_idx % vh->vr.num];
> > - if (__put_user(head, &used->id)) {
> > - pr_debug("Failed to write used id");
> > + used = &vh->vr.used->ring[vh->last_used_idx & (vh->vr.num - 1)];
> > + if (!cpy(&used->id, &head, sizeof(used->id)) ||
> > + !cpy(&used->len, &len, sizeof(used->len)))
> > return NULL;
> > - }
> > - if (__put_user(len, &used->len)) {
> > - pr_debug("Failed to write used len");
> > + wbarrier();
> > + last_used = vh->last_used_idx + 1;
> > + if (!cpy(&vh->vr.used->idx, &last_used, sizeof(vh->vr.used->idx)))
> > return NULL;
>
> I think this is broken: we need a 16 bit access, this is
> doing a memcpy which is byte by byte.
I have played around with gcc and -O2 option, and it seems to me
that GCC is smart enough to optimize the use of sizeof and memcpy
into MOV operations. But my assembly knowledge is very rusty,
so a second opinion on this would be good.
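For example, a fixed-size copy like the one below is, with gcc -O2 on
x86, lowered to a single 16-bit move (easy to verify with objdump;
shown for illustration only, not as a guarantee of the memcpy API):

static inline bool kernel_get(u16 *dst, u16 *src)
{
	/* Constant-size memcpy; gcc -O2 typically emits one 16-bit move. */
	memcpy(dst, src, sizeof(*dst));
	return true;
}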
> > - }
> > - /* Make sure buffer is written before we update index. */
> > - smp_wmb();
> > - if (__put_user(vh->last_used_idx + 1, &vh->vr.used->idx)) {
> > - pr_debug("Failed to increment used idx");
> > - return NULL;
> > - }
> > - vh->last_used_idx++;
> > + vh->last_used_idx = last_used;
> > return used;
> > }
> > -EXPORT_SYMBOL(vring_add_used_user);
Thanks,
Sjur
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [RFCv2 04/12] virtio-ring: Refactor out the functions accessing user memory
2012-12-06 11:03 ` Sjur BRENDELAND
@ 2012-12-06 11:15 ` Michael S. Tsirkin
2012-12-07 11:05 ` Sjur BRENDELAND
0 siblings, 1 reply; 51+ messages in thread
From: Michael S. Tsirkin @ 2012-12-06 11:15 UTC (permalink / raw)
To: Sjur BRENDELAND
Cc: Sjur Brændeland, Linus Walleij,
virtualization@lists.linux-foundation.org
On Thu, Dec 06, 2012 at 12:03:43PM +0100, Sjur BRENDELAND wrote:
> Hi Michael,
>
> > > -struct vring_used_elem *vring_add_used_user(struct vring_host *vh,
> > > - unsigned int head, int len)
> > > +
> > > +static inline struct vring_used_elem *_vring_add_used(struct vring_host
> > *vh,
> > > + u32 head, u32 len,
> > > + bool (*cpy)(void *dst,
> > > + void *src,
> > > + size_t s),
> > > + void (*wbarrier)(void))
> > > {
> > > struct vring_used_elem *used;
> > > + u16 last_used;
> > >
> > > /* The virtqueue contains a ring of used buffers. Get a pointer to the
> > > * next entry in that used ring. */
> > > - used = &vh->vr.used->ring[vh->last_used_idx % vh->vr.num];
> > > - if (__put_user(head, &used->id)) {
> > > - pr_debug("Failed to write used id");
> > > + used = &vh->vr.used->ring[vh->last_used_idx & (vh->vr.num - 1)];
> > > + if (!cpy(&used->id, &head, sizeof(used->id)) ||
> > > + !cpy(&used->len, &len, sizeof(used->len)))
> > > return NULL;
> > > - }
> > > - if (__put_user(len, &used->len)) {
> > > - pr_debug("Failed to write used len");
> > > + wbarrier();
> > > + last_used = vh->last_used_idx + 1;
> > > + if (!cpy(&vh->vr.used->idx, &last_used, sizeof(vh->vr.used->idx)))
> > > return NULL;
> >
> > I think this is broken: we need a 16 bit access, this is
> > doing a memcpy which is byte by byte.
>
> I have played around with gcc and -O2 option, and it seems to me
> that GCC is smart enough to optimize the use of sizeof and memcpy
> into MOV operations. But my assembly knowledge is very rusty,
> so a second opinion on this would be good.
Yes but I don't think we should rely on this: this API is not guaranteed
to do the right thing and no one will bother telling us if it's
rewritten or something.
> > > - }
> > > - /* Make sure buffer is written before we update index. */
> > > - smp_wmb();
> > > - if (__put_user(vh->last_used_idx + 1, &vh->vr.used->idx)) {
> > > - pr_debug("Failed to increment used idx");
> > > - return NULL;
> > > - }
> > > - vh->last_used_idx++;
> > > + vh->last_used_idx = last_used;
> > > return used;
> > > }
> > > -EXPORT_SYMBOL(vring_add_used_user);
>
> Thanks,
> Sjur
^ permalink raw reply [flat|nested] 51+ messages in thread
* RE: [RFCv2 04/12] virtio-ring: Refactor out the functions accessing user memory
2012-12-06 11:15 ` Michael S. Tsirkin
@ 2012-12-07 11:05 ` Sjur BRENDELAND
2012-12-07 12:40 ` Michael S. Tsirkin
0 siblings, 1 reply; 51+ messages in thread
From: Sjur BRENDELAND @ 2012-12-07 11:05 UTC (permalink / raw)
To: Michael S. Tsirkin, Rusty Russell
Cc: Sjur Brændeland, Linus Walleij,
virtualization@lists.linux-foundation.org
Hi Michael,
> From: Michael S. Tsirkin [mailto:mst@redhat.com]
> Sent: Thursday, December 06, 2012 12:16 PM
> On Thu, Dec 06, 2012 at 12:03:43PM +0100, Sjur BRENDELAND wrote:
> > Hi Michael,
> >
> > > > -struct vring_used_elem *vring_add_used_user(struct vring_host *vh,
> > > > - unsigned int head, int len)
> > > > +
> > > > +static inline struct vring_used_elem *_vring_add_used(struct
> vring_host
> > > *vh,
> > > > + u32 head, u32 len,
> > > > + bool (*cpy)(void *dst,
> > > > + void *src,
> > > > + size_t s),
> > > > + void (*wbarrier)(void))
> > > > {
> > > > struct vring_used_elem *used;
> > > > + u16 last_used;
> > > >
> > > > /* The virtqueue contains a ring of used buffers. Get a pointer to the
> > > > * next entry in that used ring. */
> > > > - used = &vh->vr.used->ring[vh->last_used_idx % vh->vr.num];
> > > > - if (__put_user(head, &used->id)) {
> > > > - pr_debug("Failed to write used id");
> > > > + used = &vh->vr.used->ring[vh->last_used_idx & (vh->vr.num - 1)];
> > > > + if (!cpy(&used->id, &head, sizeof(used->id)) ||
> > > > + !cpy(&used->len, &len, sizeof(used->len)))
> > > > return NULL;
> > > > - }
> > > > - if (__put_user(len, &used->len)) {
> > > > - pr_debug("Failed to write used len");
> > > > + wbarrier();
> > > > + last_used = vh->last_used_idx + 1;
> > > > + if (!cpy(&vh->vr.used->idx, &last_used, sizeof(vh->vr.used->idx)))
> > > > return NULL;
> > >
> > > I think this is broken: we need a 16 bit access, this is
> > > doing a memcpy which is byte by byte.
> >
> > I have played around with gcc and -O2 option, and it seems to me
> > that GCC is smart enough to optimize the use of sizeof and memcpy
> > into MOV operations. But my assembly knowledge is very rusty,
> > so a second opinion on this would be good.
>
> Yes but I don't think we should rely on this: this API is not guaranteed
> to do the right thing and no one will bother telling us if it's
> rewritten or something.
What is your concern here Michael? Are you uncomfortable with
relying on GCC optimizations of memory access and the use of inline
function pointers?
Or are you afraid that virtio_ring changes implementation without
your knowledge, so that vhost performance will suffer?
If the latter is your concern, perhaps we could implement the memory access
functions in vhost.c and the shared ring functions in a header file.
In this way we could still use inline function pointers as Rusty suggested, but you
gain more control of the memory access from vhost.
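Something along these lines, just to illustrate the proposed split
(the names here are invented):

/* Shared header: ring logic parameterized on an accessor. */
static inline int vring_get_avail_head(struct vring_host *vh,
				       bool (*get)(u16 *dst, u16 *src))
{
	u16 head;

	if (!get(&head, &vh->vr.avail->ring[vh->last_avail_idx &
					    (vh->vr.num - 1)]))
		return -EFAULT;
	return head;
}

/* vhost.c keeps full control of the userspace access: */
static bool vhost_get_u16(u16 *dst, u16 *src)
{
	return __get_user(*dst, src) == 0;
}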
If you are concerned about GCC optimization, I can do a re-spin with your
proposal using #defines...
Regards,
Sjur
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [RFCv2 04/12] virtio-ring: Refactor out the functions accessing user memory
2012-12-07 11:05 ` Sjur BRENDELAND
@ 2012-12-07 12:40 ` Michael S. Tsirkin
2012-12-07 13:02 ` Sjur BRENDELAND
0 siblings, 1 reply; 51+ messages in thread
From: Michael S. Tsirkin @ 2012-12-07 12:40 UTC (permalink / raw)
To: Sjur BRENDELAND
Cc: Sjur Brændeland, Linus Walleij,
virtualization@lists.linux-foundation.org
On Fri, Dec 07, 2012 at 12:05:11PM +0100, Sjur BRENDELAND wrote:
> Hi Michael,
>
> > From: Michael S. Tsirkin [mailto:mst@redhat.com]
> > Sent: Thursday, December 06, 2012 12:16 PM
> > On Thu, Dec 06, 2012 at 12:03:43PM +0100, Sjur BRENDELAND wrote:
> > > Hi Michael,
> > >
> > > > > -struct vring_used_elem *vring_add_used_user(struct vring_host *vh,
> > > > > - unsigned int head, int len)
> > > > > +
> > > > > +static inline struct vring_used_elem *_vring_add_used(struct
> > vring_host
> > > > *vh,
> > > > > + u32 head, u32 len,
> > > > > + bool (*cpy)(void *dst,
> > > > > + void *src,
> > > > > + size_t s),
> > > > > + void (*wbarrier)(void))
> > > > > {
> > > > > struct vring_used_elem *used;
> > > > > + u16 last_used;
> > > > >
> > > > > /* The virtqueue contains a ring of used buffers. Get a pointer to the
> > > > > * next entry in that used ring. */
> > > > > - used = &vh->vr.used->ring[vh->last_used_idx % vh->vr.num];
> > > > > - if (__put_user(head, &used->id)) {
> > > > > - pr_debug("Failed to write used id");
> > > > > + used = &vh->vr.used->ring[vh->last_used_idx & (vh->vr.num - 1)];
> > > > > + if (!cpy(&used->id, &head, sizeof(used->id)) ||
> > > > > + !cpy(&used->len, &len, sizeof(used->len)))
> > > > > return NULL;
> > > > > - }
> > > > > - if (__put_user(len, &used->len)) {
> > > > > - pr_debug("Failed to write used len");
> > > > > + wbarrier();
> > > > > + last_used = vh->last_used_idx + 1;
> > > > > + if (!cpy(&vh->vr.used->idx, &last_used, sizeof(vh->vr.used->idx)))
> > > > > return NULL;
> > > >
> > > > I think this is broken: we need a 16 bit access, this is
> > > > doing a memcpy which is byte by byte.
> > >
> > > I have played around with gcc and -O2 option, and it seems to me
> > > that GCC is smart enough to optimize the use of sizeof and memcpy
> > > into MOV operations. But my assembly knowledge is very rusty,
> > > so a second opinion on this would be good.
> >
> > Yes but I don't think we should rely on this: this API is not guaranteed
> > to do the right thing and no one will bother telling us if it's
> > rewritten or something.
>
> What is your concern here Michael? Are you uncomfortable with
> relying on GCC optimizations of memory access and the use of inline
> function pointers?
> Or are you afraid that virtio_ring changes implementation without
> your knowledge, so that vhost performance will suffer?
>
> If the latter is your concern, perhaps we could implement the memory access
> functions in vhost.c and the shared ring functions in a header file.
> In this way we could still use inline function pointers as Rusty suggested, but you
> gain more control of the memory access from vhost.
>
> If you are concerned about GCC optimization, I can do a re-spin with your
> proposal using #defines...
>
> Regards,
> Sjur
GCC for the in-kernel version and kernel changes for the userspace
version.
--
MST
^ permalink raw reply [flat|nested] 51+ messages in thread
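To make the access-width concern above concrete: a minimal sketch (editorial,
not from the thread; the function names are made up) of accessors whose access
width is guaranteed by construction, instead of relying on gcc folding a
sizeof-sized memcpy into a single MOV:

/* Illustrative only: both variants perform exactly one 16-bit access. */
static inline int putu16_kern(u16 *p, u16 val)
{
	ACCESS_ONCE(*p) = val;	/* single u16 store, never byte-by-byte */
	return 0;
}

static inline int putu16_user(u16 __user *p, u16 val)
{
	return put_user(val, p);	/* put_user is likewise fixed-width */
}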
* RE: [RFCv2 04/12] virtio-ring: Refactor out the functions accessing user memory
2012-12-07 12:40 ` Michael S. Tsirkin
@ 2012-12-07 13:02 ` Sjur BRENDELAND
2012-12-07 14:05 ` Michael S. Tsirkin
0 siblings, 1 reply; 51+ messages in thread
From: Sjur BRENDELAND @ 2012-12-07 13:02 UTC (permalink / raw)
To: Rusty Russell
Cc: Michael S. Tsirkin, Sjur Brændeland, Linus Walleij,
virtualization@lists.linux-foundation.org
> From: Michael S. Tsirkin [mailto:mst@redhat.com]
> On Fri, Dec 07, 2012 at 12:05:11PM +0100, Sjur BRENDELAND wrote:
> > Hi Michael,
> > > From: Michael S. Tsirkin [mailto:mst@redhat.com]
> > > Sent: Thursday, December 06, 2012 12:16 PM
> > > On Thu, Dec 06, 2012 at 12:03:43PM +0100, Sjur BRENDELAND wrote:
> > > > Hi Michael,
> > > >
> > > > > > -struct vring_used_elem *vring_add_used_user(struct vring_host
> *vh,
> > > > > > - unsigned int head, int len)
> > > > > > +
> > > > > > +static inline struct vring_used_elem *_vring_add_used(struct
> > > vring_host
> > > > > *vh,
> > > > > > + u32 head, u32 len,
> > > > > > + bool (*cpy)(void
> *dst,
> > > > > > + void
> *src,
> > > > > > + size_t
> s),
> > > > > > + void
> (*wbarrier)(void))
> > > > > > {
> > > > > > struct vring_used_elem *used;
> > > > > > + u16 last_used;
> > > > > >
> > > > > > /* The virtqueue contains a ring of used buffers. Get a
> pointer to the
> > > > > > * next entry in that used ring. */
> > > > > > - used = &vh->vr.used->ring[vh->last_used_idx % vh-
> >vr.num];
> > > > > > - if (__put_user(head, &used->id)) {
> > > > > > - pr_debug("Failed to write used id");
> > > > > > + used = &vh->vr.used->ring[vh->last_used_idx & (vh->vr.num
> - 1)];
> > > > > > + if (!cpy(&used->id, &head, sizeof(used->id)) ||
> > > > > > + !cpy(&used->len, &len, sizeof(used->len)))
> > > > > > return NULL;
> > > > > > - }
> > > > > > - if (__put_user(len, &used->len)) {
> > > > > > - pr_debug("Failed to write used len");
> > > > > > + wbarrier();
> > > > > > + last_used = vh->last_used_idx + 1;
> > > > > > + if (!cpy(&vh->vr.used->idx, &last_used, sizeof(vh->vr.used-
> >idx)))
> > > > > > return NULL;
> > > > >
> > > > > I think this is broken: we need a 16 bit access, this is
> > > > > doing a memcpy which is byte by byte.
> > > >
> > > > I have played around with gcc and -O2 option, and it seems to me
> > > > that GCC is smart enough to optimize the use of sizeof and memcpy
> > > > into MOV operations. But my assembly knowledge is very rusty,
> > > > so a second opinion on this would be good.
> > >
> > > Yes but I don't think we should rely on this: this API is not guaranteed
> > > to do the right thing and no one will bother telling us if it's
> > > rewritten or something.
> >
> > What is your concern here Michael? Are you uncomfortable with
> > relying on GCC optimizations of memory access and the use of inline
> > function pointers?
> > Or are you afraid that virtio_ring changes implementation without
> > your knowledge, so that vhost performance will suffer?
> >
> > If the latter is your concern, perhaps we could implement the memory
> access
> > functions in vhost.c and the shared ring functions in a header file.
> > In this way we could still use inline function pointers as Rusty suggested,
> but you
> > gain more control of the memory access from vhost.
> >
> > If you are concerned about GCC optimization, I can do a re-spin with your
> > proposal using #defines...
> >
> > Regards,
> > Sjur
>
> GCC for the in-kernel version and kernel changes for the userspace
> version.
Rusty, are you happy with using #defines instead of inline function pointers?
See http://lists.linuxfoundation.org/pipermail/virtualization/2012-December/022258.html
I guess this also implies that I have to add __user annotations to the vring definitions in
order to satisfy sparse.
If you're OK with Michael's proposal, I'll do a respin of the patches and move the
inline functions used by vhost.c into virtio_ring.h.
Thanks,
Sjur
^ permalink raw reply [flat|nested] 51+ messages in thread
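As background for the sparse remark above, a hedged illustration (editorial,
not from the thread; the function is hypothetical) of what the __user
annotation buys:

/* sparse accepts the checked accessor but flags a direct dereference
 * of the annotated pointer. */
static int read_used_idx(struct vring_used __user *used, u16 *idx)
{
	if (get_user(*idx, &used->idx))	/* fine: checked, fixed-width */
		return -EFAULT;
	/* *idx = used->idx; would warn: dereference of noderef expression */
	return 0;
}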
* Re: [RFCv2 04/12] virtio-ring: Refactor out the functions accessing user memory
2012-12-07 13:02 ` Sjur BRENDELAND
@ 2012-12-07 14:05 ` Michael S. Tsirkin
0 siblings, 0 replies; 51+ messages in thread
From: Michael S. Tsirkin @ 2012-12-07 14:05 UTC (permalink / raw)
To: Sjur BRENDELAND
Cc: Sjur Brændeland, Linus Walleij,
virtualization@lists.linux-foundation.org
On Fri, Dec 07, 2012 at 02:02:12PM +0100, Sjur BRENDELAND wrote:
> > From: Michael S. Tsirkin [mailto:mst@redhat.com]
> > On Fri, Dec 07, 2012 at 12:05:11PM +0100, Sjur BRENDELAND wrote:
> > > Hi Michael,
> > > > From: Michael S. Tsirkin [mailto:mst@redhat.com]
> > > > Sent: Thursday, December 06, 2012 12:16 PM
> > > > On Thu, Dec 06, 2012 at 12:03:43PM +0100, Sjur BRENDELAND wrote:
> > > > > Hi Michael,
> > > > >
> > > > > > > -struct vring_used_elem *vring_add_used_user(struct vring_host
> > *vh,
> > > > > > > - unsigned int head, int len)
> > > > > > > +
> > > > > > > +static inline struct vring_used_elem *_vring_add_used(struct
> > > > vring_host
> > > > > > *vh,
> > > > > > > + u32 head, u32 len,
> > > > > > > + bool (*cpy)(void
> > *dst,
> > > > > > > + void
> > *src,
> > > > > > > + size_t
> > s),
> > > > > > > + void
> > (*wbarrier)(void))
> > > > > > > {
> > > > > > > struct vring_used_elem *used;
> > > > > > > + u16 last_used;
> > > > > > >
> > > > > > > /* The virtqueue contains a ring of used buffers. Get a
> > pointer to the
> > > > > > > * next entry in that used ring. */
> > > > > > > - used = &vh->vr.used->ring[vh->last_used_idx % vh-
> > >vr.num];
> > > > > > > - if (__put_user(head, &used->id)) {
> > > > > > > - pr_debug("Failed to write used id");
> > > > > > > + used = &vh->vr.used->ring[vh->last_used_idx & (vh->vr.num
> > - 1)];
> > > > > > > + if (!cpy(&used->id, &head, sizeof(used->id)) ||
> > > > > > > + !cpy(&used->len, &len, sizeof(used->len)))
> > > > > > > return NULL;
> > > > > > > - }
> > > > > > > - if (__put_user(len, &used->len)) {
> > > > > > > - pr_debug("Failed to write used len");
> > > > > > > + wbarrier();
> > > > > > > + last_used = vh->last_used_idx + 1;
> > > > > > > + if (!cpy(&vh->vr.used->idx, &last_used, sizeof(vh->vr.used-
> > >idx)))
> > > > > > > return NULL;
> > > > > >
> > > > > > I think this is broken: we need a 16 bit access, this is
> > > > > > doing a memcpy which is byte by byte.
> > > > >
> > > > > I have played around with gcc and -O2 option, and it seems to me
> > > > > that GCC is smart enough to optimize the use of sizeof and memcpy
> > > > > into MOV operations. But my assembly knowledge is very rusty,
> > > > > so a second opinion on this would be good.
> > > >
> > > > Yes but I don't think we should rely on this: this API is not guaranteed
> > > > to do the right thing and no one will bother telling us if it's
> > > > rewritten or something.
> > >
> > > What is your concern here Michael? Are you uncomfortable with
> > > relying on GCC optimizations of memory access and the use of inline
> > > function pointers?
> > > Or are you afraid that virtio_ring changes implementation without
> > > your knowledge, so that vhost performance will suffer?
> > >
> > > If the latter is your concern, perhaps we could implement the memory
> > access
> > > functions in vhost.c and the shared ring functions in a header file.
> > > In this way we could still use inline function pointers as Rusty suggested,
> > but you
> > > gain more control of the memory access from vhost.
> > >
> > > If you are concerned about GCC optimization, I can do a re-spin with your
> > > proposal using #defines...
> > >
> > > Regards,
> > > Sjur
> >
> > GCC for the in-kernel version and kernel changes for the userspace
> > version.
>
> Rusty, are you happy with using #defines instead of inline function pointers?
> See http://lists.linuxfoundation.org/pipermail/virtualization/2012-December/022258.html
> I guess this also implies that I have to add __user annotations to the vring definitions in
> order to satisfy sparse.
>
> If you're OK with Michael's proposal, I'll do a respin of the patches and move the
> inline functions used by vhost.c into virtio_ring.h.
>
> Thanks,
> Sjur
>
I think this needs a separate header.
We use available bytes, so virtio_ring_user.h?
--
MST
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
2012-12-06 10:27 ` [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio Michael S. Tsirkin
@ 2012-12-21 6:11 ` Rusty Russell
2013-01-08 8:04 ` Sjur Brændeland
0 siblings, 1 reply; 51+ messages in thread
From: Rusty Russell @ 2012-12-21 6:11 UTC (permalink / raw)
To: Michael S. Tsirkin, Sjur Brændeland
Cc: Linus Walleij, Sjur Brændeland, virtualization
"Michael S. Tsirkin" <mst@redhat.com> writes:
> On Wed, Dec 05, 2012 at 03:36:58PM +0100, Sjur Brændeland wrote:
>> Feedback on this patch-set is appreciated, particularly on structure
>> and code-reuse between vhost.c and the host-side virtio-queue.
>> I'd also like some suggestions on how to handle the build configuration
>> better - currently there are some unnecessary build dependencies.
>
> Rusty seems to disagree but one of the concerns people have about vhost
> is security; so I value getting as much static checking as we can. This
> discards __user annotations so this doesn't work for me.
Sometimes, when we generalize code, we lose some type safety. Callbacks
take void *, for example. And it happens *all the time* with const. We
don't create a set of parallel const-safe routines.
Extracting common code where it can be shared provides better, not worse
security, because more people will read it. I've never audited the
vhost code, for example.
We already have a 'struct vring', we should just use that.
I meant to do more work on this, but I've run out of time :( I was
thinking of a drivers/virtio/virtio_ring.c, and in that we'd put the
wrappers for vhost_net/blk, and for CAIF, etc. We could at least have a
variant which did the __user etc thing, even if you have to pass in
getdesc().
getu16: get_user or assignment.
getdesc: copy (and check, translate) the descriptor.
getmem/putmem: memcpy or copy_to/from_user.
(Completely untested, of course...)
Thoughts?
Rusty.
/*
* Helpers for the host side of a virtio ring.
*
* Since these may be in userspace, we use (inline) accessors.
*/
#include <uapi/linux/virtio_ring.h>
/* Returns vring->num if empty, -ve on error. */
static inline int __vringh_get_head(const struct vring *vring,
int (*getu16)(u16 *val, u16 *p))
{
u16 last_avail_idx, avail_idx, i, head;
int err;
err = getu16(&avail_idx, &vring->avail->idx);
if (err) {
vringh_bad("Failed to access avail idx at %p", &vring->avail->idx);
return err;
}
err = getu16(&last_avail_idx, &vring_avail_event(vring));
if (err) {
vringh_bad("Failed to access last avail idx at %p",
&vring_avail_event(vring));
return err;
}
if (last_avail_idx == avail_idx)
return vring->num;
/* Only get avail ring entries after they have been exposed by guest. */
smp_rmb();
i = last_avail_idx & (vring->num - 1);
err = getu16(&head, &vring->avail->ring[i]);
if (err) {
vringh_bad("Failed to read head: idx %d address %p",
last_avail_idx, &vring->avail->ring[i]);
return err;
}
if (head >= vring->num) {
vringh_bad("Guest says index %u > %u is available",
head, vring->num);
return -EINVAL;
}
return head;
}
struct vringh_access {
/* Start address. */
struct vring_desc *start;
/* Maximum number of entries. */
u32 max;
/* Head index we got, and current chained index. */
u16 head;
/* Cached descriptor. */
struct vring_desc desc;
};
/*
* Initialize the vringh_access structure for this head.
*
* For direct buffers, the range is simply the desc[] array in the vring.
*
* For indirect buffers, the range is the indirect entry; check() is called
to validate this range.
*
* -error otherwise.
*/
static inline int __vringh_get_access(const struct vring *vring, u16 head,
int (*getdesc)(struct vring_desc *dst,
const struct vring_desc *src,
void *arg),
void *arg,
struct vringh_access *acc)
{
int err;
err = getdesc(&acc->desc, &vring->desc[head], arg);
if (unlikely(err))
return err;
if (acc->desc.flags & VRING_DESC_F_INDIRECT) {
/* We don't support chained indirects. */
if (acc->desc.flags & VRING_DESC_F_NEXT)
return -EINVAL;
if (unlikely(acc->desc.len % sizeof(acc->desc)))
return -EINVAL;
acc->start = (void *)(long)acc->desc.addr;
acc->max = acc->desc.len / sizeof(acc->desc);
if (acc->max > vring->num)
return -EINVAL;
/* Force us to read first desc next time. */
acc->desc.len = 0;
acc->desc.next = 0;
acc->desc.flags = VRING_DESC_F_NEXT;
} else {
acc->start = vring->desc;
acc->max = vring->num;
acc->head = head;
}
return 0;
}
/*
* Copy some bytes from the vring descriptor. Returns num copied.
*
* You are expected to exhaust the readable descriptors (ie. stuff output
* from the other side) before looking for writable descriptors.
*/
static inline int __vringh_pull(struct vringh_access *acc, void *dst, u32 len,
int (*getdesc)(struct vring_desc *dst,
const struct vring_desc *src,
void *arg),
int (*getmem)(void *dst, const void *src, size_t,
void *arg),
void *arg)
{
int err, done = 0;
while (len) {
void *src;
u32 rlen;
/* Exhausted this descriptor? Read next. */
if (acc->desc.len == 0) {
if (!(acc->desc.flags & VRING_DESC_F_NEXT))
return done;
if (acc->desc.next >= acc->max) {
vringh_bad("Guest chained index %u > %u",
acc->desc.next, acc->max);
return -EINVAL;
}
acc->head = acc->desc.next;
err = getdesc(&acc->desc, acc->start + acc->head, arg);
if (unlikely(err))
return err;
/* No more readable descriptors? */
if (unlikely(acc->desc.flags & VRING_DESC_F_WRITE))
return done;
}
if (len < acc->desc.len)
rlen = len;
else
rlen = acc->desc.len;
src = (void *)(long)acc->desc.addr;
if (unlikely(getmem(dst, src, rlen, arg)))
return -EFAULT;
acc->desc.len -= rlen;
acc->desc.addr += rlen;
len -= rlen;
done += rlen;
}
return done;
}
/* Fill in iovec containing remaining readable descriptors, return num
* (can be no more than vring->num). */
static inline int __vringh_riov(struct vringh_access *acc,
struct iovec riov[],
int (*getdesc)(struct vring_desc *dst,
const struct vring_desc *src,
void *arg),
void *arg)
{
int err, count = 0;
while (!(acc->desc.flags & VRING_DESC_F_WRITE)) {
if (acc->desc.len != 0) {
if (count == acc->max)
return -ELOOP;
riov->iov_base = (void __user *)(long)acc->desc.addr;
riov->iov_len = acc->desc.len;
riov++;
count++;
}
if (!(acc->desc.flags & VRING_DESC_F_NEXT))
break;
if (acc->desc.next >= acc->max)
return -EINVAL;
acc->head = acc->desc.next;
err = getdesc(&acc->desc, acc->start + acc->head, arg);
if (unlikely(err))
return err;
}
return count;
}
static inline int __vringh_push(struct vringh_access *acc,
const void *src, u32 len,
int (*getdesc)(struct vring_desc *dst,
const struct vring_desc *src,
void *arg),
int (*putmem)(void *dst, const void *src, size_t,
void *arg),
void *arg)
{
int err, done = 0;
while (len) {
void *dst;
u32 wlen;
/* Exhausted this descriptor? Read next. */
if (acc->desc.len == 0) {
if (!(acc->desc.flags & VRING_DESC_F_NEXT))
return done;
if (acc->desc.next >= acc->max) {
vringh_bad("Guest chained index %u > %u",
acc->desc.next, acc->max);
return -EINVAL;
}
acc->head = acc->desc.next;
err = getdesc(&acc->desc, acc->start + acc->head, arg);
if (unlikely(err))
return err;
/* Non-writable descriptor out of order? */
if (unlikely(!(acc->desc.flags & VRING_DESC_F_WRITE)))
return -EINVAL;
}
if (len < acc->desc.len)
wlen = len;
else
wlen = acc->desc.len;
dst = (void *)(long)acc->desc.addr;
if (unlikely(putmem(dst, src, wlen, arg)))
return -EFAULT;
acc->desc.len -= wlen;
acc->desc.addr += wlen;
len -= wlen;
done += wlen;
}
return done;
}
/* Fill in iovec containing remaining writable descriptors, return num
* (can be no more than vring->num). */
static inline int __vringh_wiov(struct vringh_access *acc,
struct iovec wiov[],
int (*getdesc)(struct vring_desc *dst,
const struct vring_desc *src,
void *arg),
void *arg)
{
int err, count = 0;
while (acc->desc.flags & VRING_DESC_F_WRITE) {
if (acc->desc.len != 0) {
if (count == acc->max)
return -ELOOP;
wiov->iov_base = (void __user *)(long)acc->desc.addr;
wiov->iov_len = acc->desc.len;
wiov++;
count++;
}
if (!(acc->desc.flags & VRING_DESC_F_NEXT))
break;
if (acc->desc.next >= acc->max)
return -EINVAL;
acc->head = acc->desc.next;
err = getdesc(&acc->desc, acc->start + acc->head, arg);
if (unlikely(err))
return err;
}
return count;
}
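As a usage sketch (editorial, not in Rusty's mail; the helper names are
assumptions), the callbacks for a userspace ring might be wired into the
helpers above like this:

static int getu16_user(u16 *val, u16 *p)
{
	return get_user(*val, (__force u16 __user *)p);
}

static int getdesc_user(struct vring_desc *dst, const struct vring_desc *src,
			void *arg)
{
	return copy_from_user(dst, (__force const void __user *)src,
			      sizeof(*dst)) ? -EFAULT : 0;
}

static int getmem_user(void *dst, const void *src, size_t len, void *arg)
{
	return copy_from_user(dst, (__force const void __user *)src, len) ?
	       -EFAULT : 0;
}

/* e.g.: head = __vringh_get_head(vring, getu16_user); */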
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
2012-12-21 6:11 ` Rusty Russell
@ 2013-01-08 8:04 ` Sjur Brændeland
2013-01-08 23:17 ` Rusty Russell
0 siblings, 1 reply; 51+ messages in thread
From: Sjur Brændeland @ 2013-01-08 8:04 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Linus Walleij, virtualization
On Fri, Dec 21, 2012 at 7:11 AM, Rusty Russell <rusty@rustcorp.com.au> wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
>
>> On Wed, Dec 05, 2012 at 03:36:58PM +0100, Sjur Brændeland wrote:
>>> Feedback on this patch-set is appreciated, particularly on structure
>>> and code-reuse between vhost.c and the host-side virtio-queue.
>>> I'd also like some suggestions on how to handle the build configuration
>>> better - currently there are some unnecessary build dependencies.
>>
>> Rusty seems to disagree but one of the concerns people have about vhost
>> is security; so I value getting as much static checking as we can. This
>> discards __user annotations so this doesn't work for me.
>
> Sometimes, when we generalize code, we lose some type safety. Callbacks
> take void *, for example. And it happens *all the time* with const. We
> don't create a set of parallel const-safe routines.
>
> Extracting common code where it can be shared provides better, not worse
> security, because more people will read it. I've never audited the
> vhost code, for example.
>
> We already have a 'struct vring', we should just use that.
>
> I meant to do more work on this, but I've run out of time :( I was
> thinking of a drivers/virtio/virtio_ring.c, and in that we'd put the
> wrappers for vhost_net/blk, and for CAIF, etc. We could at least have a
> variant which did the __user etc thing, even if you have to pass in
> getdesc().
>
> getu16: get_user or assignment.
> getdesc: copy (and check, translate) the descriptor.
> getmem/putmem: memcpy or copy_to/from_user.
>
> (Completely untested, of course...)
>
> Thoughts?
Any thoughts on this Michael?
Regards,
Sjur
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
2013-01-08 8:04 ` Sjur Brændeland
@ 2013-01-08 23:17 ` Rusty Russell
2013-01-10 10:30 ` Rusty Russell
0 siblings, 1 reply; 51+ messages in thread
From: Rusty Russell @ 2013-01-08 23:17 UTC (permalink / raw)
To: Sjur Brændeland, Michael S. Tsirkin; +Cc: Linus Walleij, virtualization
Sjur Brændeland <sjurbren@gmail.com> writes:
> On Fri, Dec 21, 2012 at 7:11 AM, Rusty Russell <rusty@rustcorp.com.au> wrote:
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>>
>>> On Wed, Dec 05, 2012 at 03:36:58PM +0100, Sjur Brændeland wrote:
>>>> Feedback on this patch-set is appreciated, particularly on structure
>>>> and code-reuse between vhost.c and the host-side virtio-queue.
>>>> I'd also like some suggestions on how to handle the build configuration
>>>> better - currently there are some unnecessary build dependencies.
>>>
>>> Rusty seems to disagree but one of the concerns people have about vhost
>>> is security; so I value getting as much static checking as we can. This
>>> discards __user annotations so this doesn't work for me.
>>
>> Sometimes, when we generalize code, we lose some type safety. Callbacks
>> take void *, for example. And it happens *all the time* with const. We
>> don't create a set of parallel const-safe routines.
>>
>> Extracting common code where it can be shared provides better, not worse
>> security, because more people will read it. I've never audited the
>> vhost code, for example.
>>
>> We already have a 'struct vring', we should just use that.
>>
>> I meant to do more work on this, but I've run out of time :( I was
>> thinking of a drivers/virtio/virtio_ring.c, and in that we'd put the
>> wrappers for vhost_net/blk, and for CAIF, etc. We could at least have a
>> variant which did the __user etc thing, even if you have to pass in
>> getdesc().
>>
>> getu16: get_user or assignment.
>> getdesc: copy (and check, translate) the descriptor.
>> getmem/putmem: memcpy or copy_to/from_user.
>>
>> (Completely untested, of course...)
>>
>> Thoughts?
>
> Any thoughts on this Michael?
I'm actually testing code this time, and it's mutated a little.
It basically involves moving much of vring.c into a virtio_host.c: the
parts which actually touch the ring. Then it provides accessors for
vring.c to use which are __user-safe (all casts are inside
virtio_host.c).
I should have something to post by end of today, my time...
Thanks,
Rusty.
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
2013-01-08 23:17 ` Rusty Russell
@ 2013-01-10 10:30 ` Rusty Russell
2013-01-10 11:11 ` Michael S. Tsirkin
2013-01-10 18:39 ` Sjur Brændeland
0 siblings, 2 replies; 51+ messages in thread
From: Rusty Russell @ 2013-01-10 10:30 UTC (permalink / raw)
To: Sjur Brændeland, Michael S. Tsirkin
Cc: Linus Walleij, LKML, virtualization
Rusty Russell <rusty@rustcorp.com.au> writes:
> It basically involves moving much of vring.c into a virtio_host.c: the
> parts which actually touch the ring. Then it provides accessors for
> vring.c to use which are __user-safe (all casts are inside
> virtio_host.c).
>
> I should have something to post by end of today, my time...
Well, that was optimistic.
I now have some lightly-tested code (via a userspace harness). The
interface will probably change again as I try to adapt vhost to use it.
The emphasis is getting a sglist out of the vring as efficiently as
possible. This involves some hacks: I'm still wondering if we should
move the address mapping logic into the virtio_host core, with a callout
if an address we want is outside a single range.
Not sure why vhost/net doesn't build a packet and feed it in
netif_rx_ni(). This is what tun seems to do, and with this code it
should be fairly optimal.
Cheers,
Rusty.
virtio_host: host-side implementation of virtio rings.
Getting use of virtio rings correct is tricky, and a recent patch saw
an implementation of in-kernel rings (as separate from userspace).
This patch attempts to abstract the business of dealing with the
virtio ring layout from the access (userspace or direct); to do this,
we use function pointers, which gcc inlines correctly.
The new API should be more efficient than the existing vhost code,
too, since we convert directly to chained sg lists, which can be
modified in place to map the pages.
Disadvantages:
1) The spec allows chained indirect entries; we don't. No one does this,
but it's not as crazy as it sounds, so perhaps we should support it. If
we did, we'd almost certainly invoke a function ptr call to check the
validity of the indirect mem.
2) Getting an accessor is ugly; if it's indirect, the caller has to check
that it's valid. Efficient, but it's a horrible API.
No doubt this will change as I try to adapt existing vhost drivers.
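To illustrate the inlining claim (editorial sketch, not part of the patch):
when a static inline helper takes a function pointer and every call site
passes a compile-time-constant callback, gcc resolves the indirection and
can inline the callback body outright.

/* Toy example: at -O2 the indirect call through 'get' disappears. */
static inline int sum_u16(const u16 *a, int n, u16 (*get)(const u16 *))
{
	int i, s = 0;

	for (i = 0; i < n; i++)
		s += get(&a[i]);
	return s;
}

static u16 get_plain(const u16 *p)
{
	return *p;
}

/* sum_u16(ring, n, get_plain) compiles as if *p were open-coded. */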
diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 202bba6..38ec470 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -1,6 +1,7 @@
config VHOST_NET
tristate "Host kernel accelerator for virtio net (EXPERIMENTAL)"
depends on NET && EVENTFD && (TUN || !TUN) && (MACVTAP || !MACVTAP) && EXPERIMENTAL
+ select VHOST
---help---
This kernel module can be loaded in host kernel to accelerate
guest networking with virtio_net. Not to be confused with virtio_net
diff --git a/drivers/vhost/Kconfig.tcm b/drivers/vhost/Kconfig.tcm
index a9c6f76..f4c3704 100644
--- a/drivers/vhost/Kconfig.tcm
+++ b/drivers/vhost/Kconfig.tcm
@@ -1,6 +1,7 @@
config TCM_VHOST
tristate "TCM_VHOST fabric module (EXPERIMENTAL)"
depends on TARGET_CORE && EVENTFD && EXPERIMENTAL && m
+ select VHOST
default n
---help---
Say M here to enable the TCM_VHOST fabric module for use with virtio-scsi guests
diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 8d5bddb..fd95d3e 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -5,6 +5,12 @@ config VIRTIO
bus, such as CONFIG_VIRTIO_PCI, CONFIG_VIRTIO_MMIO, CONFIG_LGUEST,
CONFIG_RPMSG or CONFIG_S390_GUEST.
+config VHOST
+ tristate
+ ---help---
+ This option is selected by any driver which needs to access
+ the host side of a virtio ring.
+
menu "Virtio drivers"
config VIRTIO_PCI
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 9076635..9833cd5 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -2,3 +2,4 @@ obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o
obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
+obj-$(CONFIG_VHOST) += virtio_host.o
diff --git a/drivers/virtio/virtio_host.c b/drivers/virtio/virtio_host.c
new file mode 100644
index 0000000..169c8e2
--- /dev/null
+++ b/drivers/virtio/virtio_host.c
@@ -0,0 +1,668 @@
+/*
+ * Helpers for the host side of a virtio ring.
+ *
+ * Since these may be in userspace, we use (inline) accessors.
+ */
+#include <linux/virtio_host.h>
+#include <linux/kernel.h>
+#include <linux/ratelimit.h>
+#include <linux/uaccess.h>
+
+/* An inline function, for easy cold marking. */
+static __cold bool __vringh_bad(void)
+{
+ static DEFINE_RATELIMIT_STATE(vringh_rs,
+ DEFAULT_RATELIMIT_INTERVAL,
+ DEFAULT_RATELIMIT_BURST);
+ return __ratelimit(&vringh_rs);
+}
+
+#define vringh_bad(fmt, ...) \
+ do { if (__vringh_bad()) \
+ printk(KERN_NOTICE "vringh: " fmt "\n", __VA_ARGS__); \
+ } while(0)
+
+/* Returns vring->num if empty, -ve on error. */
+static inline int __vringh_get_head(const struct vringh *vrh,
+ int (*getu16)(u16 *val, const u16 *p),
+ u16 *last_avail_idx)
+{
+ u16 avail_idx, i, head;
+ int err;
+
+ err = getu16(&avail_idx, &vrh->vring.avail->idx);
+ if (err) {
+ vringh_bad("Failed to access avail idx at %p",
+ &vrh->vring.avail->idx);
+ return err;
+ }
+
+ err = getu16(last_avail_idx, &vring_avail_event(&vrh->vring));
+ if (err) {
+ vringh_bad("Failed to access last avail idx at %p",
+ &vring_avail_event(&vrh->vring));
+ return err;
+ }
+
+ if (*last_avail_idx == avail_idx)
+ return vrh->vring.num;
+
+ /* Only get avail ring entries after they have been exposed by guest. */
+ smp_rmb();
+
+ i = *last_avail_idx & (vrh->vring.num - 1);
+
+ err = getu16(&head, &vrh->vring.avail->ring[i]);
+ if (err) {
+ vringh_bad("Failed to read head: idx %d address %p",
+ *last_avail_idx, &vrh->vring.avail->ring[i]);
+ return err;
+ }
+
+ if (head >= vrh->vring.num) {
+ vringh_bad("Guest says index %u > %u is available",
+ head, vrh->vring.num);
+ return -EINVAL;
+ }
+ return head;
+}
+
+/*
+ * Initialize the vringh_access structure for this head.
+ *
+ * For direct buffers, the range is simply the desc[] array in the vring.
+ *
+ * For indirect buffers, the range is the indirect entry; check() is called
+ * to validate this range.
+ *
+ * -error otherwise.
+ */
+static inline int __vringh_get_access(const struct vringh *vrh, u16 head,
+ int (*getdesc)(struct vring_desc *dst,
+ const struct vring_desc *),
+ struct vringh_acc *acc)
+{
+ int err;
+
+ acc->head = head;
+
+ err = getdesc(&acc->desc, &vrh->vring.desc[head]);
+ if (unlikely(err))
+ return err;
+
+ if (acc->desc.flags & VRING_DESC_F_INDIRECT) {
+ /* We don't support chained indirects. */
+ if (acc->desc.flags & VRING_DESC_F_NEXT)
+ return -EINVAL;
+ if (unlikely(acc->desc.len % sizeof(acc->desc)))
+ return -EINVAL;
+
+ acc->start = (void *)(long)acc->desc.addr;
+ acc->max = acc->desc.len / sizeof(acc->desc);
+
+ if (acc->max > vrh->vring.num)
+ return -EINVAL;
+
+ /* Force us to read first desc next time. */
+ acc->desc.len = 0;
+ acc->desc.next = 0;
+ acc->desc.flags = VRING_DESC_F_NEXT;
+ } else {
+ acc->start = vrh->vring.desc;
+ acc->max = vrh->vring.num;
+ acc->idx = head;
+ }
+ return 0;
+}
+
+/* Copy some bytes to/from the vring descriptor. Returns num copied. */
+static inline int vsg_xfer(struct vringh_sg **vsg,
+ unsigned int *num,
+ void *ptr, size_t len,
+ int (*xfer)(void *sgaddr, void *ptr, size_t len))
+{
+ int err, done = 0;
+
+ while (len && *num) {
+ size_t partlen;
+ struct scatterlist *sg = &(*vsg)->sg;
+
+ partlen = min(sg->length, len);
+ err = xfer(vringh_sg_addr(*vsg), ptr, partlen);
+ if (err)
+ return err;
+ sg->offset += partlen;
+ sg->length -= partlen;
+ len -= partlen;
+ done += partlen;
+ ptr += partlen;
+
+ if (sg->length == 0) {
+ *vsg = (struct vringh_sg *)sg_next(sg);
+ (*num)--;
+ }
+ }
+ return done;
+}
+
+static unsigned int rest_of_page(void *data)
+{
+ return PAGE_SIZE - ((unsigned long)data % PAGE_SIZE);
+}
+
+static struct vringh_sg *add_sg_chain(struct vringh_sg *end, gfp_t gfp)
+{
+ struct vringh_sg *vsg = (void *)__get_free_page(gfp);
+
+ if (!vsg)
+ return NULL;
+
+ sg_init_table(&vsg->sg, PAGE_SIZE / sizeof(*vsg));
+ sg_chain(&end->sg, 1, &vsg->sg);
+ return vsg;
+}
+
+/* We add a chain to the sg if we hit end: we're putting addresses in sg_page,
+ * as caller needs to map them itself. */
+static inline int add_to_sg(struct vringh_sg **vsg,
+ void *addr, u32 len, gfp_t gfp)
+{
+ int done = 0;
+
+ while (len) {
+ int partlen;
+ void *paddr;
+
+ paddr = (void *)((long)addr & PAGE_MASK);
+
+ if (unlikely(sg_is_last(&(*vsg)->sg))) {
+ *vsg = add_sg_chain(*vsg, gfp);
+ if (!*vsg)
+ return -ENOMEM;
+ }
+
+ partlen = rest_of_page(addr);
+ if (partlen > len)
+ partlen = len;
+ sg_set_page(&(*vsg)->sg, paddr, partlen, offset_in_page(addr));
+ (*vsg)++;
+ len -= partlen;
+ addr += partlen;
+ done++;
+ }
+ return done;
+}
+
+static inline int
+__vringh_sg(struct vringh_acc *acc,
+ struct vringh_sg *vsg,
+ unsigned max,
+ u16 write_flag,
+ gfp_t gfp,
+ int (*getdesc)(struct vring_desc *dst, const struct vring_desc *s))
+{
+ unsigned count = 0, num_descs = 0;
+ struct vringh_sg *orig_vsg = vsg;
+ int err;
+
+ /* This sets the end marker on sg[max-1], so we know when to chain. */
+ if (max)
+ sg_init_table(&vsg->sg, max);
+
+ for (;;) {
+ /* Exhausted this descriptor? Read next. */
+ if (acc->desc.len == 0) {
+ if (!(acc->desc.flags & VRING_DESC_F_NEXT))
+ break;
+
+ if (num_descs++ == acc->max) {
+ err = -ELOOP;
+ goto fail;
+ }
+
+ if (acc->desc.next >= acc->max) {
+ vringh_bad("Chained index %u > %u",
+ acc->desc.next, acc->max);
+ err = -EINVAL;
+ goto fail;
+ }
+
+ acc->idx = acc->desc.next;
+ err = getdesc(&acc->desc, acc->start + acc->idx);
+ if (unlikely(err))
+ goto fail;
+ }
+
+ if (unlikely(!max)) {
+ vringh_bad("Unexpected %s descriptor",
+ write_flag ? "writable" : "readable");
+ return -EINVAL;
+ }
+
+ /* No more readable/writable descriptors? */
+ if ((acc->desc.flags & VRING_DESC_F_WRITE) != write_flag) {
+ /* We should not have readable after writable */
+ if (write_flag) {
+ vringh_bad("Readable desc %p after writable",
+ acc->start + acc->idx);
+ err = -EINVAL;
+ goto fail;
+ }
+ break;
+ }
+
+ /* Append the pages into the sg. */
+ err = add_to_sg(&vsg, (void *)(long)acc->desc.addr,
+ acc->desc.len, gfp);
+ if (err < 0)
+ goto fail;
+ count += err;
+ acc->desc.len = 0;
+ }
+ if (count)
+ sg_mark_end(&(vsg - 1)->sg);	/* mark the last filled entry */
+ return count;
+
+fail:
+ vringh_sg_free(orig_vsg);
+ return err;
+}
+
+static inline int __vringh_complete(struct vringh *vrh, u16 idx, u16 len,
+ int (*getu16)(u16 *val, const u16 *p),
+ int (*putu16)(u16 *p, u16 val),
+ int (*putused)(struct vring_used_elem *dst,
+ const struct vring_used_elem
+ *s),
+ bool *notify)
+{
+ struct vring_used_elem used;
+ struct vring_used *used_ring;
+ int err;
+ u16 used_idx, old, used_event;
+
+ used.id = idx;
+ used.len = len;
+
+ /* The slot to fill follows our own counter; the guest's event
+ * index is only consulted below when deciding whether to notify. */
+ used_idx = vrh->last_used_idx;
+
+ used_ring = vrh->vring.used;
+
+ err = putused(&used_ring->ring[used_idx % vrh->vring.num], &used);
+ if (err) {
+ vringh_bad("Failed to write used entry %u at %p",
+ used_idx % vrh->vring.num,
+ &used_ring->ring[used_idx % vrh->vring.num]);
+ return err;
+ }
+
+ /* Make sure buffer is written before we update index. */
+ smp_wmb();
+
+ old = vrh->last_used_idx;
+ vrh->last_used_idx++;
+
+ err = putu16(&vrh->vring.used->idx, vrh->last_used_idx);
+ if (err) {
+ vringh_bad("Failed to update used index at %p",
+ &vrh->vring.used->idx);
+ return err;
+ }
+
+ /* If we already know we need to notify, skip re-checking */
+ if (*notify)
+ return 0;
+
+ /* Flush out used index update. This is paired with the
+ * barrier that the Guest executes when enabling
+ * interrupts. */
+ smp_mb();
+
+ /* Old-style, without event indices. */
+ if (!vrh->event_indices) {
+ u16 flags;
+ err = getu16(&flags, &vrh->vring.avail->flags);
+ if (err) {
+ vringh_bad("Failed to get flags at %p",
+ &vrh->vring.avail->flags);
+ return err;
+ }
+ if (!(flags & VRING_AVAIL_F_NO_INTERRUPT))
+ *notify = true;
+ return 0;
+ }
+
+ /* Modern: we know where other side is up to. */
+ err = getu16(&used_event, &vring_used_event(&vrh->vring));
+ if (err) {
+ vringh_bad("Failed to get used event idx at %p",
+ &vring_used_event(&vrh->vring));
+ return err;
+ }
+ if (vring_need_event(used_event, vrh->last_used_idx, old))
+ *notify = true;
+ return 0;
+}
+
+static inline bool __vringh_notify_enable(struct vringh *vrh,
+ int (*getu16)(u16 *val, const u16 *p),
+ int (*putu16)(u16 *p, u16 val))
+{
+ u16 avail;
+
+ /* Already enabled? */
+ if (vrh->listening)
+ return false;
+
+ vrh->listening = true;
+
+ if (!vrh->event_indices) {
+ /* Old-school; update flags. */
+ if (putu16(&vrh->vring.used->flags, 0) != 0) {
+ vringh_bad("Clearing used flags %p",
+ &vrh->vring.used->flags);
+ return false;
+ }
+ } else {
+ if (putu16(&vring_avail_event(&vrh->vring),
+ vrh->last_avail_idx) != 0) {
+ vringh_bad("Updating avail event index %p",
+ &vring_avail_event(&vrh->vring));
+ return false;
+ }
+ }
+
+ /* They could have slipped one in as we were doing that: make
+ * sure it's written, then check again. */
+ smp_mb();
+
+ if (getu16(&avail, &vrh->vring.avail->idx) != 0) {
+ vringh_bad("Failed to check avail idx at %p",
+ &vrh->vring.avail->idx);
+ return false;
+ }
+
+ /* This is so unlikely, we just leave notifications enabled. */
+ return avail != vrh->last_avail_idx;
+}
+
+static inline void __vringh_notify_disable(struct vringh *vrh,
+ int (*putu16)(u16 *p, u16 val))
+{
+ /* Already disabled? */
+ if (!vrh->listening)
+ return;
+
+ vrh->listening = false;
+ if (!vrh->event_indices) {
+ /* Old-school; update flags. */
+ if (putu16(&vrh->vring.used->flags, VRING_USED_F_NO_NOTIFY)) {
+ vringh_bad("Setting used flags %p",
+ &vrh->vring.used->flags);
+ }
+ }
+}
+
+/* Userspace access helpers. */
+static inline int getu16_user(u16 *val, const u16 *p)
+{
+ return get_user(*val, (__force u16 __user *)p);
+}
+
+static inline int putu16_user(u16 *p, u16 val)
+{
+ return put_user(val, (__force u16 __user *)p);
+}
+
+static inline int getdesc_user(struct vring_desc *dst,
+ const struct vring_desc *src)
+{
+ return copy_from_user(dst, (__force const void __user *)src, sizeof(*dst)) == 0 ? 0 :
+ -EFAULT;
+}
+
+static inline int putused_user(struct vring_used_elem *dst,
+ const struct vring_used_elem *s)
+{
+ return copy_to_user((__force void __user *)dst, s, sizeof(*dst)) == 0
+ ? 0 : -EFAULT;
+}
+
+static inline int xfer_from_user(void *src, void *dst, size_t len)
+{
+ return copy_from_user(dst, (__force const void __user *)src, len) == 0 ? 0 :
+ -EFAULT;
+}
+
+static inline int xfer_to_user(void *dst, void *src, size_t len)
+{
+ return copy_to_user((__force void __user *)dst, src, len) == 0 ? 0 :
+ -EFAULT;
+}
+
+/**
+ * vringh_init_user - initialize a vringh for a userspace vring.
+ * @vrh: the vringh to initialize.
+ * @features: the feature bits for this ring.
+ * @num: the number of elements.
+ * @desc: the userspace descriptor pointer.
+ * @avail: the userspace avail pointer.
+ * @used: the userspace used pointer.
+ *
+ * Returns an error if num is invalid: you should check pointers
+ * yourself!
+ */
+int vringh_init_user(struct vringh *vrh, u32 features,
+ unsigned int num,
+ struct vring_desc __user *desc,
+ struct vring_avail __user *avail,
+ struct vring_used __user *used)
+{
+ /* Sane power of 2 please! */
+ if (!num || num > 0xffff || (num & (num - 1))) {
+ vringh_bad("Bad ring size %u", num);
+ return -EINVAL;
+ }
+
+ vrh->event_indices = (features & VIRTIO_RING_F_EVENT_IDX);
+ vrh->listening = false;
+ vrh->last_avail_idx = 0;
+ vrh->last_used_idx = 0;
+ vrh->vring.num = num;
+ vrh->vring.desc = (__force struct vring_desc *)desc;
+ vrh->vring.avail = (__force struct vring_avail *)avail;
+ vrh->vring.used = (__force struct vring_used *)used;
+ return 0;
+}
+
+/**
+ * vringh_getdesc_user - get next available descriptor from userspace ring.
+ * @vrh: the userspace vring.
+ * @acc: the accessor structure to fill in.
+ *
+ * Returns 0 if it filled in @acc, or -errno. @acc->max is 0 if the ring is
+ * empty.
+ *
+ * Make sure you check that acc->start to acc->start + acc->max is
+ * valid memory!
+ */
+int vringh_getdesc_user(struct vringh *vrh, struct vringh_acc *acc)
+{
+ int err;
+
+ err = __vringh_get_head(vrh, getu16_user, &vrh->last_avail_idx);
+ if (unlikely(err < 0))
+ return err;
+
+ /* Empty... */
+ if (err == vrh->vring.num) {
+ acc->max = 0;
+ return 0;
+ }
+
+ return __vringh_get_access(vrh, err, getdesc_user, acc);
+}
+
+/**
+ * vringh_rsg_user - form an sg from the remaining readable bytes.
+ * @acc: the accessor from vringh_get_user.
+ * @sg: the scatterlist to populate
+ * @num: the number of elements in @sg
+ * @gfp: the allocation flags if we need to chain onto @sg.
+ *
+ * This puts the page addresses into @sg: not the struct pages! You must
+ * map the pages. It will allocate and chain sgs if required: in this
+ * case the return value will be >= num - 1, and vringh_sg_free()
+ * must be called to free the chained elements.
+ *
+ * You are expected to pull / rsg all readable bytes before accessing writable
+ * bytes.
+ *
+ * Returns -errno, or number of @sg elements created.
+ */
+int vringh_rsg_user(struct vringh_acc *acc,
+ struct vringh_sg *vsg, unsigned num, gfp_t gfp)
+{
+ return __vringh_sg(acc, vsg, num, 0, gfp, getdesc_user);
+}
+
+/**
+ * vringh_rsg_pull_user - copy bytes from vsg.
+ * @vsg: the vsg from vringh_rsg_user() (updated as we consume)
+ * @num: the number of elements in @vsg (updated as we consume)
+ * @dst: the place to copy.
+ * @len: the maximum length to copy.
+ *
+ * Returns the bytes copied <= len or a negative errno.
+ */
+ssize_t vringh_rsg_pull_user(struct vringh_sg **vsg, unsigned *num,
+ void *dst, size_t len)
+{
+ return vsg_xfer(vsg, num, dst, len, xfer_from_user);
+}
+
+/**
+ * vringh_wsg_user - form an sg from the remaining writable bytes.
+ * @acc: the accessor from vringh_get_user.
+ * @sg: the scatterlist to populate
+ * @num: the number of elements in @sg
+ * @gfp: the allocation flags if we need to chain onto @sg.
+ *
+ * This puts the page addresses into @sg: not the struct pages! You must
+ * map the pages. It will allocate and chain sgs if required: in this
+ * case the return value will be >= num - 1, and vringh_sg_free()
+ * must be called to free the chained elements.
+ *
+ * You are expected to pull / rsg all readable bytes before calling this!
+ *
+ * Returns -errno, or number of @sg elements created.
+ */
+int vringh_wsg_user(struct vringh_acc *acc,
+ struct vringh_sg *vsg, unsigned num, gfp_t gfp)
+{
+ return __vringh_sg(acc, vsg, num, VRING_DESC_F_WRITE, gfp,
+ getdesc_user);
+}
+
+/**
+ * vringh_wsg_push_user - copy bytes to vsg.
+ * @vsg: the vsg from vringh_wsg_user() (updated as we consume)
+ * @num: the number of elements in @vsg (updated as we consume)
+ * @src: the data to copy in.
+ * @len: the maximum length to copy.
+ *
+ * Returns the bytes copied <= len or a negative errno.
+ */
+ssize_t vringh_wsg_push_user(struct vringh_sg **vsg, unsigned *num,
+ const void *src, size_t len)
+{
+ return vsg_xfer(vsg, num, (void *)src, len, xfer_to_user);
+}
+
+/**
+ * vringh_abandon_user - we've decided not to handle the descriptor(s).
+ * @vrh: the vring.
+ * @num: the number of descriptors to put back (ie. num
+ * vringh_get_user() to undo).
+ *
+ * The next vringh_get_user() will return the old descriptor(s) again.
+ */
+void vringh_abandon_user(struct vringh *vrh, unsigned int num)
+{
+ /* We only update vring_avail_event(vr) when we want to be notified,
+ * so we haven't changed that yet. */
+ vrh->last_avail_idx -= num;
+}
+
+/**
+ * vringh_complete_user - we've finished with descriptor, publish it.
+ * @vrh: the vring.
+ * @acc: the accessor from vringh_get_user.
+ * @len: the length of data we have written.
+ * @notify: set if we should notify the other side, otherwise left alone.
+ */
+int vringh_complete_user(struct vringh *vrh,
+ const struct vringh_acc *acc,
+ u16 len,
+ bool *notify)
+{
+ return __vringh_complete(vrh, acc->head, len,
+ getu16_user, putu16_user, putused_user,
+ notify);
+}
+
+/**
+ * vringh_sg_free - free a chained sg.
+ * @vsg: the vsg from vringh_wsg_user/vringh_rsg_user
+ *
+ * If vringh_wsg_user/vringh_rsg_user chains your sg, you should call
+ * this to free it.
+ */
+void __cold vringh_sg_free(struct vringh_sg *vsg)
+{
+ struct scatterlist *next, *curr_start, *orig, *sg;
+
+ sg = &vsg->sg;
+ curr_start = orig = sg;
+
+ while (sg) {
+ next = sg_next(sg);
+ if (sg_is_chain(sg+1)) {
+ if (curr_start != orig)
+ free_page((long)curr_start);
+ curr_start = next;
+ }
+ sg = next;
+ }
+ if (curr_start != orig)
+ free_page((long)curr_start);
+}
+
+/**
+ * vringh_notify_enable_user - we want to know if something changes.
+ * @vrh: the vring.
+ *
+ * This always enables notifications, but returns true if there are
+ * now more buffers available in the vring.
+ */
+bool vringh_notify_enable_user(struct vringh *vrh)
+{
+ return __vringh_notify_enable(vrh, getu16_user, putu16_user);
+}
+
+/**
+ * vringh_notify_disable_user - don't tell us if something changes.
+ * @vrh: the vring.
+ *
+ * This is our normal running state: we disable and then only enable when
+ * we're going to sleep.
+ */
+void vringh_notify_disable_user(struct vringh *vrh)
+{
+ __vringh_notify_disable(vrh, putu16_user);
+}
diff --git a/include/linux/virtio_host.h b/include/linux/virtio_host.h
new file mode 100644
index 0000000..cb4b693
--- /dev/null
+++ b/include/linux/virtio_host.h
@@ -0,0 +1,136 @@
+/*
+ * Linux host-side vring helpers; for when the kernel needs to access
+ * someone else's vring.
+ *
+ * Copyright IBM Corporation, 2013.
+ * Parts taken from drivers/vhost/vhost.c Copyright 2009 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Written by: Rusty Russell <rusty@rustcorp.com.au>
+ */
+#ifndef _LINUX_VIRTIO_HOST_H
+#define _LINUX_VIRTIO_HOST_H
+#include <uapi/linux/virtio_ring.h>
+#include <linux/scatterlist.h>
+
+/* virtio_ring with information needed for host access. */
+struct vringh {
+ /* Guest publishes used event idx (note: we always do). */
+ bool event_indices;
+
+ /* Have we told the other end we want to be notified? */
+ bool listening;
+
+ /* Last available index we saw (ie. where we're up to). */
+ u16 last_avail_idx;
+
+ /* Last index we used. */
+ u16 last_used_idx;
+
+ /* The vring (note: it may contain user pointers!) */
+ struct vring vring;
+};
+
+/**
+ * struct vringh_sg - a scatterlist containing addresses.
+ *
+ * This data structure is trivially mapped in-place to a real sg, but
+ * the method is best left to the users (they may have to map user
+ * pages and add offsets to addresses).
+ */
+struct vringh_sg {
+ struct scatterlist sg;
+} __packed;
+
+static inline void *vringh_sg_addr(const struct vringh_sg *vsg)
+{
+ return (void *)sg_page((struct scatterlist *)&vsg->sg) + vsg->sg.offset;
+}
+
+/* Accessor structure for a single descriptor. */
+struct vringh_acc {
+ /* Start address. */
+ struct vring_desc *start;
+
+ /* Maximum number of entries, <= ring size. */
+ u32 max;
+
+ /* Head index we got, for vringh_complete_user, and current index. */
+ u16 head, idx;
+
+ /* Cached descriptor. */
+ struct vring_desc desc;
+};
+
+/* Helpers for userspace vrings. */
+int vringh_init_user(struct vringh *vrh, u32 features,
+ unsigned int num,
+ struct vring_desc __user *desc,
+ struct vring_avail __user *avail,
+ struct vring_used __user *used);
+
+/* Get accessor to userspace vring: make sure start to start+max is valid! */
+int vringh_getdesc_user(struct vringh *vrh, struct vringh_acc *acc);
+
+/* Fetch readable descriptor in vsg (num == 0 gives error if any). */
+int vringh_rsg_user(struct vringh_acc *acc,
+ struct vringh_sg *vsg, unsigned num, gfp_t gfp);
+
+/* Then fetch writable descriptor in sg (num == 0 gives error if any). */
+int vringh_wsg_user(struct vringh_acc *acc,
+ struct vringh_sg *vsg, unsigned num, gfp_t gfp);
+
+/* Copy bytes from readable vsg, consuming it. */
+ssize_t vringh_rsg_pull_user(struct vringh_sg **vsg, unsigned *num,
+ void *dst, size_t len);
+
+/* Copy bytes into writable vsg, consuming it. */
+ssize_t vringh_wsg_push_user(struct vringh_sg **vsg, unsigned *num,
+ const void *src, size_t len);
+
+/* Unmap all the pages mapped in this sg. */
+void vringh_unmap_pages(struct scatterlist *sg, unsigned num);
+
+/* Map a vring_sg, turning it into a real sg. */
+static inline struct scatterlist *vringh_sg_map(struct vringh_sg *vsg,
+ unsigned num,
+ struct page *(*map)(void *addr))
+{
+ struct scatterlist *orig_sg = (struct scatterlist *)vsg, *sg;
+ int i;
+
+ for_each_sg(orig_sg, sg, num, i) {
+ struct page *p = map(sg_page(sg));
+ if (unlikely(IS_ERR(p))) {
+ vringh_unmap_pages(orig_sg, i);
+ return (struct scatterlist *)p;
+ }
+ }
+ return orig_sg;
+}
+
+/* If wsg or rsg returns > num - 1, call this to free sg chains. */
+void vringh_sg_free(struct vringh_sg *sg);
+
+/* Mark a descriptor as used. Sets notify if you should fire eventfd. */
+int vringh_complete_user(struct vringh *vrh,
+ const struct vringh_acc *acc,
+ u16 len,
+ bool *notify);
+
+/* Pretend we've never seen descriptor (for easy error handling). */
+void vringh_abandon_user(struct vringh *vrh, unsigned int num);
+#endif /* _LINUX_VIRTIO_HOST_H */
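For orientation, a hedged end-to-end sketch (editorial, not part of the
patch; buffer sizes are arbitrary and the final kick is elided) of one
receive pass over the API above:

static void service_ring_once(struct vringh *vrh)
{
	struct vringh_acc acc;
	struct vringh_sg vsg[16], *cur = vsg;
	unsigned num;
	char buf[256];
	bool notify = false;
	ssize_t len;
	int n;

	if (vringh_getdesc_user(vrh, &acc) < 0 || !acc.max)
		return;				/* error, or ring empty */

	n = vringh_rsg_user(&acc, vsg, ARRAY_SIZE(vsg), GFP_KERNEL);
	if (n < 0) {
		vringh_abandon_user(vrh, 1);	/* put the head back */
		return;
	}

	num = n;
	len = vringh_rsg_pull_user(&cur, &num, buf, sizeof(buf));

	if (n > (int)ARRAY_SIZE(vsg) - 1)	/* sg was chained */
		vringh_sg_free(vsg);

	if (len >= 0)
		vringh_complete_user(vrh, &acc, len, &notify);
	if (notify) {
		/* signal the guest here, e.g. via an eventfd */
	}
}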
^ permalink raw reply related [flat|nested] 51+ messages in thread
* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
2013-01-10 10:30 ` Rusty Russell
@ 2013-01-10 11:11 ` Michael S. Tsirkin
2013-01-10 22:48 ` Rusty Russell
2013-01-10 18:39 ` Sjur Brændeland
1 sibling, 1 reply; 51+ messages in thread
From: Michael S. Tsirkin @ 2013-01-10 11:11 UTC (permalink / raw)
To: Rusty Russell; +Cc: LKML, Linus Walleij, virtualization
On Thu, Jan 10, 2013 at 09:00:55PM +1030, Rusty Russell wrote:
> Not sure why vhost/net doesn't build a packet and feed it in
> netif_rx_ni(). This is what tun seems to do, and with this code it
> should be fairly optimal.
Because we want to use NAPI.
--
MST
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
2013-01-10 10:30 ` Rusty Russell
2013-01-10 11:11 ` Michael S. Tsirkin
@ 2013-01-10 18:39 ` Sjur Brændeland
2013-01-10 23:35 ` Rusty Russell
1 sibling, 1 reply; 51+ messages in thread
From: Sjur Brændeland @ 2013-01-10 18:39 UTC (permalink / raw)
To: Rusty Russell
Cc: Michael S. Tsirkin, Linus Walleij, LKML, virtualization,
Sjur Brændeland
Hi Rusty,
On Thu, Jan 10, 2013 at 11:30 AM, Rusty Russell <rusty@rustcorp.com.au> wrote:
...
>I now have some lightly-tested code (via a userspace harness).
Great - thank you for looking into this. I will start integrating this
with my patches when you send out a proper patch.
...
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 8d5bddb..fd95d3e 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -5,6 +5,12 @@ config VIRTIO
> bus, such as CONFIG_VIRTIO_PCI, CONFIG_VIRTIO_MMIO, CONFIG_LGUEST,
> CONFIG_RPMSG or CONFIG_S390_GUEST.
>
> +config VHOST
> + tristate
Inclusion of drivers/virtio from drivers/Makefile depends on VIRTIO.
So I guess VHOST should select VIRTIO to ensure that
drivers/virtio/virtio_host.c
is part of the build.
> + ---help---
> + This option is selected by any driver which needs to access
> + the host side of a virtio ring.
> +
...
> +/* Returns vring->num if empty, -ve on error. */
> +static inline int __vringh_get_head(const struct vringh *vrh,
> + int (*getu16)(u16 *val, const u16 *p),
> + u16 *last_avail_idx)
> +{
> + u16 avail_idx, i, head;
> + int err;
> +
> + err = getu16(&avail_idx, &vrh->vring.avail->idx);
> + if (err) {
> + vringh_bad("Failed to access avail idx at %p",
> + &vrh->vring.avail->idx);
> + return err;
> + }
> +
> + err = getu16(last_avail_idx, &vring_avail_event(&vrh->vring));
> + if (err) {
> + vringh_bad("Failed to access last avail idx at %p",
> + &vring_avail_event(&vrh->vring));
> + return err;
> + }
> +
> + if (*last_avail_idx == avail_idx)
> + return vrh->vring.num;
> +
> + /* Only get avail ring entries after they have been exposed by guest. */
> + smp_rmb();
We are accessing memory shared with a remote device (modem), so we probably
need mandatory barriers here, e.g. something like the virtio_rmb
defined in virtio_ring.c.
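(For readers: the pattern Sjur means is roughly the weak_barriers switch in
virtio_ring.c of that era, sketched here with a hypothetical 'weak'
parameter. SMP barriers suffice between cache-coherent CPUs; a remote modem
needs the mandatory forms.)

#ifdef CONFIG_SMP
#define virtio_rmb(weak) \
	do { if (weak) smp_rmb(); else rmb(); } while (0)
#define virtio_wmb(weak) \
	do { if (weak) smp_wmb(); else wmb(); } while (0)
#else
#define virtio_rmb(weak)	rmb()
#define virtio_wmb(weak)	wmb()
#endif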
> +
> + i = *last_avail_idx & (vrh->vring.num - 1);
> +
> + err = getu16(&head, &vrh->vring.avail->ring[i]);
> + if (err) {
> + vringh_bad("Failed to read head: idx %d address %p",
> + *last_avail_idx, &vrh->vring.avail->ring[i]);
> + return err;
> + }
> +
> + if (head >= vrh->vring.num) {
> + vringh_bad("Guest says index %u > %u is available",
> + head, vrh->vring.num);
> + return -EINVAL;
> + }
> + return head;
> +}
...
> +static inline int
> +__vringh_sg(struct vringh_acc *acc,
> + struct vringh_sg *vsg,
> + unsigned max,
> + u16 write_flag,
> + gfp_t gfp,
> + int (*getdesc)(struct vring_desc *dst, const struct vring_desc *s))
> +{
> + unsigned count = 0, num_descs = 0;
> + struct vringh_sg *orig_vsg = vsg;
> + int err;
> +
> + /* This sends end marker on sg[max-1], so we know when to chain. */
> + if (max)
> + sg_init_table(&vsg->sg, max);
> +
> + for (;;) {
> + /* Exhausted this descriptor? Read next. */
> + if (acc->desc.len == 0) {
> + if (!(acc->desc.flags & VRING_DESC_F_NEXT))
> + break;
> +
> + if (num_descs++ == acc->max) {
> + err = -ELOOP;
> + goto fail;
> + }
> +
> + if (acc->desc.next >= acc->max) {
> + vringh_bad("Chained index %u > %u",
> + acc->desc.next, acc->max);
> + err = -EINVAL;
> + goto fail;
> + }
> +
> + acc->idx = acc->desc.next;
> + err = getdesc(&acc->desc, acc->start + acc->idx);
> + if (unlikely(err))
> + goto fail;
> + }
> +
> + if (unlikely(!max)) {
> + vringh_bad("Unexpected %s descriptor",
> + write_flag ? "writable" : "readable");
> + return -EINVAL;
> + }
> +
> + /* No more readable/writable descriptors? */
> + if ((acc->desc.flags & VRING_DESC_F_WRITE) != write_flag) {
> + /* We should not have readable after writable */
> + if (write_flag) {
> + vringh_bad("Readable desc %p after writable",
> + acc->start + acc->idx);
> + err = -EINVAL;
> + goto fail;
> + }
> + break;
> + }
> +
> + /* Append the pages into the sg. */
> + err = add_to_sg(&vsg, (void *)(long)acc->desc.addr,
> + acc->desc.len, gfp);
I would prefer not to split into pages at this point, but rather provide an
iterator or the original list found in the descriptor to the client.
In our case we use virtio rings to talk to an LTE-modem over shared memory.
The IP traffic is received over the air, interleaved, and arrives in the
virtio driver in large bursts. So the virtio driver on the modem receives
multiple datagrams held in large contiguous buffers. Our current approach
is to handle each buffer as a chained descriptor list, where each datagram
is kept in separate chained descriptors. When the buffers are consumed on
the Linux host, the modem will read the chained descriptors from the
used-ring and free the entire contiguous buffer in one operation.
So I would prefer if we could avoid this approach of splitting buffers
received in the ring into multiple sg-list entries, as this would break
the current CAIF virtio implementation.
Regards,
Sjur
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
2013-01-10 11:11 ` Michael S. Tsirkin
@ 2013-01-10 22:48 ` Rusty Russell
2013-01-11 7:31 ` Michael S. Tsirkin
[not found] ` <20130111073155.GA13315@redhat.com>
0 siblings, 2 replies; 51+ messages in thread
From: Rusty Russell @ 2013-01-10 22:48 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: netdev, Linus Walleij, virtualization
"Michael S. Tsirkin" <mst@redhat.com> writes:
> On Thu, Jan 10, 2013 at 09:00:55PM +1030, Rusty Russell wrote:
>> Not sure why vhost/net doesn't build a packet and feed it in
>> netif_rx_ni(). This is what tun seems to do, and with this code it
>> should be fairly optimal.
>
> Because we want to use NAPI.
Not quite what I was asking; it was more a question of why we're using a
raw socket, when we trivially have a complete skb already which we
should be able to feed to Linux like any network packet.
And that path is pretty well optimized...
Cheers,
Rusty.
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
2013-01-10 18:39 ` Sjur Brændeland
@ 2013-01-10 23:35 ` Rusty Russell
2013-01-11 6:37 ` Rusty Russell
2013-01-11 14:52 ` Sjur Brændeland
0 siblings, 2 replies; 51+ messages in thread
From: Rusty Russell @ 2013-01-10 23:35 UTC (permalink / raw)
To: Sjur Brændeland
Cc: Michael S. Tsirkin, Linus Walleij, LKML, virtualization,
Sjur Brændeland
Sjur Brændeland <sjurbren@gmail.com> writes:
> Hi Rusty,
>
> On Thu, Jan 10, 2013 at 11:30 AM, Rusty Russell <rusty@rustcorp.com.au> wrote:
> ...
>>I now have some lightly-tested code (via a userspace harness).
>
> Great - thank you for looking into this. I will start integrating this
> with my patches when you send out a proper patch.
Hi Sjur!
OK, the Internet was no help here, how do you pronounce Sjur?
I'm guessing "shoor" rhyming with tour until I know better.
>> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
>> index 8d5bddb..fd95d3e 100644
>> --- a/drivers/virtio/Kconfig
>> +++ b/drivers/virtio/Kconfig
>> @@ -5,6 +5,12 @@ config VIRTIO
>> bus, such as CONFIG_VIRTIO_PCI, CONFIG_VIRTIO_MMIO, CONFIG_LGUEST,
>> CONFIG_RPMSG or CONFIG_S390_GUEST.
>>
>> +config VHOST
>> + tristate
>
> Inclusion of drivers/virtio from drivers/Makefile depends on VIRTIO.
> So I guess VHOST should select VIRTIO to ensure that
> drivers/virtio/virtio_host.c
> is part of the build.
Maybe I should move drivers/virtio/virtio_host.c to
drivers/vhost/vringh.c; I'll look at it.
It makes sense for vhost/ to contain the host-side stuff, since it
already exists.
>> + if (*last_avail_idx == avail_idx)
>> + return vrh->vring.num;
>> +
>> + /* Only get avail ring entries after they have been exposed by guest. */
>> + smp_rmb();
>
> We are accessing memory shared with a remote device (modem), so we probably
> need mandatory barriers here, e.g. something like the virtio_rmb
> defined in virtio_ring.c.
Fair enough, we can put those in a header.
>> + /* Append the pages into the sg. */
>> + err = add_to_sg(&vsg, (void *)(long)acc->desc.addr,
>> + acc->desc.len, gfp);
>
> I would prefer not to split into pages at this point, but rather provide an
> iterator or the original list found in the descriptor to the client.
>
> In our case we use virtio rings to talk to an LTE modem over shared memory.
> The IP traffic is received over the air, interleaved, and arrives in the
> virtio driver in large bursts. So the virtio driver on the modem receives
> multiple datagrams held in large contiguous buffers. Our current approach
> is to handle each buffer as a chained descriptor list, where each datagram
> is kept in separate chained descriptors. When the buffers are consumed on
> the Linux host, the modem will read the chained descriptors from the
> used-ring and free the entire contiguous buffer in one operation.
In other words, boundaries matter?
While the sg-in-place hack is close to optimal for TCM_VHOST, neither
net nor you can use it directly. I'll switch to an iovec (with a
similar use-caller-supplied-if-it-fits trick); they're smaller anyway.
More code coming...
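(The if-it-fits idea in sketch form: carry a caller-supplied array and
only kmalloc when it overflows, e.g.

	struct vringh_iov {
		struct iovec *iov;	/* initially the caller's array */
		unsigned i, max;
		bool allocated;		/* true once we had to kmalloc */
	};
)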
Thanks,
Rusty.
* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
2013-01-10 23:35 ` Rusty Russell
@ 2013-01-11 6:37 ` Rusty Russell
2013-01-11 15:02 ` Sjur Brændeland
2013-01-14 17:39 ` Michael S. Tsirkin
2013-01-11 14:52 ` Sjur Brændeland
1 sibling, 2 replies; 51+ messages in thread
From: Rusty Russell @ 2013-01-11 6:37 UTC (permalink / raw)
To: Sjur Brændeland
Cc: Michael S. Tsirkin, Linus Walleij, LKML, virtualization,
Sjur Brændeland
Untested, but I wanted to post before the weekend.
I think the implementation is a bit nicer, and though we have a callback
to get the guest-to-userspace offset, it might be faster since I think
most cases will re-use the same mapping.
Feedback on API welcome!
Rusty.
virtio_host: host-side implementation of virtio rings (untested!)
Getting use of virtio rings correct is tricky, and a recent patch saw
an implementation of in-kernel rings (as separate from userspace).
This patch attempts to abstract the business of dealing with the
virtio ring layout from the access (userspace or direct); to do this,
we use function pointers, which gcc inlines correctly.
FIXME: strong barriers a-la virtio weak_barrier flag.
FIXME: separate notify call with flag if we wrapped.
FIXME: move to vhost/vringh.c.
FIXME: test :)
diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 202bba6..38ec470 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -1,6 +1,7 @@
config VHOST_NET
tristate "Host kernel accelerator for virtio net (EXPERIMENTAL)"
depends on NET && EVENTFD && (TUN || !TUN) && (MACVTAP || !MACVTAP) && EXPERIMENTAL
+ select VHOST
---help---
This kernel module can be loaded in host kernel to accelerate
guest networking with virtio_net. Not to be confused with virtio_net
diff --git a/drivers/vhost/Kconfig.tcm b/drivers/vhost/Kconfig.tcm
index a9c6f76..f4c3704 100644
--- a/drivers/vhost/Kconfig.tcm
+++ b/drivers/vhost/Kconfig.tcm
@@ -1,6 +1,7 @@
config TCM_VHOST
tristate "TCM_VHOST fabric module (EXPERIMENTAL)"
depends on TARGET_CORE && EVENTFD && EXPERIMENTAL && m
+ select VHOST
default n
---help---
Say M here to enable the TCM_VHOST fabric module for use with virtio-scsi guests
diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 8d5bddb..fd95d3e 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -5,6 +5,12 @@ config VIRTIO
bus, such as CONFIG_VIRTIO_PCI, CONFIG_VIRTIO_MMIO, CONFIG_LGUEST,
CONFIG_RPMSG or CONFIG_S390_GUEST.
+config VHOST
+ tristate
+ ---help---
+ This option is selected by any driver which needs to access
+ the host side of a virtio ring.
+
menu "Virtio drivers"
config VIRTIO_PCI
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 9076635..9833cd5 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -2,3 +2,4 @@ obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o
obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
+obj-$(CONFIG_VHOST) += virtio_host.o
diff --git a/drivers/virtio/virtio_host.c b/drivers/virtio/virtio_host.c
new file mode 100644
index 0000000..7416741
--- /dev/null
+++ b/drivers/virtio/virtio_host.c
@@ -0,0 +1,618 @@
+/*
+ * Helpers for the host side of a virtio ring.
+ *
+ * Since these may be in userspace, we use (inline) accessors.
+ */
+#include <linux/virtio_host.h>
+#include <linux/kernel.h>
+#include <linux/ratelimit.h>
+#include <linux/uaccess.h>
+#include <linux/slab.h>
+
+static __printf(1,2) __cold void vringh_bad(const char *fmt, ...)
+{
+ static DEFINE_RATELIMIT_STATE(vringh_rs,
+ DEFAULT_RATELIMIT_INTERVAL,
+ DEFAULT_RATELIMIT_BURST);
+ if (__ratelimit(&vringh_rs)) {
+ va_list ap;
+ va_start(ap, fmt);
+ printk(KERN_NOTICE "vringh: ");
+ vprintk(fmt, ap);
+ va_end(ap);
+ }
+}
+
+/* Returns vring->num if empty, -ve on error. */
+static inline int __vringh_get_head(const struct vringh *vrh,
+ int (*getu16)(u16 *val, const u16 *p),
+ u16 *last_avail_idx)
+{
+ u16 avail_idx, i, head;
+ int err;
+
+ err = getu16(&avail_idx, &vrh->vring.avail->idx);
+ if (err) {
+ vringh_bad("Failed to access avail idx at %p",
+ &vrh->vring.avail->idx);
+ return err;
+ }
+
+ err = getu16(last_avail_idx, &vring_avail_event(&vrh->vring));
+ if (err) {
+ vringh_bad("Failed to access last avail idx at %p",
+ &vring_avail_event(&vrh->vring));
+ return err;
+ }
+
+ if (*last_avail_idx == avail_idx)
+ return vrh->vring.num;
+
+ /* Only get avail ring entries after they have been exposed by guest. */
+ smp_rmb();
+
+ i = *last_avail_idx & (vrh->vring.num - 1);
+
+ err = getu16(&head, &vrh->vring.avail->ring[i]);
+ if (err) {
+ vringh_bad("Failed to read head: idx %d address %p",
+ *last_avail_idx, &vrh->vring.avail->ring[i]);
+ return err;
+ }
+
+ if (head >= vrh->vring.num) {
+ vringh_bad("Guest says index %u > %u is available",
+ head, vrh->vring.num);
+ return -EINVAL;
+ }
+ return head;
+}
+
+/* Copy some bytes to/from the iovec. Returns num copied. */
+static inline ssize_t vringh_iov_xfer(struct vringh_iov *iov,
+ void *ptr, size_t len,
+ int (*xfer)(void __user *addr, void *ptr,
+ size_t len))
+{
+ int err, done = 0;
+
+ while (len && iov->i < iov->max) {
+ size_t partlen;
+
+ partlen = min(iov->iov[iov->i].iov_len, len);
+ err = xfer(iov->iov[iov->i].iov_base, ptr, partlen);
+ if (err)
+ return err;
+ done += partlen;
+ iov->iov[iov->i].iov_base += partlen;
+ iov->iov[iov->i].iov_len -= partlen;
+
+ if (iov->iov[iov->i].iov_len == 0)
+ iov->i++;
+ }
+ return done;
+}
+
+static inline bool check_range(u64 addr, u32 len,
+ struct vringh_range *range,
+ bool (*getrange)(u64, struct vringh_range *))
+{
+ if (addr < range->start || addr > range->end_incl) {
+ if (!getrange(addr, range))
+ goto bad;
+ }
+ BUG_ON(addr < range->start || addr > range->end_incl);
+
+ /* To end of memory? */
+ if (unlikely(addr + len == 0)) {
+ if (range->end_incl == -1ULL)
+ return true;
+ goto bad;
+ }
+
+ /* Otherwise, don't wrap. */
+ if (unlikely(addr + len < addr))
+ goto bad;
+ if (unlikely(addr + len > range->end_incl))
+ goto bad;
+ return true;
+
+bad:
+ vringh_bad("Malformed descriptor address %u@0x%llx", len, addr);
+ return false;
+}
+
+/* No reason for this code to be inline. */
+static int move_to_indirect(int *up_next, u16 *i, void *addr,
+ const struct vring_desc *desc,
+ struct vring_desc **descs, int *desc_max)
+{
+ /* Indirect tables can't have indirect. */
+ if (*up_next != -1) {
+ vringh_bad("Multilevel indirect %u->%u", *up_next, *i);
+ return -EINVAL;
+ }
+
+ if (unlikely(desc->len % sizeof(struct vring_desc))) {
+ vringh_bad("Strange indirect len %u", desc->len);
+ return -EINVAL;
+ }
+
+ /* We will check this when we follow it! */
+ if (desc->flags & VRING_DESC_F_NEXT)
+ *up_next = desc->next;
+ else
+ *up_next = -2;
+ *descs = addr;
+ *desc_max = desc->len / sizeof(struct vring_desc);
+
+ /* Now, start at the first indirect. */
+ *i = 0;
+ return 0;
+}
+
+static int resize_iovec(struct vringh_iov *iov, gfp_t gfp)
+{
+ struct iovec *new;
+ unsigned int new_num = iov->max * 2;
+
+ if (new_num < 8)
+ new_num = 8;
+
+ if (iov->allocated)
+ new = krealloc(iov->iov, new_num * sizeof(struct iovec), gfp);
+ else {
+ new = kmalloc(new_num * sizeof(struct iovec), gfp);
+ if (new) {
+ memcpy(new, iov->iov, iov->i * sizeof(struct iovec));
+ iov->allocated = true;
+ }
+ }
+ if (!new)
+ return -ENOMEM;
+ iov->iov = new;
+ iov->max = new_num;
+ return 0;
+}
+
+static u16 __cold return_from_indirect(const struct vringh *vrh, int *up_next,
+ struct vring_desc **descs, int *desc_max)
+{
+ u16 i = *up_next;
+
+ *up_next = -1;
+ *descs = vrh->vring.desc;
+ *desc_max = vrh->vring.num;
+ return i;
+}
+
+static inline int
+__vringh_iov(struct vringh *vrh, u16 i,
+ struct vringh_iov *riov,
+ struct vringh_iov *wiov,
+ bool (*getrange)(u64 addr, struct vringh_range *r),
+ gfp_t gfp,
+ int (*getdesc)(struct vring_desc *dst, const struct vring_desc *s))
+{
+ int err, count = 0, up_next, desc_max;
+ struct vring_desc desc, *descs;
+ struct vringh_range range = { -1ULL, 0 };
+
+ /* We start traversing vring's descriptor table. */
+ descs = vrh->vring.desc;
+ desc_max = vrh->vring.num;
+ up_next = -1;
+
+ riov->i = wiov->i = 0;
+ for (;;) {
+ void *addr;
+ struct vringh_iov *iov;
+
+ err = getdesc(&desc, &descs[i]);
+ if (unlikely(err))
+ goto fail;
+
+ /* Make sure it's OK, and get offset. */
+ if (!check_range(desc.addr, desc.len, &range, getrange)) {
+ err = -EINVAL;
+ goto fail;
+ }
+ addr = (void *)(long)desc.addr + range.offset;
+
+ if (unlikely(desc.flags & VRING_DESC_F_INDIRECT)) {
+ err = move_to_indirect(&up_next, &i, addr, &desc,
+ &descs, &desc_max);
+ if (err)
+ goto fail;
+ continue;
+ }
+
+ if (desc.flags & VRING_DESC_F_WRITE)
+ iov = wiov;
+ else {
+ iov = riov;
+ if (unlikely(wiov->i)) {
+ vringh_bad("Readable desc %p after writable",
+ &descs[i]);
+ err = -EINVAL;
+ goto fail;
+ }
+ }
+
+ if (unlikely(iov->i == iov->max)) {
+ err = resize_iovec(iov, gfp);
+ if (err)
+ goto fail;
+ }
+
+ iov->iov[iov->i].iov_base = (__force __user void *)addr;
+ iov->iov[iov->i].iov_len = desc.len;
+ iov->i++;
+
+ if (++count == vrh->vring.num) {
+ vringh_bad("Descriptor loop in %p", descs);
+ err = -ELOOP;
+ goto fail;
+ }
+
+ if (desc.flags & VRING_DESC_F_NEXT) {
+ i = desc.next;
+ } else {
+ /* Just in case we need to finish traversing above. */
+ if (unlikely(up_next > 0))
+ i = return_from_indirect(vrh, &up_next,
+ &descs, &desc_max);
+ else
+ break;
+ }
+
+ if (i >= desc_max) {
+ vringh_bad("Chained index %u > %u", i, desc_max);
+ err = -EINVAL;
+ goto fail;
+ }
+ }
+
+ /* Reset for fresh iteration. */
+ riov->i = wiov->i = 0;
+ return 0;
+
+fail:
+ if (riov->allocated)
+ kfree(riov->iov);
+ if (wiov->allocated)
+ kfree(wiov->iov);
+ return err;
+}
+
+static inline int __vringh_complete(struct vringh *vrh, u16 idx, u32 len,
+ int (*getu16)(u16 *val, const u16 *p),
+ int (*putu16)(u16 *p, u16 val),
+ int (*putused)(struct vring_used_elem *dst,
+ const struct vring_used_elem
+ *s),
+ bool *notify)
+{
+ struct vring_used_elem used;
+ struct vring_used *used_ring;
+ int err;
+ u16 used_idx, old, used_event;
+
+ used.id = idx;
+ used.len = len;
+
+ err = getu16(&used_idx, &vring_used_event(&vrh->vring));
+ if (err) {
+ vringh_bad("Failed to access used event %p",
+ &vring_used_event(&vrh->vring));
+ return err;
+ }
+
+ used_ring = vrh->vring.used;
+
+ err = putused(&used_ring->ring[used_idx % vrh->vring.num], &used);
+ if (err) {
+ vringh_bad("Failed to write used entry %u at %p",
+ used_idx % vrh->vring.num,
+ &used_ring->ring[used_idx % vrh->vring.num]);
+ return err;
+ }
+
+ /* Make sure buffer is written before we update index. */
+ smp_wmb();
+
+ old = vrh->last_used_idx;
+ vrh->last_used_idx++;
+
+ err = putu16(&vrh->vring.used->idx, vrh->last_used_idx);
+ if (err) {
+ vringh_bad("Failed to update used index at %p",
+ &vrh->vring.used->idx);
+ return err;
+ }
+
+ /* If we already know we need to notify, skip re-checking */
+ if (*notify)
+ return 0;
+
+ /* Flush out used index update. This is paired with the
+ * barrier that the Guest executes when enabling
+ * interrupts. */
+ smp_mb();
+
+ /* Old-style, without event indices. */
+ if (!vrh->event_indices) {
+ u16 flags;
+ err = getu16(&flags, &vrh->vring.avail->flags);
+ if (err) {
+ vringh_bad("Failed to get flags at %p",
+ &vrh->vring.avail->flags);
+ return err;
+ }
+ if (!(flags & VRING_AVAIL_F_NO_INTERRUPT))
+ *notify = true;
+ return 0;
+ }
+
+ /* Modern: we know where other side is up to. */
+ err = getu16(&used_event, &vring_used_event(&vrh->vring));
+ if (err) {
+ vringh_bad("Failed to get used event idx at %p",
+ &vring_used_event(&vrh->vring));
+ return err;
+ }
+ if (vring_need_event(used_event, vrh->last_used_idx, old))
+ *notify = true;
+ return 0;
+}
+
+static inline bool __vringh_notify_enable(struct vringh *vrh,
+ int (*getu16)(u16 *val, const u16 *p),
+ int (*putu16)(u16 *p, u16 val))
+{
+ u16 avail;
+
+ /* Already enabled? */
+ if (vrh->listening)
+ return false;
+
+ vrh->listening = true;
+
+ if (!vrh->event_indices) {
+ /* Old-school; update flags. */
+ if (putu16(&vrh->vring.used->flags, 0) != 0) {
+ vringh_bad("Clearing used flags %p",
+ &vrh->vring.used->flags);
+ return false;
+ }
+ } else {
+ if (putu16(&vring_avail_event(&vrh->vring),
+ vrh->last_avail_idx) != 0) {
+ vringh_bad("Updating avail event index %p",
+ &vring_avail_event(&vrh->vring));
+ return false;
+ }
+ }
+
+ /* They could have slipped one in as we were doing that: make
+ * sure it's written, then check again. */
+ smp_mb();
+
+ if (getu16(&avail, &vrh->vring.avail->idx) != 0) {
+ vringh_bad("Failed to check avail idx at %p",
+ &vrh->vring.avail->idx);
+ return false;
+ }
+
+ /* This is so unlikely, we just leave notifications enabled. */
+ return avail != vrh->last_avail_idx;
+}
+
+static inline void __vringh_notify_disable(struct vringh *vrh,
+ int (*putu16)(u16 *p, u16 val))
+{
+ /* Already disabled? */
+ if (!vrh->listening)
+ return;
+
+ vrh->listening = false;
+ if (!vrh->event_indices) {
+ /* Old-school; update flags. */
+ if (putu16(&vrh->vring.used->flags, VRING_USED_F_NO_NOTIFY)) {
+ vringh_bad("Setting used flags %p",
+ &vrh->vring.used->flags);
+ }
+ }
+}
+
+/* Userspace access helpers. */
+static inline int getu16_user(u16 *val, const u16 *p)
+{
+ return get_user(*val, (__force u16 __user *)p);
+}
+
+static inline int putu16_user(u16 *p, u16 val)
+{
+ return put_user(val, (__force u16 __user *)p);
+}
+
+static inline int getdesc_user(struct vring_desc *dst,
+ const struct vring_desc *src)
+{
+ return copy_from_user(dst, (__force void *)src, sizeof(*dst)) == 0 ? 0 :
+ -EFAULT;
+}
+
+static inline int putused_user(struct vring_used_elem *dst,
+ const struct vring_used_elem *s)
+{
+ return copy_to_user((__force void __user *)dst, s, sizeof(*dst)) == 0
+ ? 0 : -EFAULT;
+}
+
+static inline int xfer_from_user(void *src, void *dst, size_t len)
+{
+ return copy_from_user(dst, (__force void *)src, len) == 0 ? 0 :
+ -EFAULT;
+}
+
+static inline int xfer_to_user(void *dst, void *src, size_t len)
+{
+ return copy_to_user((__force void *)dst, src, len) == 0 ? 0 :
+ -EFAULT;
+}
+
+/**
+ * vringh_init_user - initialize a vringh for a userspace vring.
+ * @vrh: the vringh to initialize.
+ * @features: the feature bits for this ring.
+ * @num: the number of elements.
+ * @desc: the userspace descriptor pointer.
+ * @avail: the userspace avail pointer.
+ * @used: the userspace used pointer.
+ *
+ * Returns an error if num is invalid: you should check pointers
+ * yourself!
+ */
+int vringh_init_user(struct vringh *vrh, u32 features,
+ unsigned int num,
+ struct vring_desc __user *desc,
+ struct vring_avail __user *avail,
+ struct vring_used __user *used)
+{
+ /* Sane power of 2 please! */
+ if (!num || num > 0xffff || (num & (num - 1))) {
+ vringh_bad("Bad ring size %zu", num);
+ return -EINVAL;
+ }
+
+ vrh->event_indices = (features & VIRTIO_RING_F_EVENT_IDX);
+ vrh->listening = false;
+ vrh->last_avail_idx = 0;
+ vrh->last_used_idx = 0;
+ vrh->vring.num = num;
+ vrh->vring.desc = (__force struct vring_desc *)desc;
+ vrh->vring.avail = (__force struct vring_avail *)avail;
+ vrh->vring.used = (__force struct vring_used *)used;
+ return 0;
+}
+
+/**
+ * vringh_getdesc_user - get next available descriptor from userspace ring.
+ * @vrh: the userspace vring.
+ * @riov: where to put the readable descriptors.
+ * @wiov: where to put the writable descriptors.
+ * @getrange: function to call to check ranges.
+ * @head: head index we received, for passing to vringh_complete_user().
+ * @gfp: flags for allocating larger riov/wiov.
+ *
+ * Returns 0 if there was no descriptor, 1 if there was, or -errno.
+ *
+ * If it returns 1, riov->allocated and wiov->allocated indicate if you
+ * have to kfree riov->iov and wiov->iov respectively.
+ */
+int vringh_getdesc_user(struct vringh *vrh,
+ struct vringh_iov *riov,
+ struct vringh_iov *wiov,
+ bool (*getrange)(u64 addr, struct vringh_range *r),
+ u16 *head,
+ gfp_t gfp)
+{
+ int err;
+
+ err = __vringh_get_head(vrh, getu16_user, &vrh->last_avail_idx);
+ if (err < 0)
+ return err;
+
+ /* Empty... */
+ if (err == vrh->vring.num)
+ return 0;
+
+ *head = err;
+ err = __vringh_iov(vrh, *head, riov, wiov, getrange, gfp, getdesc_user);
+ if (err)
+ return err;
+
+ return 1;
+}
+
+/**
+ * vringh_iov_pull_user - copy bytes from vring_iov.
+ * @riov: the riov as passed to vringh_getdesc_user() (updated as we consume)
+ * @dst: the place to copy.
+ * @len: the maximum length to copy.
+ *
+ * Returns the bytes copied <= len or a negative errno.
+ */
+ssize_t vringh_iov_pull_user(struct vringh_iov *riov, void *dst, size_t len)
+{
+ return vringh_iov_xfer(riov, dst, len, xfer_from_user);
+}
+
+/**
+ * vringh_iov_push_user - copy bytes into vring_iov.
+ * @wiov: the wiov as passed to vringh_getdesc_user() (updated as we consume)
+ * @src: the place to copy from.
+ * @len: the maximum length to copy.
+ *
+ * Returns the bytes copied <= len or a negative errno.
+ */
+ssize_t vringh_iov_push_user(struct vringh_iov *wiov,
+ const void *src, size_t len)
+{
+ return vringh_iov_xfer(wiov, (void *)src, len, xfer_to_user);
+}
+
+/**
+ * vringh_abandon_user - we've decided not to handle the descriptor(s).
+ * @vrh: the vring.
+ * @num: the number of descriptors to put back (ie. num
+ * vringh_get_user() to undo).
+ *
+ * The next vringh_get_user() will return the old descriptor(s) again.
+ */
+void vringh_abandon_user(struct vringh *vrh, unsigned int num)
+{
+ /* We only update vring_avail_event(vr) when we want to be notified,
+ * so we haven't changed that yet. */
+ vrh->last_avail_idx -= num;
+}
+
+/**
+ * vringh_complete_user - we've finished with descriptor, publish it.
+ * @vrh: the vring.
+ * @head: the head as filled in by vringh_getdesc_user.
+ * @len: the length of data we have written.
+ * @notify: set if we should notify the other side, otherwise left alone.
+ */
+int vringh_complete_user(struct vringh *vrh, u16 head, u32 len,
+ bool *notify)
+{
+ return __vringh_complete(vrh, head, len,
+ getu16_user, putu16_user, putused_user,
+ notify);
+}
+
+/**
+ * vringh_notify_enable_user - we want to know if something changes.
+ * @vrh: the vring.
+ *
+ * This always enables notifications, but returns true if there are
+ * now more buffers available in the vring.
+ */
+bool vringh_notify_enable_user(struct vringh *vrh)
+{
+ return __vringh_notify_enable(vrh, getu16_user, putu16_user);
+}
+
+/**
+ * vringh_notify_disable_user - don't tell us if something changes.
+ * @vrh: the vring.
+ *
+ * This is our normal running state: we disable and then only enable when
+ * we're going to sleep.
+ */
+void vringh_notify_disable_user(struct vringh *vrh)
+{
+ __vringh_notify_disable(vrh, putu16_user);
+}
diff --git a/include/linux/virtio_host.h b/include/linux/virtio_host.h
new file mode 100644
index 0000000..07bb4f6
--- /dev/null
+++ b/include/linux/virtio_host.h
@@ -0,0 +1,88 @@
+/*
+ * Linux host-side vring helpers; for when the kernel needs to access
+ * someone else's vring.
+ *
+ * Copyright IBM Corporation, 2013.
+ * Parts taken from drivers/vhost/vhost.c Copyright 2009 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Written by: Rusty Russell <rusty@rustcorp.com.au>
+ */
+#ifndef _LINUX_VIRTIO_HOST_H
+#define _LINUX_VIRTIO_HOST_H
+#include <uapi/linux/virtio_ring.h>
+#include <uapi/linux/uio.h>
+
+/* virtio_ring with information needed for host access. */
+struct vringh {
+ /* Guest publishes used event idx (note: we always do). */
+ bool event_indices;
+
+ /* Have we told the other end we want to be notified? */
+ bool listening;
+
+ /* Last available index we saw (ie. where we're up to). */
+ u16 last_avail_idx;
+
+ /* Last index we used. */
+ u16 last_used_idx;
+
+ /* The vring (note: it may contain user pointers!) */
+ struct vring vring;
+};
+
+/* The memory the vring can access, and what offset to apply. */
+struct vringh_range {
+ u64 start, end_incl;
+ u64 offset;
+};
+
+/* All the information about an iovec. */
+struct vringh_iov {
+ struct iovec *iov;
+ unsigned i, max;
+ bool allocated;
+};
+
+/* Helpers for userspace vrings. */
+int vringh_init_user(struct vringh *vrh, u32 features,
+ unsigned int num,
+ struct vring_desc __user *desc,
+ struct vring_avail __user *avail,
+ struct vring_used __user *used);
+
+/* Convert a descriptor into iovecs. */
+int vringh_getdesc_user(struct vringh *vrh,
+ struct vringh_iov *riov,
+ struct vringh_iov *wiov,
+ bool (*getrange)(u64 addr, struct vringh_range *r),
+ u16 *head,
+ gfp_t gfp);
+
+/* Copy bytes from readable vsg, consuming it (and incrementing wiov->i). */
+ssize_t vringh_iov_pull_user(struct vringh_iov *riov, void *dst, size_t len);
+
+/* Copy bytes into writable vsg, consuming it (and incrementing wiov->i). */
+ssize_t vringh_iov_push_user(struct vringh_iov *wiov,
+ const void *src, size_t len);
+
+/* Mark a descriptor as used. Sets notify if you should fire eventfd. */
+int vringh_complete_user(struct vringh *vrh, u16 head, u32 len,
+ bool *notify);
+
+/* Pretend we've never seen descriptor (for easy error handling). */
+void vringh_abandon_user(struct vringh *vrh, unsigned int num);
+#endif /* _LINUX_VIRTIO_HOST_H */
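For illustration, a caller would drive this roughly as follows (a
sketch only, and as untested as the rest; error paths trimmed, and
features/num/desc/avail/used/getrange/buf are whatever the backend
already has):

	struct iovec r_stack[8], w_stack[8];
	struct vringh_iov riov = { r_stack, 0, ARRAY_SIZE(r_stack), false };
	struct vringh_iov wiov = { w_stack, 0, ARRAY_SIZE(w_stack), false };
	struct vringh vrh;
	bool notify = false;
	u16 head;
	int err;

	err = vringh_init_user(&vrh, features, num, desc, avail, used);

	err = vringh_getdesc_user(&vrh, &riov, &wiov, getrange, &head,
				  GFP_KERNEL);
	if (err == 1) {
		/* Consume the readable side, then publish the buffer. */
		vringh_iov_pull_user(&riov, buf, sizeof(buf));
		vringh_complete_user(&vrh, head, 0, &notify);
		if (riov.allocated)
			kfree(riov.iov);
		if (wiov.allocated)
			kfree(wiov.iov);
		if (notify)
			; /* kick the other side, e.g. signal an eventfd */
	}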
* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
2013-01-10 22:48 ` Rusty Russell
@ 2013-01-11 7:31 ` Michael S. Tsirkin
[not found] ` <20130111073155.GA13315@redhat.com>
1 sibling, 0 replies; 51+ messages in thread
From: Michael S. Tsirkin @ 2013-01-11 7:31 UTC (permalink / raw)
To: Rusty Russell; +Cc: netdev, Linus Walleij, virtualization
On Fri, Jan 11, 2013 at 09:18:33AM +1030, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> > On Thu, Jan 10, 2013 at 09:00:55PM +1030, Rusty Russell wrote:
> >> Not sure why vhost/net doesn't build a packet and feed it into
> >> netif_rx_ni(). This is what tun seems to do, and with this code it
> >> should be fairly optimal.
> >
> > Because we want to use NAPI.
>
> Not quite what I was asking; it was more a question of why we're using a
> raw socket, when we trivially have a complete skb already which we
> should be able to feed to Linux like any network packet.
>
> And that path is pretty well optimized...
>
> Cheers,
> Rusty.
Oh for some reason I thought you were talking about virtio.
I don't really understand what you are saying here - vhost
actually calls out to tun to build and submit the skb.
--
MST
* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
2013-01-10 23:35 ` Rusty Russell
2013-01-11 6:37 ` Rusty Russell
@ 2013-01-11 14:52 ` Sjur Brændeland
1 sibling, 0 replies; 51+ messages in thread
From: Sjur Brændeland @ 2013-01-11 14:52 UTC (permalink / raw)
To: Rusty Russell; +Cc: Linus Walleij, virtualization, LKML, Michael S. Tsirkin
On Fri, Jan 11, 2013 at 12:35 AM, Rusty Russell <rusty@rustcorp.com.au> wrote:
> Hi Sjur!
>
> OK, the Internet was no help here, how do you pronounce Sjur?
> I'm guessing "shoor" rhyming with tour until I know better.
Thank you for asking! That is pretty close, yes.
I usually tell people to pronounce it like "sure" for simplicity.
But Google translate has a perfect Norwegian voice pronouncing my name:
http://translate.google.com/#auto/no/sjur (Press the speaker button)
While on the subject: Norwegian names can be a laugh: I have people
named "Odd" and "Even" in my family, and I once had a friend named
"Aashold".
...
>> I would prefer not to split into pages at this point, but rather provide an
>> iterator or the original list found in the descriptor to the client.
...
> In other words, boundaries matter?
>
> While the sg-in-place hack is close to optimal for TCM_VHOST, neither
> net nor you can use it directly. I'll switch to an iovec (with a
> similar use-caller-supplied-if-it-fits trick); they're smaller anyway.
Great, I think iovec will be a good fit for my CAIF driver.
Thanks,
Sjur
* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
2013-01-11 6:37 ` Rusty Russell
@ 2013-01-11 15:02 ` Sjur Brændeland
2013-01-12 0:26 ` Rusty Russell
2013-01-14 17:39 ` Michael S. Tsirkin
1 sibling, 1 reply; 51+ messages in thread
From: Sjur Brændeland @ 2013-01-11 15:02 UTC (permalink / raw)
To: Rusty Russell; +Cc: Linus Walleij, virtualization, LKML, Michael S. Tsirkin
On Fri, Jan 11, 2013 at 7:37 AM, Rusty Russell <rusty@rustcorp.com.au> wrote:
>virtio_host: host-side implementation of virtio rings (untested!)
>
>Getting use of virtio rings correct is tricky, and a recent patch saw
>an implementation of in-kernel rings (as separate from userspace).
How do you see the in-kernel API for this? I would like to see
something similar to my previous patches, where we extend
the virtqueue API. E.g. something like this:
struct virtqueue *vring_new_virtqueueh(unsigned int index,
                                       unsigned int num,
                                       unsigned int vring_align,
                                       struct virtio_device *vdev,
                                       bool weak_barriers,
                                       void *pages,
                                       void (*notify)(struct virtqueue *),
                                       void (*callback)(struct virtqueue *),
                                       const char *name);

int virtqueueh_get_iov(struct virtqueue *vqh,
                       struct vringh_iov *riov,
                       struct vringh_iov *wiov,
                       gfp_t gfp);

int virtqueueh_add_iov(struct virtqueue *vqh,
                       struct vringh_iov *riov,
                       struct vringh_iov *wiov);
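A driver would then consume buffers with something like (a sketch;
I'm guessing at the return conventions of the proposed calls):

	/* RX: drain everything the other side has made available. */
	while (virtqueueh_get_iov(vqh, &riov, &wiov, GFP_ATOMIC) > 0) {
		/* ... build an skb from riov and pass it up ... */
		virtqueueh_add_iov(vqh, &riov, &wiov); /* put in used ring */
	}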
I guess the implementation of the host-side virtqueue should stay in drivers/virtio?
Regards,
Sjur
* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
[not found] ` <20130111073155.GA13315@redhat.com>
@ 2013-01-12 0:20 ` Rusty Russell
2013-01-14 16:54 ` Michael S. Tsirkin
0 siblings, 1 reply; 51+ messages in thread
From: Rusty Russell @ 2013-01-12 0:20 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: netdev, Linus Walleij, virtualization
"Michael S. Tsirkin" <mst@redhat.com> writes:
> On Fri, Jan 11, 2013 at 09:18:33AM +1030, Rusty Russell wrote:
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> > On Thu, Jan 10, 2013 at 09:00:55PM +1030, Rusty Russell wrote:
>> >> Not sure why vhost/net doesn't build a packet and feed it into
>> >> netif_rx_ni(). This is what tun seems to do, and with this code it
>> >> should be fairly optimal.
>> >
>> > Because we want to use NAPI.
>>
>> Not quite what I was asking; it was more a question of why we're using a
>> raw socket, when we trivially have a complete skb already which we
>> should be able to feed to Linux like any network packet.
>
> Oh for some reason I thought you were talking about virtio.
> I don't really understand what you are saying here - vhost
> actually calls out to tun to build and submit the skb.
Ah, the fd is tun? Seems a bit indirect; I wonder if there's room for
more optimization here...
Cheers,
Rusty.
* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
2013-01-11 15:02 ` Sjur Brændeland
@ 2013-01-12 0:26 ` Rusty Russell
0 siblings, 0 replies; 51+ messages in thread
From: Rusty Russell @ 2013-01-12 0:26 UTC (permalink / raw)
To: Sjur Brændeland
Cc: Linus Walleij, virtualization, LKML, Michael S. Tsirkin
Sjur Brændeland <sjurbren@gmail.com> writes:
> How do you see the in-kernel API for this? I would like to see
> something similar to my previous patches, where we extend
> the virtqueue API. E.g. something like this:
> struct virtqueue *vring_new_virtqueueh(unsigned int index,
> unsigned int num,
> unsigned int vring_align,
> struct virtio_device *vdev,
> bool weak_barriers,
> void *pages,
> void (*notify)(struct virtqueue *),
> void (*callback)(struct virtqueue *),
> const char *name);
I was just going to create _kernel variants of all the _user helpers,
and let you drive it directly like that.
If we get a second in-kernel user, we create wrappers (I'd prefer not to
overload struct virtqueue though).
Cheers,
Rusty.
* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
2013-01-12 0:20 ` Rusty Russell
@ 2013-01-14 16:54 ` Michael S. Tsirkin
0 siblings, 0 replies; 51+ messages in thread
From: Michael S. Tsirkin @ 2013-01-14 16:54 UTC (permalink / raw)
To: Rusty Russell; +Cc: netdev, Linus Walleij, virtualization
On Sat, Jan 12, 2013 at 10:50:30AM +1030, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> > On Fri, Jan 11, 2013 at 09:18:33AM +1030, Rusty Russell wrote:
> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
> >> > On Thu, Jan 10, 2013 at 09:00:55PM +1030, Rusty Russell wrote:
> >> >> Not sure why vhost/net doesn't build a packet and feed it into
> >> >> netif_rx_ni(). This is what tun seems to do, and with this code it
> >> >> should be fairly optimal.
> >> >
> >> > Because we want to use NAPI.
> >>
> >> Not quite what I was asking; it was more a question of why we're using a
> >> raw socket, when we trivially have a complete skb already which we
> >> should be able to feed to Linux like any network packet.
> >
> > Oh for some reason I thought you were talking about virtio.
> > I don't really understand what you are saying here - vhost
> > actually calls out to tun to build and submit the skb.
>
> Ah, the fd is tun?
It can be tun or macvtap. We also support a packet socket
backend, though I don't know of any users; maybe this can
be dropped.
> Seems a bit indirect; I wonder if there's room for
> more optimization here...
>
> Cheers,
> Rusty.
Quite possibly. Using common data structures and code in tun and macvtap
would allow calling this code directly from vhost-net.
* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
2013-01-11 6:37 ` Rusty Russell
2013-01-11 15:02 ` Sjur Brændeland
@ 2013-01-14 17:39 ` Michael S. Tsirkin
2013-01-16 3:13 ` Rusty Russell
1 sibling, 1 reply; 51+ messages in thread
From: Michael S. Tsirkin @ 2013-01-14 17:39 UTC (permalink / raw)
To: Rusty Russell; +Cc: Linus Walleij, LKML, virtualization, Sjur Brændeland
On Fri, Jan 11, 2013 at 05:07:44PM +1030, Rusty Russell wrote:
> Untested, but I wanted to post before the weekend.
>
> I think the implementation is a bit nicer, and though we have a callback
> to get the guest-to-userspace offset, it might be faster since I think
> most cases will re-use the same mapping.
>
> Feedback on API welcome!
> Rusty.
>
> virtio_host: host-side implementation of virtio rings (untested!)
>
> Getting use of virtio rings correct is tricky, and a recent patch saw
> an implementation of in-kernel rings (as separate from userspace).
>
> This patch attempts to abstract the business of dealing with the
> virtio ring layout from the access (userspace or direct); to do this,
> we use function pointers, which gcc inlines correctly.
>
> FIXME: strong barriers a-la virtio weak_barrier flag.
> FIXME: separate notify call with flag if we wrapped.
> FIXME: move to vhost/vringh.c.
> FIXME: test :)
>
> diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
> index 202bba6..38ec470 100644
> --- a/drivers/vhost/Kconfig
> +++ b/drivers/vhost/Kconfig
> @@ -1,6 +1,7 @@
> config VHOST_NET
> tristate "Host kernel accelerator for virtio net (EXPERIMENTAL)"
> depends on NET && EVENTFD && (TUN || !TUN) && (MACVTAP || !MACVTAP) && EXPERIMENTAL
> + select VHOST
> ---help---
> This kernel module can be loaded in host kernel to accelerate
> guest networking with virtio_net. Not to be confused with virtio_net
> diff --git a/drivers/vhost/Kconfig.tcm b/drivers/vhost/Kconfig.tcm
> index a9c6f76..f4c3704 100644
> --- a/drivers/vhost/Kconfig.tcm
> +++ b/drivers/vhost/Kconfig.tcm
> @@ -1,6 +1,7 @@
> config TCM_VHOST
> tristate "TCM_VHOST fabric module (EXPERIMENTAL)"
> depends on TARGET_CORE && EVENTFD && EXPERIMENTAL && m
> + select VHOST
> default n
> ---help---
> Say M here to enable the TCM_VHOST fabric module for use with virtio-scsi guests
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 8d5bddb..fd95d3e 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -5,6 +5,12 @@ config VIRTIO
> bus, such as CONFIG_VIRTIO_PCI, CONFIG_VIRTIO_MMIO, CONFIG_LGUEST,
> CONFIG_RPMSG or CONFIG_S390_GUEST.
>
> +config VHOST
> + tristate
> + ---help---
> + This option is selected by any driver which needs to access
> + the host side of a virtio ring.
> +
> menu "Virtio drivers"
>
> config VIRTIO_PCI
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index 9076635..9833cd5 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -2,3 +2,4 @@ obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o
> obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
> obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
> obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
> +obj-$(CONFIG_VHOST) += virtio_host.o
> diff --git a/drivers/virtio/virtio_host.c b/drivers/virtio/virtio_host.c
> new file mode 100644
> index 0000000..7416741
> --- /dev/null
> +++ b/drivers/virtio/virtio_host.c
> @@ -0,0 +1,618 @@
> +/*
> + * Helpers for the host side of a virtio ring.
> + *
> + * Since these may be in userspace, we use (inline) accessors.
> + */
> +#include <linux/virtio_host.h>
> +#include <linux/kernel.h>
> +#include <linux/ratelimit.h>
> +#include <linux/uaccess.h>
> +#include <linux/slab.h>
> +
> +static __printf(1,2) __cold void vringh_bad(const char *fmt, ...)
> +{
> + static DEFINE_RATELIMIT_STATE(vringh_rs,
> + DEFAULT_RATELIMIT_INTERVAL,
> + DEFAULT_RATELIMIT_BURST);
> + if (__ratelimit(&vringh_rs)) {
> + va_list ap;
> + va_start(ap, fmt);
> + printk(KERN_NOTICE "vringh: ");
> + vprintk(fmt, ap);
> + va_end(ap);
> + }
> +}
> +
> +/* Returns vring->num if empty, -ve on error. */
> +static inline int __vringh_get_head(const struct vringh *vrh,
> + int (*getu16)(u16 *val, const u16 *p),
> + u16 *last_avail_idx)
> +{
> + u16 avail_idx, i, head;
> + int err;
> +
> + err = getu16(&avail_idx, &vrh->vring.avail->idx);
> + if (err) {
> + vringh_bad("Failed to access avail idx at %p",
> + &vrh->vring.avail->idx);
> + return err;
> + }
> +
> + err = getu16(last_avail_idx, &vring_avail_event(&vrh->vring));
> + if (err) {
> + vringh_bad("Failed to access last avail idx at %p",
> + &vring_avail_event(&vrh->vring));
> + return err;
> + }
> +
> + if (*last_avail_idx == avail_idx)
> + return vrh->vring.num;
> +
> + /* Only get avail ring entries after they have been exposed by guest. */
> + smp_rmb();
> +
> + i = *last_avail_idx & (vrh->vring.num - 1);
> +
> + err = getu16(&head, &vrh->vring.avail->ring[i]);
> + if (err) {
> + vringh_bad("Failed to read head: idx %d address %p",
> + *last_avail_idx, &vrh->vring.avail->ring[i]);
> + return err;
> + }
> +
> + if (head >= vrh->vring.num) {
> + vringh_bad("Guest says index %u > %u is available",
> + head, vrh->vring.num);
> + return -EINVAL;
> + }
> + return head;
> +}
> +
> +/* Copy some bytes to/from the iovec. Returns num copied. */
> +static inline ssize_t vringh_iov_xfer(struct vringh_iov *iov,
> + void *ptr, size_t len,
> + int (*xfer)(void __user *addr, void *ptr,
> + size_t len))
> +{
> + int err, done = 0;
> +
> + while (len && iov->i < iov->max) {
> + size_t partlen;
> +
> + partlen = min(iov->iov[iov->i].iov_len, len);
> + err = xfer(iov->iov[iov->i].iov_base, ptr, partlen);
> + if (err)
> + return err;
> + done += partlen;
> + iov->iov[iov->i].iov_base += partlen;
> + iov->iov[iov->i].iov_len -= partlen;
> +
> + if (iov->iov[iov->i].iov_len == 0)
> + iov->i++;
> + }
> + return done;
> +}
> +
> +static inline bool check_range(u64 addr, u32 len,
> + struct vringh_range *range,
> + bool (*getrange)(u64, struct vringh_range *))
> +{
> + if (addr < range->start || addr > range->end_incl) {
> + if (!getrange(addr, range))
> + goto bad;
> + }
> + BUG_ON(addr < range->start || addr > range->end_incl);
> +
> + /* To end of memory? */
> + if (unlikely(addr + len == 0)) {
> + if (range->end_incl == -1ULL)
> + return true;
> + goto bad;
> + }
> +
> + /* Otherwise, don't wrap. */
> + if (unlikely(addr + len < addr))
> + goto bad;
> + if (unlikely(addr + len > range->end_incl))
> + goto bad;
> + return true;
> +
> +bad:
> + vringh_bad("Malformed descriptor address %u@0x%llx", len, addr);
> + return false;
> +}
> +
> +/* No reason for this code to be inline. */
> +static int move_to_indirect(int *up_next, u16 *i, void *addr,
> + const struct vring_desc *desc,
> + struct vring_desc **descs, int *desc_max)
> +{
> + /* Indirect tables can't have indirect. */
> + if (*up_next != -1) {
> + vringh_bad("Multilevel indirect %u->%u", *up_next, *i);
> + return -EINVAL;
> + }
> +
> + if (unlikely(desc->len % sizeof(struct vring_desc))) {
> + vringh_bad("Strange indirect len %u", desc->len);
> + return -EINVAL;
> + }
> +
> + /* We will check this when we follow it! */
> + if (desc->flags & VRING_DESC_F_NEXT)
> + *up_next = desc->next;
> + else
> + *up_next = -2;
> + *descs = addr;
> + *desc_max = desc->len / sizeof(struct vring_desc);
> +
> + /* Now, start at the first indirect. */
> + *i = 0;
> + return 0;
> +}
> +
> +static int resize_iovec(struct vringh_iov *iov, gfp_t gfp)
> +{
> + struct iovec *new;
> + unsigned int new_num = iov->max * 2;
We must limit this I think, this is coming
from userspace. How about UIO_MAXIOV?
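E.g. (sketch):

	if (iov->max >= UIO_MAXIOV)
		return -ENOMEM;
	new_num = min_t(unsigned int, iov->max * 2, UIO_MAXIOV);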
> +
> + if (new_num < 8)
> + new_num = 8;
> +
> + if (iov->allocated)
> + new = krealloc(iov->iov, new_num * sizeof(struct iovec), gfp);
> + else {
> + new = kmalloc(new_num * sizeof(struct iovec), gfp);
> + if (new) {
> + memcpy(new, iov->iov, iov->i * sizeof(struct iovec));
> + iov->allocated = true;
> + }
> + }
> + if (!new)
> + return -ENOMEM;
> + iov->iov = new;
> + iov->max = new_num;
> + return 0;
> +}
> +
> +static u16 __cold return_from_indirect(const struct vringh *vrh, int *up_next,
> + struct vring_desc **descs, int *desc_max)
Not sure it should be cold like that - virtio net uses indirect on data
path.
> +{
> + u16 i = *up_next;
> +
> + *up_next = -1;
> + *descs = vrh->vring.desc;
> + *desc_max = vrh->vring.num;
> + return i;
> +}
> +
> +static inline int
> +__vringh_iov(struct vringh *vrh, u16 i,
> + struct vringh_iov *riov,
> + struct vringh_iov *wiov,
> + bool (*getrange)(u64 addr, struct vringh_range *r),
> + gfp_t gfp,
> + int (*getdesc)(struct vring_desc *dst, const struct vring_desc *s))
> +{
> + int err, count = 0, up_next, desc_max;
> + struct vring_desc desc, *descs;
> + struct vringh_range range = { -1ULL, 0 };
> +
> + /* We start traversing vring's descriptor table. */
> + descs = vrh->vring.desc;
> + desc_max = vrh->vring.num;
> + up_next = -1;
> +
> + riov->i = wiov->i = 0;
> + for (;;) {
> + void *addr;
> + struct vringh_iov *iov;
> +
> + err = getdesc(&desc, &descs[i]);
> + if (unlikely(err))
> + goto fail;
> +
> + /* Make sure it's OK, and get offset. */
> + if (!check_range(desc.addr, desc.len, &range, getrange)) {
> + err = -EINVAL;
> + goto fail;
> + }
Hmm, this looks like it will translate and
validate immediate descriptors the same way as indirect ones.
vhost-net has different translation for regular descriptors
and indirect ones, both for speed and to allow ring aliasing,
so it has to know which is which.
> + addr = (void *)(long)desc.addr + range.offset;
I really dislike raw pointers that we must never dereference.
Since we are forcing everything to __user anyway, why don't we
tag all addresses as __user? The kernel users of this API
can cast that away, this will keep the casts to minimum.
Failing that, we can add our own class
# define __virtio __attribute__((noderef, address_space(2)))
> +
> + if (unlikely(desc.flags & VRING_DESC_F_INDIRECT)) {
> + err = move_to_indirect(&up_next, &i, addr, &desc,
> + &descs, &desc_max);
> + if (err)
> + goto fail;
> + continue;
> + }
> +
> + if (desc.flags & VRING_DESC_F_WRITE)
> + iov = wiov;
> + else {
> + iov = riov;
> + if (unlikely(wiov->i)) {
> + vringh_bad("Readable desc %p after writable",
> + &descs[i]);
> + err = -EINVAL;
> + goto fail;
> + }
> + }
> +
> + if (unlikely(iov->i == iov->max)) {
> + err = resize_iovec(iov, gfp);
> + if (err)
> + goto fail;
> + }
> +
> + iov->iov[iov->i].iov_base = (__force __user void *)addr;
> + iov->iov[iov->i].iov_len = desc.len;
> + iov->i++;
This looks like it won't do the right thing if desc.len spans multiple
ranges. I don't know if this happens in practice but this is something
vhost supports ATM.
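Handling that would mean walking ranges inside one descriptor --
roughly (an untested sketch, reusing the names from the patch):

	while (desc.len) {
		u32 part = desc.len;

		if (!check_range(desc.addr, 1, &range, getrange))
			goto fail;
		if (desc.addr + part - 1 > range.end_incl)
			part = range.end_incl - desc.addr + 1;
		/* append (desc.addr + range.offset, part) as one iovec */
		desc.addr += part;
		desc.len -= part;
	}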
> +
> + if (++count == vrh->vring.num) {
> + vringh_bad("Descriptor loop in %p", descs);
> + err = -ELOOP;
> + goto fail;
> + }
> +
> + if (desc.flags & VRING_DESC_F_NEXT) {
> + i = desc.next;
> + } else {
> + /* Just in case we need to finish traversing above. */
> + if (unlikely(up_next > 0))
> + i = return_from_indirect(vrh, &up_next,
> + &descs, &desc_max);
> + else
> + break;
> + }
> +
> + if (i >= desc_max) {
> + vringh_bad("Chained index %u > %u", i, desc_max);
> + err = -EINVAL;
> + goto fail;
> + }
> + }
> +
> + /* Reset for fresh iteration. */
> + riov->i = wiov->i = 0;
> + return 0;
> +
> +fail:
> + if (riov->allocated)
> + kfree(riov->iov);
> + if (wiov->allocated)
> + kfree(wiov->iov);
> + return err;
> +}
> +
> +static inline int __vringh_complete(struct vringh *vrh, u16 idx, u32 len,
> + int (*getu16)(u16 *val, const u16 *p),
> + int (*putu16)(u16 *p, u16 val),
> + int (*putused)(struct vring_used_elem *dst,
> + const struct vring_used_elem
> + *s),
> + bool *notify)
> +{
> + struct vring_used_elem used;
> + struct vring_used *used_ring;
> + int err;
> + u16 used_idx, old, used_event;
> +
> + used.id = idx;
> + used.len = len;
> +
> + err = getu16(&used_idx, &vring_used_event(&vrh->vring));
> + if (err) {
> + vringh_bad("Failed to access used event %p",
> + &vring_used_event(&vrh->vring));
> + return err;
> + }
> +
> + used_ring = vrh->vring.used;
> +
> + err = putused(&used_ring->ring[used_idx % vrh->vring.num], &used);
> + if (err) {
> + vringh_bad("Failed to write used entry %u at %p",
> + used_idx % vrh->vring.num,
> + &used_ring->ring[used_idx % vrh->vring.num]);
> + return err;
> + }
> +
> + /* Make sure buffer is written before we update index. */
> + smp_wmb();
> +
> + old = vrh->last_used_idx;
> + vrh->last_used_idx++;
> +
> + err = putu16(&vrh->vring.used->idx, vrh->last_used_idx);
> + if (err) {
> + vringh_bad("Failed to update used index at %p",
> + &vrh->vring.used->idx);
> + return err;
> + }
> +
> + /* If we already know we need to notify, skip re-checking */
> + if (*notify)
> + return 0;
> +
> + /* Flush out used index update. This is paired with the
> + * barrier that the Guest executes when enabling
> + * interrupts. */
> + smp_mb();
> +
> + /* Old-style, without event indices. */
> + if (!vrh->event_indices) {
> + u16 flags;
> + err = getu16(&flags, &vrh->vring.avail->flags);
> + if (err) {
> + vringh_bad("Failed to get flags at %p",
> + &vrh->vring.avail->flags);
> + return err;
> + }
> + if (!(flags & VRING_AVAIL_F_NO_INTERRUPT))
> + *notify = true;
> + return 0;
> + }
> +
> + /* Modern: we know where other side is up to. */
> + err = getu16(&used_event, &vring_used_event(&vrh->vring));
> + if (err) {
> + vringh_bad("Failed to get used event idx at %p",
> + &vring_used_event(&vrh->vring));
> + return err;
> + }
> + if (vring_need_event(used_event, vrh->last_used_idx, old))
> + *notify = true;
> + return 0;
> +}
> +
> +static inline bool __vringh_notify_enable(struct vringh *vrh,
> + int (*getu16)(u16 *val, const u16 *p),
> + int (*putu16)(u16 *p, u16 val))
> +{
> + u16 avail;
> +
> + /* Already enabled? */
> + if (vrh->listening)
> + return false;
> +
> + vrh->listening = true;
> +
> + if (!vrh->event_indices) {
> + /* Old-school; update flags. */
> + if (putu16(&vrh->vring.used->flags, 0) != 0) {
> + vringh_bad("Clearing used flags %p",
> + &vrh->vring.used->flags);
> + return false;
> + }
> + } else {
> + if (putu16(&vring_avail_event(&vrh->vring),
> + vrh->last_avail_idx) != 0) {
> + vringh_bad("Updating avail event index %p",
> + &vring_avail_event(&vrh->vring));
> + return false;
> + }
> + }
> +
> + /* They could have slipped one in as we were doing that: make
> + * sure it's written, then check again. */
> + smp_mb();
> +
> + if (getu16(&avail, &vrh->vring.avail->idx) != 0) {
> + vringh_bad("Failed to check avail idx at %p",
> + &vrh->vring.avail->idx);
> + return false;
> + }
> +
> + /* This is so unlikely, we just leave notifications enabled. */
> + return avail != vrh->last_avail_idx;
> +}
> +
> +static inline void __vringh_notify_disable(struct vringh *vrh,
> + int (*putu16)(u16 *p, u16 val))
> +{
> + /* Already disabled? */
> + if (!vrh->listening)
> + return;
> +
> + vrh->listening = false;
> + if (!vrh->event_indices) {
> + /* Old-school; update flags. */
> + if (putu16(&vrh->vring.used->flags, VRING_USED_F_NO_NOTIFY)) {
> + vringh_bad("Setting used flags %p",
> + &vrh->vring.used->flags);
> + }
> + }
> +}
> +
> +/* Userspace access helpers. */
> +static inline int getu16_user(u16 *val, const u16 *p)
> +{
> + return get_user(*val, (__force u16 __user *)p);
> +}
> +
> +static inline int putu16_user(u16 *p, u16 val)
> +{
> + return put_user(val, (__force u16 __user *)p);
> +}
> +
> +static inline int getdesc_user(struct vring_desc *dst,
> + const struct vring_desc *src)
> +{
> + return copy_from_user(dst, (__force void *)src, sizeof(*dst)) == 0 ? 0 :
> + -EFAULT;
> +}
> +
> +static inline int putused_user(struct vring_used_elem *dst,
> + const struct vring_used_elem *s)
> +{
> + return copy_to_user((__force void __user *)dst, s, sizeof(*dst)) == 0
> + ? 0 : -EFAULT;
> +}
> +
> +static inline int xfer_from_user(void *src, void *dst, size_t len)
> +{
> + return copy_from_user(dst, (__force void *)src, len) == 0 ? 0 :
> + -EFAULT;
> +}
> +
> +static inline int xfer_to_user(void *dst, void *src, size_t len)
> +{
> + return copy_to_user((__force void *)dst, src, len) == 0 ? 0 :
> + -EFAULT;
> +}
> +
> +/**
> + * vringh_init_user - initialize a vringh for a userspace vring.
> + * @vrh: the vringh to initialize.
> + * @features: the feature bits for this ring.
> + * @num: the number of elements.
> + * @desc: the userspace descriptor pointer.
> + * @avail: the userspace avail pointer.
> + * @used: the userspace used pointer.
> + *
> + * Returns an error if num is invalid: you should check pointers
> + * yourself!
> + */
> +int vringh_init_user(struct vringh *vrh, u32 features,
> + unsigned int num,
> + struct vring_desc __user *desc,
> + struct vring_avail __user *avail,
> + struct vring_used __user *used)
> +{
> + /* Sane power of 2 please! */
> + if (!num || num > 0xffff || (num & (num - 1))) {
> + vringh_bad("Bad ring size %zu", num);
> + return -EINVAL;
> + }
> +
> + vrh->event_indices = (features & VIRTIO_RING_F_EVENT_IDX);
> + vrh->listening = false;
> + vrh->last_avail_idx = 0;
> + vrh->last_used_idx = 0;
> + vrh->vring.num = num;
> + vrh->vring.desc = (__force struct vring_desc *)desc;
> + vrh->vring.avail = (__force struct vring_avail *)avail;
> + vrh->vring.used = (__force struct vring_used *)used;
> + return 0;
> +}
> +
> +/**
> + * vringh_getdesc_user - get next available descriptor from userspace ring.
> + * @vrh: the userspace vring.
> + * @riov: where to put the readable descriptors.
> + * @wiov: where to put the writable descriptors.
> + * @getrange: function to call to check ranges.
> + * @head: head index we received, for passing to vringh_complete_user().
> + * @gfp: flags for allocating larger riov/wiov.
> + *
> + * Returns 0 if there was no descriptor, 1 if there was, or -errno.
> + *
> + * If it returns 1, riov->allocated and wiov->allocated indicate if you
> + * have to kfree riov->iov and wiov->iov respectively.
> + */
> +int vringh_getdesc_user(struct vringh *vrh,
> + struct vringh_iov *riov,
> + struct vringh_iov *wiov,
> + bool (*getrange)(u64 addr, struct vringh_range *r),
> + u16 *head,
> + gfp_t gfp)
> +{
> + int err;
> +
> + err = __vringh_get_head(vrh, getu16_user, &vrh->last_avail_idx);
> + if (err < 0)
> + return err;
> +
> + /* Empty... */
> + if (err == vrh->vring.num)
> + return 0;
> +
> + *head = err;
> + err = __vringh_iov(vrh, *head, riov, wiov, getrange, gfp, getdesc_user);
> + if (err)
> + return err;
> +
> + return 1;
> +}
> +
> +/**
> + * vringh_iov_pull_user - copy bytes from vring_iov.
> + * @riov: the riov as passed to vringh_getdesc_user() (updated as we consume)
> + * @dst: the place to copy.
> + * @len: the maximum length to copy.
> + *
> + * Returns the bytes copied <= len or a negative errno.
> + */
> +ssize_t vringh_iov_pull_user(struct vringh_iov *riov, void *dst, size_t len)
> +{
> + return vringh_iov_xfer(riov, dst, len, xfer_from_user);
> +}
> +
> +/**
> + * vringh_iov_push_user - copy bytes into vring_iov.
> + * @wiov: the wiov as passed to vringh_getdesc_user() (updated as we consume)
> + * @src: the place to copy from.
> + * @len: the maximum length to copy.
> + *
> + * Returns the bytes copied <= len or a negative errno.
> + */
> +ssize_t vringh_iov_push_user(struct vringh_iov *wiov,
> + const void *src, size_t len)
> +{
> + return vringh_iov_xfer(wiov, (void *)src, len, xfer_to_user);
> +}
> +
> +/**
> + * vringh_abandon_user - we've decided not to handle the descriptor(s).
> + * @vrh: the vring.
> + * @num: the number of descriptors to put back (ie. num
> + * vringh_get_user() to undo).
> + *
> + * The next vringh_get_user() will return the old descriptor(s) again.
> + */
> +void vringh_abandon_user(struct vringh *vrh, unsigned int num)
> +{
> + /* We only update vring_avail_event(vr) when we want to be notified,
> + * so we haven't changed that yet. */
> + vrh->last_avail_idx -= num;
> +}
> +
> +/**
> + * vringh_complete_user - we've finished with descriptor, publish it.
> + * @vrh: the vring.
> + * @head: the head as filled in by vringh_getdesc_user.
> + * @len: the length of data we have written.
> + * @notify: set if we should notify the other side, otherwise left alone.
> + */
> +int vringh_complete_user(struct vringh *vrh, u16 head, u32 len,
> + bool *notify)
> +{
> + return __vringh_complete(vrh, head, len,
> + getu16_user, putu16_user, putused_user,
> + notify);
> +}
> +
> +/**
> + * vringh_notify_enable_user - we want to know if something changes.
> + * @vrh: the vring.
> + *
> + * This always enables notifications, but returns true if there are
> + * now more buffers available in the vring.
> + */
> +bool vringh_notify_enable_user(struct vringh *vrh)
> +{
> + return __vringh_notify_enable(vrh, getu16_user, putu16_user);
> +}
> +
> +/**
> + * vringh_notify_disable_user - don't tell us if something changes.
> + * @vrh: the vring.
> + *
> + * This is our normal running state: we disable and then only enable when
> + * we're going to sleep.
> + */
> +void vringh_notify_disable_user(struct vringh *vrh)
> +{
> + __vringh_notify_disable(vrh, putu16_user);
> +}
> diff --git a/include/linux/virtio_host.h b/include/linux/virtio_host.h
> new file mode 100644
> index 0000000..07bb4f6
> --- /dev/null
> +++ b/include/linux/virtio_host.h
> @@ -0,0 +1,88 @@
> +/*
> + * Linux host-side vring helpers; for when the kernel needs to access
> + * someone else's vring.
> + *
> + * Copyright IBM Corporation, 2013.
> + * Parts taken from drivers/vhost/vhost.c Copyright 2009 Red Hat, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
> + *
> + * Written by: Rusty Russell <rusty@rustcorp.com.au>
> + */
> +#ifndef _LINUX_VIRTIO_HOST_H
> +#define _LINUX_VIRTIO_HOST_H
> +#include <uapi/linux/virtio_ring.h>
> +#include <uapi/linux/uio.h>
> +
> +/* virtio_ring with information needed for host access. */
> +struct vringh {
> + /* Guest publishes used event idx (note: we always do). */
> + bool event_indices;
> +
> + /* Have we told the other end we want to be notified? */
> + bool listening;
> +
> + /* Last available index we saw (ie. where we're up to). */
> + u16 last_avail_idx;
> +
> + /* Last index we used. */
> + u16 last_used_idx;
> +
> + /* The vring (note: it may contain user pointers!) */
> + struct vring vring;
> +};
> +
> +/* The memory the vring can access, and what offset to apply. */
> +struct vringh_range {
> + u64 start, end_incl;
> + u64 offset;
> +};
> +
> +/* All the information about an iovec. */
> +struct vringh_iov {
> + struct iovec *iov;
> + unsigned i, max;
> + bool allocated;
Maybe set iov = NULL when not allocated?
> +};
> +
> +/* Helpers for userspace vrings. */
> +int vringh_init_user(struct vringh *vrh, u32 features,
> + unsigned int num,
> + struct vring_desc __user *desc,
> + struct vring_avail __user *avail,
> + struct vring_used __user *used);
> +
> +/* Convert a descriptor into iovecs. */
> +int vringh_getdesc_user(struct vringh *vrh,
> + struct vringh_iov *riov,
> + struct vringh_iov *wiov,
> + bool (*getrange)(u64 addr, struct vringh_range *r),
> + u16 *head,
> + gfp_t gfp);
> +
> +/* Copy bytes from readable vsg, consuming it (and incrementing wiov->i). */
> +ssize_t vringh_iov_pull_user(struct vringh_iov *riov, void *dst, size_t len);
> +
> +/* Copy bytes into writable vsg, consuming it (and incrementing wiov->i). */
> +ssize_t vringh_iov_push_user(struct vringh_iov *wiov,
> + const void *src, size_t len);
> +
> +/* Mark a descriptor as used. Sets notify if you should fire eventfd. */
> +int vringh_complete_user(struct vringh *vrh, u16 head, u32 len,
> + bool *notify);
> +
> +/* Pretend we've never seen descriptor (for easy error handling). */
> +void vringh_abandon_user(struct vringh *vrh, unsigned int num);
> +#endif /* _LINUX_VIRTIO_HOST_H */
* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
2013-01-14 17:39 ` Michael S. Tsirkin
@ 2013-01-16 3:13 ` Rusty Russell
2013-01-16 8:16 ` Michael S. Tsirkin
0 siblings, 1 reply; 51+ messages in thread
From: Rusty Russell @ 2013-01-16 3:13 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Linus Walleij, LKML, virtualization, Sjur Brændeland
"Michael S. Tsirkin" <mst@redhat.com> writes:
>> +static int resize_iovec(struct vringh_iov *iov, gfp_t gfp)
>> +{
>> + struct iovec *new;
>> + unsigned int new_num = iov->max * 2;
>
> We must limit this I think, this is coming
> from userspace. How about UIO_MAXIOV?
We limit it to the ring size already; UIO_MAXIOV is a weird choice here.
>> +static u16 __cold return_from_indirect(const struct vringh *vrh, int *up_next,
>> + struct vring_desc **descs, int *desc_max)
>
> Not sure it should be cold like that - virtio net uses indirect on data
> path.
This is only when we have a chained, indirect descriptor (ie. we have to
go back up to the next entry in the main descriptor table). That's
allowed in the spec, but no one does it.
>> + /* Make sure it's OK, and get offset. */
>> + if (!check_range(desc.addr, desc.len, &range, getrange)) {
>> + err = -EINVAL;
>> + goto fail;
>> + }
>
> Hmm this looks like it will translate and
> validate immediate descriptors same way as indirect ones.
> vhost-net has different translation for regular descriptors
> and indirect ones, both for speed and to allow ring aliasing,
> so it has to know which is which.
I see translate_desc() in both cases, what's different?
>> + addr = (void *)(long)desc.addr + range.offset;
>
> I really dislike raw pointers that we must never dereference.
> Since we are forcing everything to __user anyway, why don't we
> tag all addresses as __user? The kernel users of this API
> can cast that away, this will keep the casts to minimum.
>
> Failing that, we can add our own class
> # define __virtio __attribute__((noderef, address_space(2)))
In this case, perhaps we should leave addr as a u64?
>> + iov->iov[iov->i].iov_base = (__force __user void *)addr;
>> + iov->iov[iov->i].iov_len = desc.len;
>> + iov->i++;
>
>
> This looks like it won't do the right thing if desc.len spans multiple
> ranges. I don't know if this happens in practice but this is something
> vhost supports ATM.
Well, kind of. I assumed that the bool (*getrange)(u64, struct
vringh_range *) callback would meld any adjacent ranges if it needs to.
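As an illustration of that assumption (hypothetical; REGION_A_SIZE,
REGION_B_SIZE and REGION_HOST_BASE are invented constants), a getrange that
melds two back-to-back regions into one reported range:

static bool getrange_melding(u64 addr, struct vringh_range *r)
{
        /* Regions A and B are adjacent both in guest addresses and in
         * our mapping, so report them as one covering range; a
         * descriptor spanning the seam still fits one iov entry. */
        if (addr >= REGION_A_SIZE + REGION_B_SIZE)
                return false;
        r->start = 0;
        r->end_incl = REGION_A_SIZE + REGION_B_SIZE - 1;
        r->offset = REGION_HOST_BASE;
        return true;
}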
>> +/* All the information about an iovec. */
>> +struct vringh_iov {
>> + struct iovec *iov;
>> + unsigned i, max;
>> + bool allocated;
>
> Maybe set iov = NULL when not allocated?
The idea was that iov points to the initial (on-stack?) iov, for the
fast path.
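So the intended calling convention is presumably something like this (a
hedged sketch; only the struct layout comes from the posted header):

struct iovec stack[4];
struct vringh_iov riov = {
        .iov = stack,           /* fast path: caller's on-stack array */
        .i = 0,
        .max = 4,               /* ARRAY_SIZE(stack) */
        .allocated = false,     /* nothing to kfree yet */
};
/* If a chain needs more than 4 entries, resize_iovec() replaces .iov
 * with a kmalloc'ed array and sets .allocated, and the caller kfrees
 * it when done. */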
I'm writing a more complete test at the moment, then I will look at how
this fits with vhost.c as it stands...
Cheers,
Rusty.
* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
2013-01-16 3:13 ` Rusty Russell
@ 2013-01-16 8:16 ` Michael S. Tsirkin
2013-01-17 2:10 ` Rusty Russell
[not found] ` <87k3rcy2y2.fsf@rustcorp.com.au>
0 siblings, 2 replies; 51+ messages in thread
From: Michael S. Tsirkin @ 2013-01-16 8:16 UTC (permalink / raw)
To: Rusty Russell; +Cc: Linus Walleij, LKML, virtualization, Sjur Brændeland
On Wed, Jan 16, 2013 at 01:43:32PM +1030, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> >> +static int resize_iovec(struct vringh_iov *iov, gfp_t gfp)
> >> +{
> >> + struct iovec *new;
> >> + unsigned int new_num = iov->max * 2;
> >
> > We must limit this I think, this is coming
> > from userspace. How about UIO_MAXIOV?
>
> We limit it to the ring size already;
1. do we limit it in case there's a loop in the descriptor ring?
2. do we limit it in case there are indirect descriptors?
I guess I missed where we do this; could you point this out to me?
> UIO_MAXIOV is a weird choice here.
It's kind of forced by the need to pass the iov on to the Linux kernel,
so we know that any guest using more is broken on existing hypervisors.
Ring size is somewhat arbitrary too, isn't it? A huge ring where we
post lots of short descriptors (e.g. RX buffers) seems like a valid thing to do.
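For what it's worth, the cap Michael suggests could be wired into
resize_iovec roughly as follows. The body here is reconstructed for
illustration, not the posted code; only the opening lines and the
new_num < 8 floor appear in the series:

static int resize_iovec(struct vringh_iov *iov, gfp_t gfp)
{
        struct iovec *new;
        unsigned int new_num = iov->max * 2;

        if (new_num < 8)
                new_num = 8;
        if (new_num > UIO_MAXIOV)
                new_num = UIO_MAXIOV;
        if (new_num <= iov->max)
                return -ENOBUFS;        /* cap already reached */

        if (iov->allocated) {
                new = krealloc(iov->iov, new_num * sizeof(*new), gfp);
        } else {
                /* First growth: migrate off the caller's stack array. */
                new = kmalloc(new_num * sizeof(*new), gfp);
                if (new) {
                        memcpy(new, iov->iov, iov->i * sizeof(*new));
                        iov->allocated = true;
                }
        }
        if (!new)
                return -ENOMEM;
        iov->iov = new;
        iov->max = new_num;
        return 0;
}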
> >> +static u16 __cold return_from_indirect(const struct vringh *vrh, int *up_next,
> >> + struct vring_desc **descs, int *desc_max)
> >
> > Not sure it should be cold like that - virtio net uses indirect on data
> > path.
>
> This is only when we have a chained, indirect descriptor (ie. we have to
> go back up to the next entry in the main descriptor table). That's
> allowed in the spec, but no one does it.
> >> + /* Make sure it's OK, and get offset. */
> >> + if (!check_range(desc.addr, desc.len, &range, getrange)) {
> >> + err = -EINVAL;
> >> + goto fail;
> >> + }
> >
> > Hmm this looks like it will translate and
> > validate immediate descriptors same way as indirect ones.
> > vhost-net has different translation for regular descriptors
> > and indirect ones, both for speed and to allow ring aliasing,
> > so it has to know which is which.
>
> I see translate_desc() in both cases, what's different?
> >> + addr = (void *)(long)desc.addr + range.offset;
> >
> > I really dislike raw pointers that we must never dereference.
> > Since we are forcing everything to __user anyway, why don't we
> > tag all addresses as __user? The kernel users of this API
> > can cast that away, this will keep the casts to minimum.
> >
> > Failing that, we can add our own class
> > # define __virtio __attribute__((noderef, address_space(2)))
>
> In this case, perhaps we should leave addr as a u64?
Point being? All users will cast to a pointer.
It seems at first passing in raw pointers is cleaner,
but it turns out in the API we are passing iovs around,
and they are __user anyway.
So using raw pointers here does not buy us anything,
so let's use __user and gain extra static checks at no cost.
> >> + iov->iov[iov->i].iov_base = (__force __user void *)addr;
> >> + iov->iov[iov->i].iov_len = desc.len;
> >> + iov->i++;
> >
> >
> > This looks like it won't do the right thing if desc.len spans multiple
> > ranges. I don't know if this happens in practice but this is something
> > vhost supports ATM.
>
> Well, kind of. I assumed that the bool (*getrange)(u64, struct
> vringh_range *) callback would meld any adjacent ranges if it needs to.
Confused. If addresses 0 to 0x1000 map to virtual addresses 0 to 0x1000
and 0x1000 to 0x2000 map to virtual addresses 0x2000 to 0x3000, then
a single descriptor covering 0 to 0x2000 in guest needs two
iov entries. What can getrange do about it?
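(For illustration, handling that case would need a per-range split loop in
__vringh_iov, roughly like the hypothetical fragment below; desc, iov and
getrange are as in the posted code, and a real version would also grow iov
via resize_iovec when it fills up.)

u64 addr = desc.addr;
u32 left = desc.len;

while (left) {
        struct vringh_range range;
        u64 chunk;

        if (!getrange(addr, &range))
                return -EINVAL;
        /* Clamp the copy to the end of this range, emit one entry. */
        chunk = min_t(u64, left, range.end_incl - addr + 1);
        iov->iov[iov->i].iov_base =
                (__force void __user *)(long)(addr + range.offset);
        iov->iov[iov->i].iov_len = chunk;
        iov->i++;
        addr += chunk;
        left -= chunk;
}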
> >> +/* All the information about an iovec. */
> >> +struct vringh_iov {
> >> + struct iovec *iov;
> >> + unsigned i, max;
> >> + bool allocated;
> >
> > Maybe set iov = NULL when not allocated?
>
> The idea was that iov points to the initial (on-stack?) iov, for the
> fast path.
>
> I'm writing a more complete test at the moment, then I will look at how
> this fits with vhost.c as it stands...
>
> Cheers,
> Rusty.
* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
2013-01-16 8:16 ` Michael S. Tsirkin
@ 2013-01-17 2:10 ` Rusty Russell
[not found] ` <87k3rcy2y2.fsf@rustcorp.com.au>
1 sibling, 0 replies; 51+ messages in thread
From: Rusty Russell @ 2013-01-17 2:10 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Linus Walleij, LKML, virtualization, Sjur Brændeland
"Michael S. Tsirkin" <mst@redhat.com> writes:
> On Wed, Jan 16, 2013 at 01:43:32PM +1030, Rusty Russell wrote:
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> >> +static int resize_iovec(struct vringh_iov *iov, gfp_t gfp)
>> >> +{
>> >> + struct iovec *new;
>> >> + unsigned int new_num = iov->max * 2;
>> >
>> > We must limit this I think, this is coming
>> > from userspace. How about UIO_MAXIOV?
>>
>> We limit it to the ring size already;
>
> 1. do we limit it in case there's a loop in the descriptor ring?
Yes, we catch loops as per normal (simple counter):
if (count++ == vrh->vring.num) {
vringh_bad("Descriptor loop in %p", descs);
err = -ELOOP;
goto fail;
}
> 2. do we limit it in case there are indirect descriptors?
> I guess I missed where we do this could you point this out to me?
Well, the total is limited above, indirect descriptors or no (since we
handle them inline). Because each indirect descriptor must contain one
descriptor (we always grab descriptor 0), the loop must terminate.
>> UIO_MAXIOV is a weird choice here.
>
> It's kind of forced by the need to pass the iov on to the linux kernel,
> so we know that any guest using more is broken on existing hypervisors.
>
> Ring size is somewhat arbitrary too, isn't it? A huge ring where we
> post lots of short descriptors (e.g. RX buffers) seems like a valid thing to do.
Sure, but the ring size is a documented limit (even if indirect
descriptors are used). I hadn't realized we have an
implementation-specific limit of 1024 descriptors: I shall add this.
While no one reasonable will exceed that, we should document it somewhere
in the spec.
>> > I really dislike raw pointers that we must never dereference.
>> > Since we are forcing everything to __user anyway, why don't we
>> > tag all addresses as __user? The kernel users of this API
>> > can cast that away, this will keep the casts to minimum.
>> >
>> > Failing that, we can add our own class
>> > # define __virtio __attribute__((noderef, address_space(2)))
>>
>> In this case, perhaps we should leave addr as a u64?
>
> Point being? All users will cast to a pointer.
> It seems at first passing in raw pointers is cleaner,
> but it turns out in the API we are passing iovs around,
> and they are __user anyway.
> So using raw pointers here does not buy us anything,
> so let's use __user and gain extra static checks at no cost.
I resist sprinkling __user everywhere because it's *not* always user
addresses, and it's deeply misleading to anyone reading it. I'd rather
have it in one place with a big comment.
I can try using a union of kvec and iovec, since they are the same
layout in practice AFAICT.
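Such a union might look like this hypothetical sketch (the patch that
follows later in the thread instead keeps two structs and asserts layout
equality with BUILD_BUG_ON):

/* One iov struct viewable as either user or kernel vectors; this is
 * only safe because iovec and kvec share a layout and the kernel is
 * built with -fno-strict-aliasing. */
struct vringh_xiov {
        union {
                struct iovec *uiov;     /* iov_base is void __user * */
                struct kvec *kiov;      /* iov_base is void * */
        };
        unsigned i, max;
        bool allocated;
};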
>> >> + iov->iov[iov->i].iov_base = (__force __user void *)addr;
>> >> + iov->iov[iov->i].iov_len = desc.len;
>> >> + iov->i++;
>> >
>> >
>> > This looks like it won't do the right thing if desc.len spans multiple
>> > ranges. I don't know if this happens in practice but this is something
>> > vhost supports ATM.
>>
>> Well, kind of. I assumed that the bool (*getrange)(u64, struct
> >> vringh_range *) callback would meld any adjacent ranges if it needs to.
>
> Confused. If addresses 0 to 0x1000 map to virtual addresses 0 to 0x1000
> and 0x1000 to 0x2000 map to virtual addresses 0x2000 to 0x3000, then
> a single descriptor covering 0 to 0x2000 in guest needs two
> iov entries. What can getrange do about it?
getrange doesn't map virtual to physical, it maps virtual to user.
Cheers,
Rusty.
* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
[not found] ` <87k3rcy2y2.fsf@rustcorp.com.au>
@ 2013-01-17 9:58 ` Michael S. Tsirkin
2013-01-21 11:55 ` Rusty Russell
2013-01-17 10:35 ` Rusty Russell
1 sibling, 1 reply; 51+ messages in thread
From: Michael S. Tsirkin @ 2013-01-17 9:58 UTC (permalink / raw)
To: Rusty Russell; +Cc: Linus Walleij, LKML, virtualization, Sjur Brændeland
On Thu, Jan 17, 2013 at 12:40:29PM +1030, Rusty Russell wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> > On Wed, Jan 16, 2013 at 01:43:32PM +1030, Rusty Russell wrote:
> >> "Michael S. Tsirkin" <mst@redhat.com> writes:
> >> >> +static int resize_iovec(struct vringh_iov *iov, gfp_t gfp)
> >> >> +{
> >> >> + struct iovec *new;
> >> >> + unsigned int new_num = iov->max * 2;
> >> >
> >> > We must limit this I think, this is coming
> >> > from userspace. How about UIO_MAXIOV?
> >>
> >> We limit it to the ring size already;
> >
> > 1. do we limit it in case there's a loop in the descriptor ring?
>
> Yes, we catch loops as per normal (simple counter):
>
> if (count++ == vrh->vring.num) {
> vringh_bad("Descriptor loop in %p", descs);
> err = -ELOOP;
> goto fail;
> }
>
> > 2. do we limit it in case there are indirect descriptors?
> > I guess I missed where we do this could you point this out to me?
>
> Well, the total is limited above, indirect descriptors or no (since we
> handle them inline). Because each indirect descriptor must contain one
> descriptor (we always grab descriptor 0), the loop must terminate.
>
> >> UIO_MAXIOV is a weird choice here.
> >
> > It's kind of forced by the need to pass the iov on to the linux kernel,
> > so we know that any guest using more is broken on existing hypervisors.
> >
> > Ring size is somewhat arbitrary too, isn't it? A huge ring where we
> > post lots of short descriptors (e.g. RX buffers) seems like a valid thing to do.
>
> Sure, but the ring size is a documented limit (even if indirect
> descriptors are used). I hadn't realized we have an
> implementation-specific limit of 1024 descriptors: I shall add this.
> While no one reasonable will exceed that, we should document it somewhere
> in the spec.
>
> >> > I really dislike raw pointers that we must never dereference.
> >> > Since we are forcing everything to __user anyway, why don't we
> >> > tag all addresses as __user? The kernel users of this API
> >> > can cast that away, this will keep the casts to minimum.
> >> >
> >> > Failing that, we can add our own class
> >> > # define __virtio __attribute__((noderef, address_space(2)))
> >>
> >> In this case, perhaps we should leave addr as a u64?
> >
> > Point being? All users will cast to a pointer.
> > It seems at first passing in raw pointers is cleaner,
> > but it turns out in the API we are passing iovs around,
> > and they are __user anyway.
> > So using raw pointers here does not buy us anything,
> > so let's use __user and gain extra static checks at no cost.
>
> I resist sprinkling __user everywhere because it's *not* always user
> addresses, and it's deeply misleading to anyone reading it. I'd rather
> have it in one place with a big comment.
> I can try using a union of kvec and iovec, since they are the same
> layout in practice AFAICT.
I suggest the following easy fix: as you say, it's
in one place with a big comment.
/* On the host side we often communicate with untrusted
 * entities over virtio, so setting the __user tag on addresses
 * we get helps make sure we don't directly dereference them,
 * while making it possible to pass the addresses in iovec arrays
 * without casts.
*/
#define __virtio __user
/* A helper to discard the __virtio tag - only call this when
 * you are communicating with a trusted entity.
*/
static inline void *virtio_raw_addr(__virtio void *addr)
{
return (__force void *)addr;
}
Hmm?
>
> >> >> + iov->iov[iov->i].iov_base = (__force __user void *)addr;
> >> >> + iov->iov[iov->i].iov_len = desc.len;
> >> >> + iov->i++;
> >> >
> >> >
> >> > This looks like it won't do the right thing if desc.len spans multiple
> >> > ranges. I don't know if this happens in practice but this is something
> >> > vhost supports ATM.
> >>
> >> Well, kind of. I assumed that the bool (*getrange)(u64, struct
> >> vringh_range *) callback would meld any adjacent ranges if it needs to.
> >
> > Confused. If addresses 0 to 0x1000 map to virtual addresses 0 to 0x1000
> > and 0x1000 to 0x2000 map to virtual addresses 0x2000 to 0x3000, then
> > a single descriptor covering 0 to 0x2000 in guest needs two
> > iov entries. What can getrange do about it?
>
> getrange doesn't map virtual to physical, it maps virtual to user.
>
> Cheers,
> Rusty.
* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
[not found] ` <87k3rcy2y2.fsf@rustcorp.com.au>
2013-01-17 9:58 ` Michael S. Tsirkin
@ 2013-01-17 10:35 ` Rusty Russell
1 sibling, 0 replies; 51+ messages in thread
From: Rusty Russell @ 2013-01-17 10:35 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Linus Walleij, LKML, virtualization, Sjur Brændeland
Rusty Russell <rusty@rustcorp.com.au> writes:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> On Wed, Jan 16, 2013 at 01:43:32PM +1030, Rusty Russell wrote:
>>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>>> >> +static int resize_iovec(struct vringh_iov *iov, gfp_t gfp)
>>> >> +{
>>> >> + struct iovec *new;
>>> >> + unsigned int new_num = iov->max * 2;
>>> >
>>> > We must limit this I think, this is coming
>>> > from userspace. How about UIO_MAXIOV?
>>>
>>> We limit it to the ring size already;
>>
>> 1. do we limit it in case there's a loop in the descriptor ring?
I didn't get a chance to do these revisions, as I spent today debugging
the test framework. I won't be able to do any more work on it until next
week, so I've posted a rough series anyway for feedback (it can also be
found in my pending-rebases branch on kernel.org).
Thanks!
Rusty.
* Re: [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio.
2013-01-17 9:58 ` Michael S. Tsirkin
@ 2013-01-21 11:55 ` Rusty Russell
0 siblings, 0 replies; 51+ messages in thread
From: Rusty Russell @ 2013-01-21 11:55 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Linus Walleij, LKML, virtualization, Sjur Brændeland
"Michael S. Tsirkin" <mst@redhat.com> writes:
> On Thu, Jan 17, 2013 at 12:40:29PM +1030, Rusty Russell wrote:
>> I resist sprinkling __user everywhere because it's *not* always user
>> addresses, and it's deeply misleading to anyone reading it. I'd rather
>> have it in one place with a big comment.
>> I can try using a union of kvec and iovec, since they are the same
>> layout in practice AFAICT.
>
> I suggest the following easy fix: as you say, it's
> in one place with a big comment.
>
> /* On the host side we often communicate with untrusted
> * entities over virtio, so setting the __user tag on addresses
> * we get helps make sure we don't directly dereference them,
> * while making it possible to pass the addresses in iovec arrays
> * without casts.
> */
> #define __virtio __user
>
> /* A helper to discard the __virtio tag - only call this when
> * you are communicating with a trusted entity.
> */
> static inline void *virtio_raw_addr(__virtio void *addr)
> {
> return (__force void *)addr;
> }
>
> Hmm?
The two problems are iovec, which contains a __user address, and that
gets exposed via the API, and vring, which *doesn't* use __user
addresses.
This is ugly, but works:
vringh: use vringh_kiov for _kern functions, and internally.
This makes users of vringh perfectly __user-clean, and removes an
internal cast.
This only works because of -fno-strict-aliasing!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index ab10da8..2ba087d 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -65,9 +65,9 @@ static inline int __vringh_get_head(const struct vringh *vrh,
}
/* Copy some bytes to/from the iovec. Returns num copied. */
-static inline ssize_t vringh_iov_xfer(struct vringh_iov *iov,
+static inline ssize_t vringh_iov_xfer(struct vringh_kiov *iov,
void *ptr, size_t len,
- int (*xfer)(void __user *addr, void *ptr,
+ int (*xfer)(void *addr, void *ptr,
size_t len))
{
int err, done = 0;
@@ -149,9 +149,9 @@ static int move_to_indirect(int *up_next, u16 *i, void *addr,
return 0;
}
-static int resize_iovec(struct vringh_iov *iov, gfp_t gfp)
+static int resize_iovec(struct vringh_kiov *iov, gfp_t gfp)
{
- struct iovec *new;
+ struct kvec *new;
unsigned int new_num = iov->max * 2;
if (new_num < 8)
@@ -186,8 +186,8 @@ static u16 __cold return_from_indirect(const struct vringh *vrh, int *up_next,
static inline int
__vringh_iov(struct vringh *vrh, u16 i,
- struct vringh_iov *riov,
- struct vringh_iov *wiov,
+ struct vringh_kiov *riov,
+ struct vringh_kiov *wiov,
bool (*getrange)(u64 addr, struct vringh_range *r),
gfp_t gfp,
int (*getdesc)(struct vring_desc *dst, const struct vring_desc *s))
@@ -204,7 +204,7 @@ __vringh_iov(struct vringh *vrh, u16 i,
riov->i = wiov->i = 0;
for (;;) {
void *addr;
- struct vringh_iov *iov;
+ struct vringh_kiov *iov;
err = getdesc(&desc, &descs[i]);
if (unlikely(err))
@@ -249,7 +249,7 @@ __vringh_iov(struct vringh *vrh, u16 i,
goto fail;
}
- iov->iov[iov->i].iov_base = (__force __user void *)addr;
+ iov->iov[iov->i].iov_base = addr;
iov->iov[iov->i].iov_len = desc.len;
iov->i++;
@@ -438,7 +438,7 @@ static inline void __vringh_notify_disable(struct vringh *vrh,
}
}
-/* Userspace access helpers. */
+/* Userspace access helpers: in this case, addresses are really userspace. */
static inline int getu16_user(u16 *val, const u16 *p)
{
return get_user(*val, (__force u16 __user *)p);
@@ -452,27 +452,27 @@ static inline int putu16_user(u16 *p, u16 val)
static inline int getdesc_user(struct vring_desc *dst,
const struct vring_desc *src)
{
- return copy_from_user(dst, (__force void *)src, sizeof(*dst)) == 0 ? 0 :
- -EFAULT;
+ return copy_from_user(dst, (__force void __user *)src, sizeof(*dst)) ?
+ -EFAULT : 0;
}
static inline int putused_user(struct vring_used_elem *dst,
const struct vring_used_elem *s)
{
- return copy_to_user((__force void __user *)dst, s, sizeof(*dst)) == 0
- ? 0 : -EFAULT;
+ return copy_to_user((__force void __user *)dst, s, sizeof(*dst)) ?
+ -EFAULT : 0;
}
static inline int xfer_from_user(void *src, void *dst, size_t len)
{
- return copy_from_user(dst, (__force void *)src, len) == 0 ? 0 :
- -EFAULT;
+ return copy_from_user(dst, (__force void __user *)src, len) ?
+ -EFAULT : 0;
}
static inline int xfer_to_user(void *dst, void *src, size_t len)
{
- return copy_to_user((__force void *)dst, src, len) == 0 ? 0 :
- -EFAULT;
+ return copy_to_user((__force void __user *)dst, src, len) ?
+ -EFAULT : 0;
}
/**
@@ -506,6 +506,7 @@ int vringh_init_user(struct vringh *vrh, u32 features,
vrh->last_avail_idx = 0;
vrh->last_used_idx = 0;
vrh->vring.num = num;
+ /* vring expects kernel addresses, but only used via accessors. */
vrh->vring.desc = (__force struct vring_desc *)desc;
vrh->vring.avail = (__force struct vring_avail *)avail;
vrh->vring.used = (__force struct vring_used *)used;
@@ -543,8 +544,30 @@ int vringh_getdesc_user(struct vringh *vrh,
if (err == vrh->vring.num)
return 0;
+ /* We need the layouts to be identical for this to work */
+ BUILD_BUG_ON(sizeof(struct vringh_kiov) != sizeof(struct vringh_iov));
+ BUILD_BUG_ON(offsetof(struct vringh_kiov, iov) !=
+ offsetof(struct vringh_iov, iov));
+ BUILD_BUG_ON(offsetof(struct vringh_kiov, i) !=
+ offsetof(struct vringh_iov, i));
+ BUILD_BUG_ON(offsetof(struct vringh_kiov, max) !=
+ offsetof(struct vringh_iov, max));
+ BUILD_BUG_ON(offsetof(struct vringh_kiov, allocated) !=
+ offsetof(struct vringh_iov, allocated));
+ BUILD_BUG_ON(sizeof(struct iovec) != sizeof(struct kvec));
+ BUILD_BUG_ON(offsetof(struct iovec, iov_base) !=
+ offsetof(struct kvec, iov_base));
+ BUILD_BUG_ON(offsetof(struct iovec, iov_len) !=
+ offsetof(struct kvec, iov_len));
+ BUILD_BUG_ON(sizeof(((struct iovec *)NULL)->iov_base)
+ != sizeof(((struct kvec *)NULL)->iov_base));
+ BUILD_BUG_ON(sizeof(((struct iovec *)NULL)->iov_len)
+ != sizeof(((struct kvec *)NULL)->iov_len));
+
*head = err;
- err = __vringh_iov(vrh, *head, riov, wiov, getrange, gfp, getdesc_user);
+ err = __vringh_iov(vrh, *head, (struct vringh_kiov *)riov,
+ (struct vringh_kiov *)wiov,
+ getrange, gfp, getdesc_user);
if (err)
return err;
@@ -561,7 +584,8 @@ int vringh_getdesc_user(struct vringh *vrh,
*/
ssize_t vringh_iov_pull_user(struct vringh_iov *riov, void *dst, size_t len)
{
- return vringh_iov_xfer(riov, dst, len, xfer_from_user);
+ return vringh_iov_xfer((struct vringh_kiov *)riov,
+ dst, len, xfer_from_user);
}
/**
@@ -575,7 +599,8 @@ ssize_t vringh_iov_pull_user(struct vringh_iov *riov, void *dst, size_t len)
ssize_t vringh_iov_push_user(struct vringh_iov *wiov,
const void *src, size_t len)
{
- return vringh_iov_xfer(wiov, (void *)src, len, xfer_to_user);
+ return vringh_iov_xfer((struct vringh_kiov *)wiov,
+ (void *)src, len, xfer_to_user);
}
/**
@@ -734,8 +759,8 @@ int vringh_init_kern(struct vringh *vrh, u32 features,
* have to kfree riov->iov and wiov->iov respectively.
*/
int vringh_getdesc_kern(struct vringh *vrh,
- struct vringh_iov *riov,
- struct vringh_iov *wiov,
+ struct vringh_kiov *riov,
+ struct vringh_kiov *wiov,
u16 *head,
gfp_t gfp)
{
@@ -766,7 +791,7 @@ int vringh_getdesc_kern(struct vringh *vrh,
*
* Returns the bytes copied <= len or a negative errno.
*/
-ssize_t vringh_iov_pull_kern(struct vringh_iov *riov, void *dst, size_t len)
+ssize_t vringh_iov_pull_kern(struct vringh_kiov *riov, void *dst, size_t len)
{
return vringh_iov_xfer(riov, dst, len, xfer_kern);
}
@@ -779,7 +804,7 @@ ssize_t vringh_iov_pull_kern(struct vringh_iov *riov, void *dst, size_t len)
*
* Returns the bytes copied <= len or a negative errno.
*/
-ssize_t vringh_iov_push_kern(struct vringh_iov *wiov,
+ssize_t vringh_iov_push_kern(struct vringh_kiov *wiov,
const void *src, size_t len)
{
return vringh_iov_xfer(wiov, (void *)src, len, xfer_kern);
diff --git a/include/linux/vringh.h b/include/linux/vringh.h
index 9df86e9..b3345e9 100644
--- a/include/linux/vringh.h
+++ b/include/linux/vringh.h
@@ -24,7 +24,7 @@
#ifndef _LINUX_VRINGH_H
#define _LINUX_VRINGH_H
#include <uapi/linux/virtio_ring.h>
-#include <uapi/linux/uio.h>
+#include <linux/uio.h>
#include <asm/barrier.h>
/* virtio_ring with information needed for host access. */
@@ -64,6 +64,13 @@ struct vringh_iov {
bool allocated;
};
+/* All the information about a kvec. */
+struct vringh_kiov {
+ struct kvec *iov;
+ unsigned i, max;
+ bool allocated;
+};
+
/* Helpers for userspace vrings. */
int vringh_init_user(struct vringh *vrh, u32 features,
unsigned int num, bool weak_barriers,
@@ -103,13 +110,13 @@ int vringh_init_kern(struct vringh *vrh, u32 features,
struct vring_used *used);
int vringh_getdesc_kern(struct vringh *vrh,
- struct vringh_iov *riov,
- struct vringh_iov *wiov,
+ struct vringh_kiov *riov,
+ struct vringh_kiov *wiov,
u16 *head,
gfp_t gfp);
-ssize_t vringh_iov_pull_kern(struct vringh_iov *riov, void *dst, size_t len);
-ssize_t vringh_iov_push_user(struct vringh_iov *wiov,
+ssize_t vringh_iov_pull_kern(struct vringh_kiov *riov, void *dst, size_t len);
+ssize_t vringh_iov_push_kern(struct vringh_kiov *wiov,
const void *src, size_t len);
void vringh_abandon_kern(struct vringh *vrh, unsigned int num);
int vringh_complete_kern(struct vringh *vrh, u16 head, u32 len);
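With this applied, an in-kernel consumer uses the _kern variants with
kvec-backed storage; a minimal hypothetical call sequence (vrh is assumed
to have been set up elsewhere with vringh_init_kern):

struct vringh vrh;      /* assumed initialized via vringh_init_kern() */
struct kvec stack[4];
struct vringh_kiov riov = { .iov = stack, .max = 4 };
struct vringh_kiov wiov = { .iov = NULL, .max = 0 }; /* no writable part expected */
char buf[64];
u16 head;
int err;

err = vringh_getdesc_kern(&vrh, &riov, &wiov, &head, GFP_KERNEL);
if (err > 0) {
        ssize_t got = vringh_iov_pull_kern(&riov, buf, sizeof(buf));

        if (got >= 0)
                err = vringh_complete_kern(&vrh, head, got);
}
if (riov.allocated)             /* grown past the stack array */
        kfree(riov.iov);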
Thread overview: 51+ messages
2012-10-31 22:46 [RFC virtio-next 0/4] Introduce CAIF Virtio and reversed Vrings Sjur Brændeland
2012-10-31 22:46 ` [RFC virtio-next 1/4] virtio: Move definitions to header file vring.h Sjur Brændeland
2012-10-31 22:46 ` [RFC virtio-next 2/4] include/vring.h: Add support for reversed vritio rings Sjur Brændeland
2012-10-31 22:46 ` [RFC virtio-next 3/4] virtio_ring: Call callback function even when used ring is empty Sjur Brændeland
2012-10-31 22:46 ` [RFC virtio-next 4/4] caif_virtio: Add CAIF over virtio Sjur Brændeland
2012-11-01 7:41 ` [RFC virtio-next 0/4] Introduce CAIF Virtio and reversed Vrings Rusty Russell
2012-11-05 12:12 ` Sjur Brændeland
[not found] ` <CANHm3PgrsTD4uYuXN0AMuZFX794CJmmus4AST=G0+nP1ha3VyQ@mail.gmail.com>
2012-11-06 2:09 ` Rusty Russell
2012-12-05 14:36 ` [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio Sjur Brændeland
2012-12-05 14:36 ` [RFCv2 01/12] vhost: Use struct vring in vhost_virtqueue Sjur Brændeland
2012-12-05 14:37 ` [RFCv2 02/12] vhost: Isolate reusable vring related functions Sjur Brændeland
2012-12-05 14:37 ` [RFCv2 03/12] virtio-ring: Introduce file virtio_ring_host Sjur Brændeland
2012-12-05 14:37 ` [RFCv2 04/12] virtio-ring: Refactor out the functions accessing user memory Sjur Brændeland
2012-12-06 9:52 ` Michael S. Tsirkin
2012-12-06 11:03 ` Sjur BRENDELAND
2012-12-06 11:15 ` Michael S. Tsirkin
2012-12-07 11:05 ` Sjur BRENDELAND
2012-12-07 12:40 ` Michael S. Tsirkin
2012-12-07 13:02 ` Sjur BRENDELAND
2012-12-07 14:05 ` Michael S. Tsirkin
2012-12-05 14:37 ` [RFCv2 05/12] virtio-ring: Refactor move attributes to struct virtqueue Sjur Brændeland
2012-12-05 14:37 ` [RFCv2 06/12] virtio_ring: Move SMP macros to virtio_ring.h Sjur Brændeland
2012-12-05 14:37 ` [RFCv2 07/12] virtio-ring: Add Host side virtio-ring implementation Sjur Brændeland
2012-12-05 14:37 ` [RFCv2 08/12] virtio: Update vring_interrupt for host-side virtio queues Sjur Brændeland
2012-12-05 14:37 ` [RFCv2 09/12] virtio-ring: Add BUG_ON checking on host/guest ring type Sjur Brændeland
2012-12-05 14:37 ` [RFCv2 10/12] virtio: Add argument reversed to function find_vqs() Sjur Brændeland
2012-12-05 14:37 ` [RFCv2 11/12] remoteproc: Add support for host-virtqueues Sjur Brændeland
2012-12-05 14:37 ` [RFCv2 12/12] caif_virtio: Introduce caif over virtio Sjur Brændeland
2012-12-06 10:27 ` [RFCv2 00/12] Introduce host-side virtio queue and CAIF Virtio Michael S. Tsirkin
2012-12-21 6:11 ` Rusty Russell
2013-01-08 8:04 ` Sjur Brændeland
2013-01-08 23:17 ` Rusty Russell
2013-01-10 10:30 ` Rusty Russell
2013-01-10 11:11 ` Michael S. Tsirkin
2013-01-10 22:48 ` Rusty Russell
2013-01-11 7:31 ` Michael S. Tsirkin
[not found] ` <20130111073155.GA13315@redhat.com>
2013-01-12 0:20 ` Rusty Russell
2013-01-14 16:54 ` Michael S. Tsirkin
2013-01-10 18:39 ` Sjur Brændeland
2013-01-10 23:35 ` Rusty Russell
2013-01-11 6:37 ` Rusty Russell
2013-01-11 15:02 ` Sjur Brændeland
2013-01-12 0:26 ` Rusty Russell
2013-01-14 17:39 ` Michael S. Tsirkin
2013-01-16 3:13 ` Rusty Russell
2013-01-16 8:16 ` Michael S. Tsirkin
2013-01-17 2:10 ` Rusty Russell
[not found] ` <87k3rcy2y2.fsf@rustcorp.com.au>
2013-01-17 9:58 ` Michael S. Tsirkin
2013-01-21 11:55 ` Rusty Russell
2013-01-17 10:35 ` Rusty Russell
2013-01-11 14:52 ` Sjur Brændeland