qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH v9 00/20] Vhost and vhost-net support for userspace based backends
@ 2014-03-04 18:22 Antonios Motakis
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 01/20] Convert -mem-path to QemuOpts and add share property Antonios Motakis
                   ` (20 more replies)
  0 siblings, 21 replies; 33+ messages in thread
From: Antonios Motakis @ 2014-03-04 18:22 UTC (permalink / raw)
  To: qemu-devel, snabb-devel; +Cc: lukego, Antonios Motakis, tech, n.nikolaev, mst

In this patch series we would like to introduce our approach for putting a
virtio-net backend in an external userspace process. Our eventual target is to
run the network backend in the Snabbswitch ethernet switch, while receiving
traffic from a guest inside QEMU/KVM which runs an unmodified virtio-net
implementation.

For this, we are working on extending vhost to allow equivalent functionality
for userspace. Vhost already passes control of the data plane of virtio-net to
the host kernel; we want to realize a similar model, but for userspace.

In this patch series the concept of a vhost-backend is introduced.

We define two vhost backend types - vhost-kernel and vhost-user. The former is
the interface to the current kernel module implementation. Its control plane is
ioctl based. The data plane is realized by the kernel directly accessing the
QEMU allocated, guest memory.

In the new vhost-user backend, the control plane is based on communication
between QEMU and another userspace process using a Unix domain socket. This
allows a virtio backend for a guest running in QEMU to be implemented inside
the other userspace process. For this communication we use a chardev with a
Unix domain socket backend. Vhost-user is client/server agnostic regarding the
chardev; however, it does not support the 'nowait' and 'telnet' options.

Reconnection in 'server' mode is supported, but the virtio features exposed by
a reconnecting backend must be compatible with those of the first connected
slave.

We convert -mem-path to QemuOpts and add share as a property to it (the
prealloc property was dropped in v9, see the changelog below). HugeTLBFS is
required for this option to work.

The data path is realized by directly accessing the vrings and the buffer data
off the guest's memory.
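
To illustrate why a shared mapping matters for this data path, here is a
minimal sketch that is not part of the series. It assumes a POSIX host, uses
an ordinary temporary file instead of hugetlbfs, and uses fork() as a
stand-in for the separate backend process. The function name
child_write_visible and the /tmp template are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Map a freshly created, already unlinked temporary file with the given
 * mmap flags, let a forked child store one byte into the mapping, and
 * report whether the parent observes that store.
 * Returns 1 if the store is visible, 0 if it is not, -1 on error.
 */
int child_write_visible(int map_flags)
{
    char name[] = "/tmp/share-demo-XXXXXX"; /* hypothetical location */
    const size_t len = 4096;
    volatile unsigned char *area;
    void *map;
    int fd, status, seen;
    pid_t pid;

    fd = mkstemp(name);
    if (fd < 0) {
        return -1;
    }
    unlink(name);
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }

    map = mmap(NULL, len, PROT_READ | PROT_WRITE, map_flags, fd, 0);
    close(fd); /* the mapping keeps the file contents alive */
    if (map == MAP_FAILED) {
        return -1;
    }
    area = map;

    pid = fork();
    if (pid < 0) {
        munmap(map, len);
        return -1;
    }
    if (pid == 0) {
        area[0] = 0x5a; /* the "backend" writes into the mapping */
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0) {
        munmap(map, len);
        return -1;
    }

    seen = (area[0] == 0x5a);
    munmap(map, len);
    return seen;
}
```

With MAP_SHARED the parent sees the byte stored by the child. With
MAP_PRIVATE the store lands in a copy-on-write page private to the child, so
the parent still reads zero.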

Currently the only user of vhost-user is vhost-net. We add a new netdev backend
that is intended to initialize vhost-net with the vhost-user backend.

Example usage:

qemu -m 1024 -mem-path /hugetlbfs,share=on \
     -chardev socket,id=chr0,path=/path/to/socket \
     -netdev type=vhost-user,id=net0,chardev=chr0 \
     -device virtio-net-pci,netdev=net0

On non-MSIX guests the vhost feature can be forced using a special option:

...
     -netdev type=vhost-user,id=net0,chardev=chr0,vhostforce
...

In order to use ioeventfds, kvm should be enabled.

This code can be pulled from git@github.com:virtualopensystems/qemu.git vhost-user-v9
A simple functional test is available in tests/vhost-user-test.c

A reference vhost-user slave for testing is also available from git@github.com:virtualopensystems/vapp.git

Changes from v8:
 - Removed prealloc property from the -mem-path refactoring
 - Added and use new function - kvm_eventfds_enabled
 - Add virtio_queue_get_avail_idx used in vhost_virtqueue_stop to
   get a sane value in case of VHOST_GET_VRING_BASE failure
 - vhost user uses kvm_eventfds_enabled to check whether the ioeventfd
   capability of KVM is available
 - Added flag VHOST_USER_VRING_NOFD_MASK to be set when KICK, CALL or ERR file
   descriptor is invalid or ioeventfd is not available

Changes from v7:
 - Slave reconnection when using chardev in server mode
 - qtest vhost-user-test added
 - New qemu_chr_fe_get_msgfds for reading multiple fds from the chardev
 - Mandatory features in vhost_dev, used on reconnect to verify for conflicts
 - Add vhostforce parameter to -netdev vhost-user (for non-MSIX guests)
 - Extend libqemustub.a to support qemu-char.c

Changes from v6:
 - Remove the 'unlink' property of '-mem-path'
 - Extend qemu-char: blocking read, send fds, monitor for connection close
 - Vhost-user uses chardev as a backend
 - Poll and reconnect removed (no VHOST_USER_ECHO).
 - Disconnect is detected by the chardev (G_IO_HUP event)
 - vhost-backend.c split to vhost-user.c

Changes from v5:
 - Split -mem-path unlink option to a separate patch
 - Fds are passed only in the ancillary data
 - Stricter message size checks on receive/send
 - Netdev vhost-user now includes path and poll_time options
 - The connection probing interval is configurable

Changes from v4:
 - Use error_report for errors
 - VhostUserMsg has new field `size` indicating the following payload length.
   Field `flags` now has version and reply bits. The structure is packed.
 - Send data is of variable length (`size` field in message)
 - Receive in 2 steps, header and payload
 - Add new message type VHOST_USER_ECHO, to check connection status
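
The header-then-payload receive described in the v4 changes can be sketched
as follows. This is only an illustration: struct demo_hdr, its `request`
field, DEMO_MAX_PAYLOAD and both helper functions are hypothetical names, and
only the `flags` and `size` fields are taken from the changelog above. The
actual layout is defined by the series, not by this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_MAX_PAYLOAD 64 /* hypothetical limit for this sketch */

/* Hypothetical packed header; `size` is the length of the payload. */
struct demo_hdr {
    uint32_t request;
    uint32_t flags;
    uint32_t size;
} __attribute__((packed));

/*
 * Receive one message in two steps: the fixed-size header first, then
 * exactly hdr->size payload bytes. Returns the payload length, or -1 on a
 * short read, an oversized payload, or a socket error.
 */
int demo_recv_msg(int fd, struct demo_hdr *hdr, uint8_t *payload, size_t cap)
{
    ssize_t n;

    n = recv(fd, hdr, sizeof(*hdr), MSG_WAITALL);
    if (n != (ssize_t)sizeof(*hdr)) {
        return -1;
    }

    /* Check the announced size before trusting it. */
    if (hdr->size > DEMO_MAX_PAYLOAD || hdr->size > cap) {
        return -1;
    }
    if (hdr->size == 0) {
        return 0;
    }

    n = recv(fd, payload, hdr->size, MSG_WAITALL);
    if (n != (ssize_t)hdr->size) {
        return -1;
    }
    return (int)n;
}

/* Send the header and the payload back to back; returns 0 on success. */
int demo_send_msg(int fd, uint32_t request, const void *payload, uint32_t size)
{
    struct demo_hdr hdr = { .request = request, .flags = 0x1, .size = size };

    if (send(fd, &hdr, sizeof(hdr), 0) != (ssize_t)sizeof(hdr)) {
        return -1;
    }
    if (size && send(fd, payload, size, 0) != (ssize_t)size) {
        return -1;
    }
    return 0;
}
```

Reading the header first lets the receiver validate `size` before it reads
any variable-length data, which is where the stricter size checks fit.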

Changes from v3:
 - Convert -mem-path to QemuOpts with prealloc, share and unlink properties
 - Set 1 sec timeout when read/write to the unix domain socket
 - Fix file descriptor leak

Changes from v2:
 - Reconnect when the backend disappears

Changes from v1:
 - Implementation of vhost-user netdev backend
 - Code improvements

Antonios Motakis (20):
  Convert -mem-path to QemuOpts and add share property
  Add kvm_eventfds_enabled function
  Add chardev API qemu_chr_fe_read_all
  Add chardev API qemu_chr_fe_set_msgfds
  Add chardev API qemu_chr_fe_get_msgfds
  Add G_IO_HUP handler for socket chardev
  vhost_net should call the poll callback only when it is set
  Refactor virtio-net to use generic get_vhost_net
  Add new virtio API virtio_queue_get_avail_idx
  Gracefully handle ioctl failure in vhost_virtqueue_stop
  vhost_net_init will use VhostNetOptions to get all its arguments
  Add vhost_ops to vhost_dev struct and replace all relevant ioctls
  Add mandatory_features to vhost_dev
  Add vhost-backend and VhostBackendType
  Add vhost-user as a vhost backend.
  Add new vhost-user netdev backend
  Add the vhost-user netdev backend to the command line
  Add vhost-user protocol documentation
  libqemustub: add stubs to be able to use qemu-char.c
  Add qtest for vhost-user

 docs/specs/vhost-user.txt         | 261 ++++++++++++++++++++++++++++
 exec.c                            |  21 ++-
 hmp-commands.hx                   |   4 +-
 hw/net/vhost_net.c                | 141 ++++++++++-----
 hw/net/virtio-net.c               |  29 +---
 hw/scsi/vhost-scsi.c              |  20 ++-
 hw/virtio/Makefile.objs           |   2 +-
 hw/virtio/vhost-backend.c         |  71 ++++++++
 hw/virtio/vhost-user.c            | 356 ++++++++++++++++++++++++++++++++++++++
 hw/virtio/vhost.c                 |  58 +++----
 hw/virtio/virtio.c                |   5 +
 include/exec/cpu-all.h            |   1 -
 include/hw/virtio/vhost-backend.h |  38 ++++
 include/hw/virtio/vhost.h         |   9 +-
 include/hw/virtio/virtio.h        |   1 +
 include/net/vhost-user.h          |  17 ++
 include/net/vhost_net.h           |  13 +-
 include/sysemu/char.h             |  43 ++++-
 include/sysemu/kvm.h              |  11 ++
 kvm-all.c                         |   4 +
 kvm-stub.c                        |   1 +
 net/Makefile.objs                 |   2 +-
 net/clients.h                     |   3 +
 net/hub.c                         |   1 +
 net/net.c                         |  25 +--
 net/tap.c                         |  18 +-
 net/vhost-user.c                  | 273 +++++++++++++++++++++++++++++
 qapi-schema.json                  |  19 +-
 qemu-char.c                       | 272 +++++++++++++++++++++++++----
 qemu-options.hx                   |  25 ++-
 stubs/Makefile.objs               |   8 +
 stubs/bdrv-commit-all.c           |   7 +
 stubs/chr-msmouse.c               |   7 +
 stubs/get-next-serial.c           |   3 +
 stubs/is-daemonized.c             |   7 +
 stubs/machine-init-done.c         |   6 +
 stubs/monitor-init.c              |   6 +
 stubs/notify-event.c              |   6 +
 stubs/vc-init.c                   |   7 +
 tests/Makefile                    |   4 +
 tests/vhost-user-test.c           | 309 +++++++++++++++++++++++++++++++++
 vl.c                              |  23 ++-
 42 files changed, 1979 insertions(+), 158 deletions(-)
 create mode 100644 docs/specs/vhost-user.txt
 create mode 100644 hw/virtio/vhost-backend.c
 create mode 100644 hw/virtio/vhost-user.c
 create mode 100644 include/hw/virtio/vhost-backend.h
 create mode 100644 include/net/vhost-user.h
 create mode 100644 net/vhost-user.c
 create mode 100644 stubs/bdrv-commit-all.c
 create mode 100644 stubs/chr-msmouse.c
 create mode 100644 stubs/get-next-serial.c
 create mode 100644 stubs/is-daemonized.c
 create mode 100644 stubs/machine-init-done.c
 create mode 100644 stubs/monitor-init.c
 create mode 100644 stubs/notify-event.c
 create mode 100644 stubs/vc-init.c
 create mode 100644 tests/vhost-user-test.c

-- 
1.8.3.2


* [Qemu-devel] [PATCH v9 01/20] Convert -mem-path to QemuOpts and add share property
  2014-03-04 18:22 [Qemu-devel] [PATCH v9 00/20] Vhost and vhost-net support for userspace based backends Antonios Motakis
@ 2014-03-04 18:22 ` Antonios Motakis
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 02/20] Add kvm_eventfds_enabled function Antonios Motakis
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 33+ messages in thread
From: Antonios Motakis @ 2014-03-04 18:22 UTC (permalink / raw)
  To: qemu-devel, snabb-devel
  Cc: Peter Maydell, Corey Bryant, Stefan Hajnoczi, mst, Juan Quintela,
	Michael Tokarev, Alexander Graf, n.nikolaev, Anthony Liguori,
	Paolo Bonzini, lukego, Antonios Motakis, tech,
	Andreas Färber, Richard Henderson

Extend -mem-path with additional property:

 - share=on|off - when on, guest memory is mmapped with the MAP_SHARED flag
   instead of MAP_PRIVATE (default: off)

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
---
 exec.c                 | 21 +++++++++++++++++++--
 include/exec/cpu-all.h |  1 -
 qemu-options.hx        |  8 ++++++--
 vl.c                   | 23 +++++++++++++++++++++--
 4 files changed, 46 insertions(+), 7 deletions(-)
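
For readers unfamiliar with the "[path=]path[,share=on|off]" form, the
following hand-rolled sketch shows how the implied option name is meant to
behave. It is not QemuOpts and does not use any QEMU API; struct
mem_path_cfg and parse_mem_path are hypothetical names, and only the two
forms listed in the help text are accepted.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct mem_path_cfg {
    char path[256];
    bool share;
};

/* Copy a path value into cfg; returns -1 if it is empty or too long. */
static int set_path(struct mem_path_cfg *cfg, const char *val)
{
    size_t n = strlen(val);

    if (n == 0 || n >= sizeof(cfg->path)) {
        return -1;
    }
    memcpy(cfg->path, val, n + 1);
    return 0;
}

/*
 * Parse "[path=]PATH[,share=on|off]".
 * Returns 0 on success, -1 on a malformed or incomplete argument.
 */
int parse_mem_path(const char *arg, struct mem_path_cfg *cfg)
{
    char copy[512];
    char *tok = copy;
    bool first = true;

    if (strlen(arg) >= sizeof(copy)) {
        return -1;
    }
    strcpy(copy, arg);
    cfg->path[0] = '\0';
    cfg->share = false; /* default: off, i.e. a private mapping */

    while (tok) {
        char *next = strchr(tok, ',');
        int ret = 0;

        if (next) {
            *next++ = '\0';
        }
        if (first && !strchr(tok, '=')) {
            ret = set_path(cfg, tok); /* implied option name "path" */
        } else if (strncmp(tok, "path=", 5) == 0) {
            ret = set_path(cfg, tok + 5);
        } else if (strcmp(tok, "share=on") == 0) {
            cfg->share = true;
        } else if (strcmp(tok, "share=off") == 0) {
            cfg->share = false;
        } else {
            ret = -1;
        }
        if (ret < 0) {
            return -1;
        }
        first = false;
        tok = next;
    }
    return cfg->path[0] ? 0 : -1;
}
```

Both "/hugetlbfs,share=on" and "path=/hugetlbfs,share=on" yield the same
result here, while an argument without any path is rejected.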

diff --git a/exec.c b/exec.c
index b69fd29..4fb5cc6 100644
--- a/exec.c
+++ b/exec.c
@@ -1027,7 +1027,10 @@ static void *file_ram_alloc(RAMBlock *block,
     char *c;
     void *area;
     int fd;
+    int flags;
     unsigned long hpagesize;
+    QemuOpts *opts;
+    unsigned int mem_share = 0;
 
     hpagesize = gethugepagesize(path);
     if (!hpagesize) {
@@ -1043,6 +1046,12 @@ static void *file_ram_alloc(RAMBlock *block,
         return NULL;
     }
 
+    /* Fill config options */
+    opts = qemu_opts_find(qemu_find_opts("mem-path"), NULL);
+    if (opts) {
+        mem_share = qemu_opt_get_bool(opts, "share", 0);
+    }
+
     /* Make name safe to use with mkstemp by replacing '/' with '_'. */
     sanitized_name = g_strdup(block->mr->name);
     for (c = sanitized_name; *c != '\0'; c++) {
@@ -1063,7 +1072,7 @@ static void *file_ram_alloc(RAMBlock *block,
     unlink(filename);
     g_free(filename);
 
-    memory = (memory+hpagesize-1) & ~(hpagesize-1);
+    memory = (memory + hpagesize - 1) & ~(hpagesize - 1);
 
     /*
      * ftruncate is not supported by hugetlbfs in older
@@ -1074,7 +1083,8 @@ static void *file_ram_alloc(RAMBlock *block,
     if (ftruncate(fd, memory))
         perror("ftruncate");
 
-    area = mmap(0, memory, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
+    flags = mem_share ? MAP_SHARED : MAP_PRIVATE;
+    area = mmap(0, memory, PROT_READ | PROT_WRITE, flags, fd, 0);
     if (area == MAP_FAILED) {
         perror("file_ram_alloc: can't mmap RAM pages");
         close(fd);
@@ -1244,6 +1254,8 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
                                    MemoryRegion *mr)
 {
     RAMBlock *block, *new_block;
+    QemuOpts *opts;
+    const char *mem_path = 0;
     ram_addr_t old_ram_size, new_ram_size;
 
     old_ram_size = last_ram_offset() >> TARGET_PAGE_BITS;
@@ -1252,6 +1264,11 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
     new_block = g_malloc0(sizeof(*new_block));
     new_block->fd = -1;
 
+    opts = qemu_opts_find(qemu_find_opts("mem-path"), NULL);
+    if (opts) {
+        mem_path = qemu_opt_get(opts, "path");
+    }
+
     /* This assumes the iothread lock is taken here too.  */
     qemu_mutex_lock_ramlist();
     new_block->mr = mr;
diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 4cb4b4a..afec798 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -468,7 +468,6 @@ typedef struct RAMList {
 } RAMList;
 extern RAMList ram_list;
 
-extern const char *mem_path;
 extern int mem_prealloc;
 
 /* Flags stored in the low bits of the TLB virtual address.  These are
diff --git a/qemu-options.hx b/qemu-options.hx
index 56e5fdf..f9f42a0 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -221,9 +221,13 @@ gigabytes respectively.
 ETEXI
 
 DEF("mem-path", HAS_ARG, QEMU_OPTION_mempath,
-    "-mem-path FILE  provide backing storage for guest RAM\n", QEMU_ARCH_ALL)
+    "-mem-path [path=]path[,share=on|off]\n"
+    "                provide backing storage for guest RAM\n"
+    "                path= a directory path for the backing store\n"
+    "                share= enable mmap share flag [default disabled]\n",
+        QEMU_ARCH_ALL)
 STEXI
-@item -mem-path @var{path}
+@item -mem-path [path=]@var{path}[,share=on|off]
 @findex -mem-path
 Allocate guest RAM from a temporarily created file in @var{path}.
 ETEXI
diff --git a/vl.c b/vl.c
index 1d27b34..6445457 100644
--- a/vl.c
+++ b/vl.c
@@ -135,7 +135,6 @@ DisplayType display_type = DT_DEFAULT;
 static int display_remote;
 const char* keyboard_layout = NULL;
 ram_addr_t ram_size;
-const char *mem_path = NULL;
 int mem_prealloc = 0; /* force preallocation of physical target memory */
 int nb_nics;
 NICInfo nd_table[MAX_NICS];
@@ -479,6 +478,23 @@ static QemuOptsList qemu_msg_opts = {
     },
 };
 
+static QemuOptsList qemu_mem_path_opts = {
+    .name = "mem-path",
+    .implied_opt_name = "path",
+    .head = QTAILQ_HEAD_INITIALIZER(qemu_mem_path_opts.head),
+    .desc = {
+        {
+            .name = "path",
+            .type = QEMU_OPT_STRING,
+        },
+        {
+            .name = "share",
+            .type = QEMU_OPT_BOOL,
+        },
+        { /* end of list */ }
+    },
+};
+
 /**
  * Get machine options
  *
@@ -2863,6 +2879,7 @@ int main(int argc, char **argv, char **envp)
     qemu_add_opts(&qemu_tpmdev_opts);
     qemu_add_opts(&qemu_realtime_opts);
     qemu_add_opts(&qemu_msg_opts);
+    qemu_add_opts(&qemu_mem_path_opts);
 
     runstate_init();
 
@@ -3190,7 +3207,9 @@ int main(int argc, char **argv, char **envp)
                 break;
 #endif
             case QEMU_OPTION_mempath:
-                mem_path = optarg;
+                if (!qemu_opts_parse(qemu_find_opts("mem-path"), optarg, 1)) {
+                    exit(1);
+                }
                 break;
             case QEMU_OPTION_mem_prealloc:
                 mem_prealloc = 1;
-- 
1.8.3.2


* [Qemu-devel] [PATCH v9 02/20] Add kvm_eventfds_enabled function
  2014-03-04 18:22 [Qemu-devel] [PATCH v9 00/20] Vhost and vhost-net support for userspace based backends Antonios Motakis
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 01/20] Convert -mem-path to QemuOpts and add share property Antonios Motakis
@ 2014-03-04 18:22 ` Antonios Motakis
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 03/20] Add chardev API qemu_chr_fe_read_all Antonios Motakis
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 33+ messages in thread
From: Antonios Motakis @ 2014-03-04 18:22 UTC (permalink / raw)
  To: qemu-devel, snabb-devel
  Cc: Peter Maydell, Eduardo Habkost, Gleb Natapov, mst, n.nikolaev,
	open list:Overall, Paolo Bonzini, lukego, Antonios Motakis, tech,
	Andreas Färber, Richard Henderson

Add a function to check if the eventfd capability is present in KVM in
the host kernel.

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
---
 include/sysemu/kvm.h | 11 +++++++++++
 kvm-all.c            |  4 ++++
 kvm-stub.c           |  1 +
 3 files changed, 16 insertions(+)
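
As a standalone illustration of the capability check described above, the
sketch below queries the host directly, outside of QEMU. It assumes a Linux
host with the kernel UAPI headers installed; host_kvm_has_ioeventfd is a
hypothetical name and is not the function added by this patch.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Returns 1 if the host KVM reports KVM_CAP_IOEVENTFD, 0 if it does not,
 * and -1 if /dev/kvm cannot be opened or the ioctl fails.
 */
int host_kvm_has_ioeventfd(void)
{
    int fd, ret;

    fd = open("/dev/kvm", O_RDWR);
    if (fd < 0) {
        return -1; /* no KVM, or no permission to use it */
    }

    /* KVM_CHECK_EXTENSION returns a positive value if the cap exists. */
    ret = ioctl(fd, KVM_CHECK_EXTENSION, KVM_CAP_IOEVENTFD);
    close(fd);
    if (ret < 0) {
        return -1;
    }
    return ret > 0;
}
```

On a host without /dev/kvm the function returns -1, which corresponds to the
case where QEMU runs without KVM and ioeventfds cannot be used.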

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index a02d67c..3db063e 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -43,6 +43,7 @@ extern bool kvm_allowed;
 extern bool kvm_kernel_irqchip;
 extern bool kvm_async_interrupts_allowed;
 extern bool kvm_halt_in_kernel_allowed;
+extern bool kvm_eventfds_allowed;
 extern bool kvm_irqfds_allowed;
 extern bool kvm_msi_via_irqfd_allowed;
 extern bool kvm_gsi_routing_allowed;
@@ -83,6 +84,15 @@ extern bool kvm_readonly_mem_allowed;
 #define kvm_halt_in_kernel() (kvm_halt_in_kernel_allowed)
 
 /**
+ * kvm_eventfds_enabled:
+ *
+ * Returns: true if we can use eventfds to receive notifications
+ * from a KVM CPU (ie the kernel supports eventfds and we are running
+ * with a configuration where it is meaningful to use them).
+ */
+#define kvm_eventfds_enabled() (kvm_eventfds_allowed)
+
+/**
  * kvm_irqfds_enabled:
  *
  * Returns: true if we can use irqfds to inject interrupts into
@@ -128,6 +138,7 @@ extern bool kvm_readonly_mem_allowed;
 #define kvm_irqchip_in_kernel() (false)
 #define kvm_async_interrupts_enabled() (false)
 #define kvm_halt_in_kernel() (false)
+#define kvm_eventfds_enabled() (false)
 #define kvm_irqfds_enabled() (false)
 #define kvm_msi_via_irqfd_enabled() (false)
 #define kvm_gsi_routing_allowed() (false)
diff --git a/kvm-all.c b/kvm-all.c
index fd8157a..85f31b4 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -110,6 +110,7 @@ KVMState *kvm_state;
 bool kvm_kernel_irqchip;
 bool kvm_async_interrupts_allowed;
 bool kvm_halt_in_kernel_allowed;
+bool kvm_eventfds_allowed;
 bool kvm_irqfds_allowed;
 bool kvm_msi_via_irqfd_allowed;
 bool kvm_gsi_routing_allowed;
@@ -1505,6 +1506,9 @@ int kvm_init(void)
         (kvm_check_extension(s, KVM_CAP_READONLY_MEM) > 0);
 #endif
 
+    kvm_eventfds_allowed =
+        (kvm_check_extension(s, KVM_CAP_IOEVENTFD) > 0);
+
     ret = kvm_arch_init(s);
     if (ret < 0) {
         goto err;
diff --git a/kvm-stub.c b/kvm-stub.c
index e979f76..a25cda2 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -22,6 +22,7 @@
 KVMState *kvm_state;
 bool kvm_kernel_irqchip;
 bool kvm_async_interrupts_allowed;
+bool kvm_eventfds_allowed;
 bool kvm_irqfds_allowed;
 bool kvm_msi_via_irqfd_allowed;
 bool kvm_gsi_routing_allowed;
-- 
1.8.3.2


* [Qemu-devel] [PATCH v9 03/20] Add chardev API qemu_chr_fe_read_all
  2014-03-04 18:22 [Qemu-devel] [PATCH v9 00/20] Vhost and vhost-net support for userspace based backends Antonios Motakis
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 01/20] Convert -mem-path to QemuOpts and add share property Antonios Motakis
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 02/20] Add kvm_eventfds_enabled function Antonios Motakis
@ 2014-03-04 18:22 ` Antonios Motakis
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 04/20] Add chardev API qemu_chr_fe_set_msgfds Antonios Motakis
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 33+ messages in thread
From: Antonios Motakis @ 2014-03-04 18:22 UTC (permalink / raw)
  To: qemu-devel, snabb-devel
  Cc: mst, Amit Shah, Michael Roth, n.nikolaev, Hans de Goede,
	Gerd Hoffmann, Anthony Liguori, lukego, Antonios Motakis, tech

This function will attempt to read data from the chardev trying
to fill the buffer up to the given length.
Add tcp_chr_disconnect to reuse disconnection code where needed.

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
---
 include/sysemu/char.h | 14 +++++++++
 qemu-char.c           | 83 ++++++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 87 insertions(+), 10 deletions(-)
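
The "read until the buffer is full" behaviour described above can be
sketched on a plain file descriptor as follows. This is not the chardev
code: demo_read_all and DEMO_EAGAIN_RETRIES are hypothetical names, and
unlike the patch this sketch bounds the number of EAGAIN retries so that it
cannot spin forever on an idle non-blocking descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define DEMO_EAGAIN_RETRIES 1000 /* hypothetical bound for this sketch */

/*
 * Try to fill buf with len bytes from fd. Stops early on end of file or
 * when the EAGAIN retry budget is used up.
 * Returns the number of bytes read, or -1 on a hard error.
 */
int demo_read_all(int fd, uint8_t *buf, int len)
{
    const struct timespec pause = { 0, 100 * 1000 }; /* 100 microseconds */
    int retries = DEMO_EAGAIN_RETRIES;
    int offset = 0;

    while (offset < len) {
        ssize_t res = read(fd, buf + offset, (size_t)(len - offset));

        if (res < 0 && errno == EINTR) {
            continue; /* interrupted by a signal, just retry */
        }
        if (res < 0 && errno == EAGAIN) {
            if (retries-- == 0) {
                break; /* give up and return what we have */
            }
            nanosleep(&pause, NULL);
            continue;
        }
        if (res < 0) {
            return -1;
        }
        if (res == 0) {
            break; /* end of file: the peer closed the connection */
        }
        offset += (int)res;
    }

    return offset;
}
```

A caller that needs a complete message compares the return value with the
requested length and treats a shorter result as a disconnect or timeout.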

diff --git a/include/sysemu/char.h b/include/sysemu/char.h
index b81a6ff..9981a6a 100644
--- a/include/sysemu/char.h
+++ b/include/sysemu/char.h
@@ -56,6 +56,8 @@ typedef void IOEventHandler(void *opaque, int event);
 struct CharDriverState {
     void (*init)(struct CharDriverState *s);
     int (*chr_write)(struct CharDriverState *s, const uint8_t *buf, int len);
+    int (*chr_sync_read)(struct CharDriverState *s,
+                         const uint8_t *buf, int len);
     GSource *(*chr_add_watch)(struct CharDriverState *s, GIOCondition cond);
     void (*chr_update_read_handler)(struct CharDriverState *s);
     int (*chr_ioctl)(struct CharDriverState *s, int cmd, void *arg);
@@ -189,6 +191,18 @@ int qemu_chr_fe_write(CharDriverState *s, const uint8_t *buf, int len);
 int qemu_chr_fe_write_all(CharDriverState *s, const uint8_t *buf, int len);
 
 /**
+ * @qemu_chr_fe_read_all:
+ *
+ * Read data to a buffer from the back end.
+ *
+ * @buf the data buffer
+ * @len the number of bytes to read
+ *
+ * Returns: the number of bytes read
+ */
+int qemu_chr_fe_read_all(CharDriverState *s, uint8_t *buf, int len);
+
+/**
  * @qemu_chr_fe_ioctl:
  *
  * Issue a device specific ioctl to a backend.
diff --git a/qemu-char.c b/qemu-char.c
index 4d50838..ff2e9d8 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -84,6 +84,7 @@
 #include "ui/qemu-spice.h"
 
 #define READ_BUF_LEN 4096
+#define READ_RETRIES 10
 
 /***********************************************************/
 /* character device */
@@ -145,6 +146,41 @@ int qemu_chr_fe_write_all(CharDriverState *s, const uint8_t *buf, int len)
     return offset;
 }
 
+int qemu_chr_fe_read_all(CharDriverState *s, uint8_t *buf, int len)
+{
+    int offset = 0, counter = READ_RETRIES;
+    int res;
+
+    if (!s->chr_sync_read) {
+        return 0;
+    }
+
+    while (offset < len) {
+        do {
+            res = s->chr_sync_read(s, buf + offset, len - offset);
+            if (res == -1 && errno == EAGAIN) {
+                g_usleep(100);
+            }
+        } while (res == -1 && errno == EAGAIN);
+
+        if (res == 0) {
+            break;
+        }
+
+        if (res < 0) {
+            return res;
+        }
+
+        offset += res;
+
+        if (!counter--) {
+            break;
+        }
+    }
+
+    return offset;
+}
+
 int qemu_chr_fe_ioctl(CharDriverState *s, int cmd, void *arg)
 {
     if (!s->chr_ioctl)
@@ -2453,6 +2489,23 @@ static GSource *tcp_chr_add_watch(CharDriverState *chr, GIOCondition cond)
     return g_io_create_watch(s->chan, cond);
 }
 
+static void tcp_chr_disconnect(CharDriverState *chr)
+{
+    TCPCharDriver *s = chr->opaque;
+
+    s->connected = 0;
+    if (s->listen_chan) {
+        s->listen_tag = g_io_add_watch(s->listen_chan, G_IO_IN,
+                                       tcp_chr_accept, chr);
+    }
+    remove_fd_in_watch(chr);
+    g_io_channel_unref(s->chan);
+    s->chan = NULL;
+    closesocket(s->fd);
+    s->fd = -1;
+    qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
+}
+
 static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, void *opaque)
 {
     CharDriverState *chr = opaque;
@@ -2469,16 +2522,7 @@ static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, void *opaque)
     size = tcp_chr_recv(chr, (void *)buf, len);
     if (size == 0) {
         /* connection closed */
-        s->connected = 0;
-        if (s->listen_chan) {
-            s->listen_tag = g_io_add_watch(s->listen_chan, G_IO_IN, tcp_chr_accept, chr);
-        }
-        remove_fd_in_watch(chr);
-        g_io_channel_unref(s->chan);
-        s->chan = NULL;
-        closesocket(s->fd);
-        s->fd = -1;
-        qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
+        tcp_chr_disconnect(chr);
     } else if (size > 0) {
         if (s->do_telnetopt)
             tcp_chr_process_IAC_bytes(chr, s, buf, &size);
@@ -2489,6 +2533,24 @@ static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, void *opaque)
     return TRUE;
 }
 
+static int tcp_chr_sync_read(CharDriverState *chr, const uint8_t *buf, int len)
+{
+    TCPCharDriver *s = chr->opaque;
+    int size;
+
+    if (!s->connected) {
+        return 0;
+    }
+
+    size = tcp_chr_recv(chr, (void *) buf, len);
+    if (size == 0) {
+        /* connection closed */
+        tcp_chr_disconnect(chr);
+    }
+
+    return size;
+}
+
 #ifndef _WIN32
 CharDriverState *qemu_chr_open_eventfd(int eventfd)
 {
@@ -2660,6 +2722,7 @@ static CharDriverState *qemu_chr_open_socket_fd(int fd, bool do_nodelay,
 
     chr->opaque = s;
     chr->chr_write = tcp_chr_write;
+    chr->chr_sync_read = tcp_chr_sync_read;
     chr->chr_close = tcp_chr_close;
     chr->get_msgfd = tcp_get_msgfd;
     chr->chr_add_client = tcp_chr_add_client;
-- 
1.8.3.2


* [Qemu-devel] [PATCH v9 04/20] Add chardev API qemu_chr_fe_set_msgfds
  2014-03-04 18:22 [Qemu-devel] [PATCH v9 00/20] Vhost and vhost-net support for userspace based backends Antonios Motakis
                   ` (2 preceding siblings ...)
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 03/20] Add chardev API qemu_chr_fe_read_all Antonios Motakis
@ 2014-03-04 18:22 ` Antonios Motakis
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 05/20] Add chardev API qemu_chr_fe_get_msgfds Antonios Motakis
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 33+ messages in thread
From: Antonios Motakis @ 2014-03-04 18:22 UTC (permalink / raw)
  To: qemu-devel, snabb-devel
  Cc: mst, Amit Shah, Michael Roth, n.nikolaev, Hans de Goede,
	Gerd Hoffmann, Anthony Liguori, lukego, Antonios Motakis, tech

This stores an array of file descriptors in the internal structures.
The next time a message is sent, the array will be sent as ancillary
data. This feature works on the UNIX domain socket backend only.

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
---
 include/sysemu/char.h | 14 ++++++++
 qemu-char.c           | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 101 insertions(+), 1 deletion(-)
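
For reference, passing file descriptors as SCM_RIGHTS ancillary data over a
Unix domain socket can be sketched as below. This is a standalone example on
raw sockets rather than the chardev layer; demo_send_fds, demo_recv_fds and
DEMO_MAX_FDS are hypothetical names, and the receive helper is included only
so that the send side can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define DEMO_MAX_FDS 8 /* hypothetical upper bound for this sketch */

/* Control buffer with the alignment required for struct cmsghdr. */
union demo_ctl {
    char buf[CMSG_SPACE(DEMO_MAX_FDS * sizeof(int))];
    struct cmsghdr align;
};

/* Send len bytes plus num fds; returns bytes sent or -1 on error. */
int demo_send_fds(int sock, const void *buf, size_t len,
                  const int *fds, int num)
{
    union demo_ctl ctl;
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    struct msghdr msgh;
    struct cmsghdr *cmsg;
    ssize_t r;

    if (num < 1 || num > DEMO_MAX_FDS) {
        return -1;
    }
    memset(&msgh, 0, sizeof(msgh));
    memset(&ctl, 0, sizeof(ctl));
    msgh.msg_iov = &iov;
    msgh.msg_iovlen = 1;
    msgh.msg_control = ctl.buf;
    msgh.msg_controllen = CMSG_SPACE((size_t)num * sizeof(int));

    cmsg = CMSG_FIRSTHDR(&msgh);
    cmsg->cmsg_len = CMSG_LEN((size_t)num * sizeof(int));
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    memcpy(CMSG_DATA(cmsg), fds, (size_t)num * sizeof(int));

    do {
        r = sendmsg(sock, &msgh, 0);
    } while (r < 0 && errno == EINTR);
    return (int)r;
}

/* Receive data and up to max fds; returns the number of fds stored. */
int demo_recv_fds(int sock, void *buf, size_t len, int *fds, int max)
{
    union demo_ctl ctl;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msgh;
    struct cmsghdr *cmsg;
    int got = 0;

    memset(&msgh, 0, sizeof(msgh));
    msgh.msg_iov = &iov;
    msgh.msg_iovlen = 1;
    msgh.msg_control = ctl.buf;
    msgh.msg_controllen = sizeof(ctl.buf);
    if (recvmsg(sock, &msgh, 0) < 0) {
        return -1;
    }

    for (cmsg = CMSG_FIRSTHDR(&msgh); cmsg; cmsg = CMSG_NXTHDR(&msgh, cmsg)) {
        int n, i, fd;

        if (cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS) {
            continue;
        }
        n = (int)((cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int));
        for (i = 0; i < n; i++) {
            memcpy(&fd, CMSG_DATA(cmsg) + (size_t)i * sizeof(int), sizeof(fd));
            if (got < max) {
                fds[got++] = fd;
            } else {
                close(fd); /* do not leak fds the caller has no room for */
            }
        }
    }
    return got;
}
```

The received descriptors are new numbers in the receiving process that refer
to the same open files, which is what allows a backend to use eventfds and
memory files created by QEMU.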

diff --git a/include/sysemu/char.h b/include/sysemu/char.h
index 9981a6a..d99dcf6 100644
--- a/include/sysemu/char.h
+++ b/include/sysemu/char.h
@@ -62,6 +62,7 @@ struct CharDriverState {
     void (*chr_update_read_handler)(struct CharDriverState *s);
     int (*chr_ioctl)(struct CharDriverState *s, int cmd, void *arg);
     int (*get_msgfd)(struct CharDriverState *s);
+    int (*set_msgfds)(struct CharDriverState *s, int *fds, int num);
     int (*chr_add_client)(struct CharDriverState *chr, int fd);
     IOEventHandler *chr_event;
     IOCanReadHandler *chr_can_read;
@@ -229,6 +230,19 @@ int qemu_chr_fe_ioctl(CharDriverState *s, int cmd, void *arg);
 int qemu_chr_fe_get_msgfd(CharDriverState *s);
 
 /**
+ * @qemu_chr_fe_set_msgfds:
+ *
+ * For backends capable of fd passing, set an array of fds to be passed with
+ * the next send operation.
+ * A subsequent call to this function before calling a write function will
+ * result in overwriting the fd array with the new value without being sent.
+ * Upon writing the message the fd array is freed.
+ *
+ * Returns: -1 if fd passing isn't supported.
+ */
+int qemu_chr_fe_set_msgfds(CharDriverState *s, int *fds, int num);
+
+/**
  * @qemu_chr_fe_claim:
  *
  * Claim a backend before using it, should be called before calling
diff --git a/qemu-char.c b/qemu-char.c
index ff2e9d8..cef615c 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -207,6 +207,11 @@ int qemu_chr_fe_get_msgfd(CharDriverState *s)
     return s->get_msgfd ? s->get_msgfd(s) : -1;
 }
 
+int qemu_chr_fe_set_msgfds(CharDriverState *s, int *fds, int num)
+{
+    return s->set_msgfds ? s->set_msgfds(s, fds, num) : -1;
+}
+
 int qemu_chr_add_client(CharDriverState *s, int fd)
 {
     return s->chr_add_client ? s->chr_add_client(s, fd) : -1;
@@ -2332,15 +2337,71 @@ typedef struct {
     int do_nodelay;
     int is_unix;
     int msgfd;
+    int *write_msgfds;
+    int write_msgfds_num;
 } TCPCharDriver;
 
 static gboolean tcp_chr_accept(GIOChannel *chan, GIOCondition cond, void *opaque);
 
+#ifndef _WIN32
+static int unix_send_msgfds(CharDriverState *chr, const uint8_t *buf, int len)
+{
+    TCPCharDriver *s = chr->opaque;
+    struct msghdr msgh;
+    struct iovec iov;
+    int r;
+
+    size_t fd_size = s->write_msgfds_num * sizeof(int);
+    char control[CMSG_SPACE(fd_size)];
+    struct cmsghdr *cmsg;
+
+    memset(&msgh, 0, sizeof(msgh));
+    memset(control, 0, sizeof(control));
+
+    /* set the payload */
+    iov.iov_base = (uint8_t *) buf;
+    iov.iov_len = len;
+
+    msgh.msg_iov = &iov;
+    msgh.msg_iovlen = 1;
+
+    msgh.msg_control = control;
+    msgh.msg_controllen = sizeof(control);
+
+    cmsg = CMSG_FIRSTHDR(&msgh);
+
+    cmsg->cmsg_len = CMSG_LEN(fd_size);
+    cmsg->cmsg_level = SOL_SOCKET;
+    cmsg->cmsg_type = SCM_RIGHTS;
+    memcpy(CMSG_DATA(cmsg), s->write_msgfds, fd_size);
+
+    do {
+        r = sendmsg(s->fd, &msgh, 0);
+    } while (r < 0 && errno == EINTR);
+
+    /* free the written msgfds, no matter what */
+    if (s->write_msgfds_num) {
+        g_free(s->write_msgfds);
+        s->write_msgfds = 0;
+        s->write_msgfds_num = 0;
+    }
+
+    return r;
+}
+#endif
+
 static int tcp_chr_write(CharDriverState *chr, const uint8_t *buf, int len)
 {
     TCPCharDriver *s = chr->opaque;
     if (s->connected) {
-        return io_channel_send(s->chan, buf, len);
+#ifndef _WIN32
+        if (s->is_unix && s->write_msgfds_num) {
+            return unix_send_msgfds(chr, buf, len);
+        } else
+#endif
+        {
+            return io_channel_send(s->chan, buf, len);
+        }
     } else {
         /* XXX: indicate an error ? */
         return len;
@@ -2415,6 +2476,25 @@ static int tcp_get_msgfd(CharDriverState *chr)
     return fd;
 }
 
+static int tcp_set_msgfds(CharDriverState *chr, int *fds, int num)
+{
+    TCPCharDriver *s = chr->opaque;
+
+    /* clear old pending fd array */
+    if (s->write_msgfds) {
+        g_free(s->write_msgfds);
+    }
+
+    if (num) {
+        s->write_msgfds = g_malloc(num * sizeof(int));
+        memcpy(s->write_msgfds, fds, num * sizeof(int));
+    }
+
+    s->write_msgfds_num = num;
+
+    return 0;
+}
+
 #ifndef _WIN32
 static void unix_process_msgfd(CharDriverState *chr, struct msghdr *msg)
 {
@@ -2665,6 +2745,9 @@ static void tcp_chr_close(CharDriverState *chr)
         }
         closesocket(s->listen_fd);
     }
+    if (s->write_msgfds_num) {
+        g_free(s->write_msgfds);
+    }
     g_free(s);
     qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
 }
@@ -2694,6 +2777,8 @@ static CharDriverState *qemu_chr_open_socket_fd(int fd, bool do_nodelay,
     s->fd = -1;
     s->listen_fd = -1;
     s->msgfd = -1;
+    s->write_msgfds = 0;
+    s->write_msgfds_num = 0;
 
     chr->filename = g_malloc(256);
     switch (ss.ss_family) {
@@ -2725,6 +2810,7 @@ static CharDriverState *qemu_chr_open_socket_fd(int fd, bool do_nodelay,
     chr->chr_sync_read = tcp_chr_sync_read;
     chr->chr_close = tcp_chr_close;
     chr->get_msgfd = tcp_get_msgfd;
+    chr->set_msgfds = tcp_set_msgfds;
     chr->chr_add_client = tcp_chr_add_client;
     chr->chr_add_watch = tcp_chr_add_watch;
     /* be isn't opened until we get a connection */
-- 
1.8.3.2


* [Qemu-devel] [PATCH v9 05/20] Add chardev API qemu_chr_fe_get_msgfds
  2014-03-04 18:22 [Qemu-devel] [PATCH v9 00/20] Vhost and vhost-net support for userspace based backends Antonios Motakis
                   ` (3 preceding siblings ...)
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 04/20] Add chardev API qemu_chr_fe_set_msgfds Antonios Motakis
@ 2014-03-04 18:22 ` Antonios Motakis
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 06/20] Add G_IO_HUP handler for socket chardev Antonios Motakis
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 33+ messages in thread
From: Antonios Motakis @ 2014-03-04 18:22 UTC (permalink / raw)
  To: qemu-devel, snabb-devel
  Cc: mst, Amit Shah, Michael Roth, n.nikolaev, Hans de Goede,
	Gerd Hoffmann, Anthony Liguori, lukego, Antonios Motakis, tech

This extends the existing qemu_chr_fe_get_msgfd by allowing a set of fds to
be read. The function that receives the fds, unix_process_msgfd, is extended
to allocate an array of the needed size.
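
For readers unfamiliar with passing several descriptors in one message, the
following is a minimal standalone sketch, not code from this patch. It assumes
a Linux-like system with AF_UNIX sockets; send_fds, recv_fds and MAX_FDS are
hypothetical names chosen for illustration. The receive side derives the
descriptor count from cmsg_len, the same arithmetic the extended
unix_process_msgfd relies on.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_FDS 8 /* illustrative limit, not taken from the patch */

/* Control buffer large enough for MAX_FDS descriptors, suitably aligned. */
typedef union {
    char buf[CMSG_SPACE(MAX_FDS * sizeof(int))];
    struct cmsghdr align;
} fd_ctrl_buf;

/* Send num descriptors in a single SCM_RIGHTS message; 0 on success. */
static int send_fds(int sock, const int *fds, int num)
{
    char byte = 0; /* at least one data byte must accompany the fds */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    fd_ctrl_buf ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(fds != NULL);
    if (num < 1 || num > MAX_FDS) {
        return -1;
    }

    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = CMSG_SPACE(num * sizeof(int));

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(num * sizeof(int));
    memcpy(CMSG_DATA(cmsg), fds, num * sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive descriptors; returns how many were stored in fds (at most max). */
static int recv_fds(int sock, int *fds, int max)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    fd_ctrl_buf ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int got = 0;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) < 0) {
        return -1;
    }

    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        int tmp[MAX_FDS];
        int n, i;

        if (cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS) {
            continue;
        }
        /* payload bytes = total length minus the header length */
        n = (int)((cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int));
        if (n > MAX_FDS) {
            n = MAX_FDS;
        }
        memcpy(tmp, CMSG_DATA(cmsg), n * sizeof(int));
        for (i = 0; i < n; i++) {
            if (got < max) {
                fds[got++] = tmp[i];
            } else {
                close(tmp[i]); /* do not leak fds we cannot hand back */
            }
        }
    }
    return got;
}
```

Closing the descriptors that do not fit in the caller's array is a choice made
in this sketch only; it is not a statement about what the patch does.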

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
---
 include/sysemu/char.h | 15 ++++++++-
 qemu-char.c           | 85 ++++++++++++++++++++++++++++++++++++++-------------
 2 files changed, 78 insertions(+), 22 deletions(-)

diff --git a/include/sysemu/char.h b/include/sysemu/char.h
index d99dcf6..82eaaf5 100644
--- a/include/sysemu/char.h
+++ b/include/sysemu/char.h
@@ -61,7 +61,7 @@ struct CharDriverState {
     GSource *(*chr_add_watch)(struct CharDriverState *s, GIOCondition cond);
     void (*chr_update_read_handler)(struct CharDriverState *s);
     int (*chr_ioctl)(struct CharDriverState *s, int cmd, void *arg);
-    int (*get_msgfd)(struct CharDriverState *s);
+    int (*get_msgfds)(struct CharDriverState *s, int* fds, int num);
     int (*set_msgfds)(struct CharDriverState *s, int *fds, int num);
     int (*chr_add_client)(struct CharDriverState *chr, int fd);
     IOEventHandler *chr_event;
@@ -230,6 +230,19 @@ int qemu_chr_fe_ioctl(CharDriverState *s, int cmd, void *arg);
 int qemu_chr_fe_get_msgfd(CharDriverState *s);
 
 /**
+ * @qemu_chr_fe_get_msgfds:
+ *
+ * For backends capable of fd passing, return the number of received file
+ * descriptors and fill the fds array with up to num elements
+ *
+ * Returns: -1 if fd passing isn't supported or there are no pending file
+ *          descriptors.  If file descriptors are returned, subsequent calls to
+ *          this function will return -1 until a client sends a new set of file
+ *          descriptors.
+ */
+int qemu_chr_fe_get_msgfds(CharDriverState *s, int *fds, int num);
+
+/**
  * @qemu_chr_fe_set_msgfds:
  *
  * For backends capable of fd passing, set an array of fds to be passed with
diff --git a/qemu-char.c b/qemu-char.c
index cef615c..f9bd047 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -204,7 +204,13 @@ void qemu_chr_be_write(CharDriverState *s, uint8_t *buf, int len)
 
 int qemu_chr_fe_get_msgfd(CharDriverState *s)
 {
-    return s->get_msgfd ? s->get_msgfd(s) : -1;
+    int fd;
+    return (qemu_chr_fe_get_msgfds(s, &fd, 1) > 0) ? fd : -1;
+}
+
+int qemu_chr_fe_get_msgfds(CharDriverState *s, int *fds, int len)
+{
+    return s->get_msgfds ? s->get_msgfds(s, fds, len) : -1;
 }
 
 int qemu_chr_fe_set_msgfds(CharDriverState *s, int *fds, int num)
@@ -2336,7 +2342,8 @@ typedef struct {
     int do_telnetopt;
     int do_nodelay;
     int is_unix;
-    int msgfd;
+    int *read_msgfds;
+    int read_msgfds_num;
     int *write_msgfds;
     int write_msgfds_num;
 } TCPCharDriver;
@@ -2468,12 +2475,20 @@ static void tcp_chr_process_IAC_bytes(CharDriverState *chr,
     *size = j;
 }
 
-static int tcp_get_msgfd(CharDriverState *chr)
+static int tcp_get_msgfds(CharDriverState *chr, int *fds, int num)
 {
     TCPCharDriver *s = chr->opaque;
-    int fd = s->msgfd;
-    s->msgfd = -1;
-    return fd;
+    int to_copy = (s->read_msgfds_num < num) ? s->read_msgfds_num : num;
+
+    if (to_copy) {
+        memcpy(fds, s->read_msgfds, to_copy * sizeof(int));
+
+        g_free(s->read_msgfds);
+        s->read_msgfds = 0;
+        s->read_msgfds_num = 0;
+    }
+
+    return to_copy;
 }
 
 static int tcp_set_msgfds(CharDriverState *chr, int *fds, int num)
@@ -2502,26 +2517,46 @@ static void unix_process_msgfd(CharDriverState *chr, struct msghdr *msg)
     struct cmsghdr *cmsg;
 
     for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
-        int fd;
+        int fd_size, i;
 
-        if (cmsg->cmsg_len != CMSG_LEN(sizeof(int)) ||
+        if (cmsg->cmsg_len < CMSG_LEN(sizeof(int)) ||
             cmsg->cmsg_level != SOL_SOCKET ||
-            cmsg->cmsg_type != SCM_RIGHTS)
+            cmsg->cmsg_type != SCM_RIGHTS) {
             continue;
+        }
+
+        fd_size = cmsg->cmsg_len - CMSG_LEN(0);
 
-        fd = *((int *)CMSG_DATA(cmsg));
-        if (fd < 0)
+        if (!fd_size) {
             continue;
+        }
 
-        /* O_NONBLOCK is preserved across SCM_RIGHTS so reset it */
-        qemu_set_block(fd);
+        /* close and clean read_msgfds */
+        for (i = 0; i < s->read_msgfds_num; i++) {
+            close(s->read_msgfds[i]);
+        }
 
-#ifndef MSG_CMSG_CLOEXEC
-        qemu_set_cloexec(fd);
-#endif
-        if (s->msgfd != -1)
-            close(s->msgfd);
-        s->msgfd = fd;
+        if (s->read_msgfds_num) {
+            g_free(s->read_msgfds);
+        }
+
+        s->read_msgfds_num = fd_size / sizeof(int);
+        s->read_msgfds = g_malloc(fd_size);
+        memcpy(s->read_msgfds, CMSG_DATA(cmsg), fd_size);
+
+        for (i = 0; i < s->read_msgfds_num; i++) {
+            int fd = s->read_msgfds[i];
+            if (fd < 0) {
+                continue;
+            }
+
+            /* O_NONBLOCK is preserved across SCM_RIGHTS so reset it */
+            qemu_set_block(fd);
+
+#ifndef MSG_CMSG_CLOEXEC
+            qemu_set_cloexec(fd);
+#endif
+        }
     }
 }
 
@@ -2728,6 +2763,7 @@ static gboolean tcp_chr_accept(GIOChannel *channel, GIOCondition cond, void *opa
 static void tcp_chr_close(CharDriverState *chr)
 {
     TCPCharDriver *s = chr->opaque;
+    int i;
     if (s->fd >= 0) {
         remove_fd_in_watch(chr);
         if (s->chan) {
@@ -2745,6 +2781,12 @@ static void tcp_chr_close(CharDriverState *chr)
         }
         closesocket(s->listen_fd);
     }
+    if (s->read_msgfds_num) {
+        for (i = 0; i < s->read_msgfds_num; i++) {
+            close(s->read_msgfds[i]);
+        }
+        g_free(s->read_msgfds);
+    }
     if (s->write_msgfds_num) {
         g_free(s->write_msgfds);
     }
@@ -2776,7 +2818,8 @@ static CharDriverState *qemu_chr_open_socket_fd(int fd, bool do_nodelay,
     s->connected = 0;
     s->fd = -1;
     s->listen_fd = -1;
-    s->msgfd = -1;
+    s->read_msgfds = 0;
+    s->read_msgfds_num = 0;
     s->write_msgfds = 0;
     s->write_msgfds_num = 0;
 
@@ -2809,7 +2852,7 @@ static CharDriverState *qemu_chr_open_socket_fd(int fd, bool do_nodelay,
     chr->chr_write = tcp_chr_write;
     chr->chr_sync_read = tcp_chr_sync_read;
     chr->chr_close = tcp_chr_close;
-    chr->get_msgfd = tcp_get_msgfd;
+    chr->get_msgfds = tcp_get_msgfds;
     chr->set_msgfds = tcp_set_msgfds;
     chr->chr_add_client = tcp_chr_add_client;
     chr->chr_add_watch = tcp_chr_add_watch;
-- 
1.8.3.2

* [Qemu-devel] [PATCH v9 06/20] Add G_IO_HUP handler for socket chardev
  2014-03-04 18:22 [Qemu-devel] [PATCH v9 00/20] Vhost and vhost-net support for userspace based backends Antonios Motakis
                   ` (4 preceding siblings ...)
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 05/20] Add chardev API qemu_chr_fe_get_msgfds Antonios Motakis
@ 2014-03-04 18:22 ` Antonios Motakis
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 07/20] vhost_net should call the poll callback only when it is set Antonios Motakis
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 33+ messages in thread
From: Antonios Motakis @ 2014-03-04 18:22 UTC (permalink / raw)
  To: qemu-devel, snabb-devel
  Cc: mst, n.nikolaev, Anthony Liguori, lukego, Antonios Motakis, tech

This is used to detect that the remote end has disconnected. Just call
tcp_chr_disconnect on receiving this event.
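
As an illustration of the underlying condition, here is a small sketch that
uses plain poll(2) instead of GLib; on Unix, G_IO_HUP corresponds to POLLHUP.
The helper name peer_hung_up is hypothetical, and the behaviour shown (POLLHUP
reported once the peer of an AF_UNIX stream socket closes) is what Linux does;
other systems may differ.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if the peer has hung up, 0 if not, -1 if poll() failed. */
static int peer_hung_up(int fd)
{
    /* POLLHUP is reported even when no events are requested. */
    struct pollfd pfd = { .fd = fd, .events = 0, .revents = 0 };
    int r = poll(&pfd, 1, 0); /* timeout 0: check without blocking */

    if (r < 0) {
        return -1;
    }
    return (pfd.revents & POLLHUP) ? 1 : 0;
}
```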

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
---
 qemu-char.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/qemu-char.c b/qemu-char.c
index f9bd047..68a37d2 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -2673,6 +2673,21 @@ CharDriverState *qemu_chr_open_eventfd(int eventfd)
 }
 #endif
 
+static gboolean tcp_chr_chan_close(GIOChannel *channel, GIOCondition cond,
+                                   void *opaque)
+{
+    CharDriverState *chr = opaque;
+
+    if (cond != G_IO_HUP) {
+        return FALSE;
+    }
+
+    /* connection closed */
+    tcp_chr_disconnect(chr);
+
+    return TRUE;
+}
+
 static void tcp_chr_connect(void *opaque)
 {
     CharDriverState *chr = opaque;
@@ -2682,6 +2697,7 @@ static void tcp_chr_connect(void *opaque)
     if (s->chan) {
         chr->fd_in_tag = io_add_watch_poll(s->chan, tcp_chr_read_poll,
                                            tcp_chr_read, chr);
+        g_io_add_watch(s->chan, G_IO_HUP, tcp_chr_chan_close, chr);
     }
     qemu_chr_be_generic_open(chr);
 }
-- 
1.8.3.2

* [Qemu-devel] [PATCH v9 07/20] vhost_net should call the poll callback only when it is set
  2014-03-04 18:22 [Qemu-devel] [PATCH v9 00/20] Vhost and vhost-net support for userspace based backends Antonios Motakis
                   ` (5 preceding siblings ...)
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 06/20] Add G_IO_HUP handler for socket chardev Antonios Motakis
@ 2014-03-04 18:22 ` Antonios Motakis
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 08/20] Refactor virtio-net to use generic get_vhost_net Antonios Motakis
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 33+ messages in thread
From: Antonios Motakis @ 2014-03-04 18:22 UTC (permalink / raw)
  To: qemu-devel, snabb-devel; +Cc: lukego, Antonios Motakis, tech, n.nikolaev, mst

The poll callback needs to be called when bringing the vhost_net instance
up or down. As it is not mandatory for a NetClient to implement it, invoke
it only when it is set.
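
The pattern is the usual guard around an optional function pointer. A minimal
sketch follows; the struct and function names (nc_sketch, nc_info_sketch,
nc_set_poll, counting_poll) are hypothetical stand-ins and not QEMU types.

```c
#include <stdbool.h>
#include <stddef.h>

struct nc_sketch;

/* Hypothetical callback table; poll is optional and may be NULL. */
struct nc_info_sketch {
    void (*poll)(struct nc_sketch *nc, bool enable);
};

struct nc_sketch {
    const struct nc_info_sketch *info;
    int poll_calls; /* counts invocations, for demonstration only */
};

/* Invoke the poll callback only when the client provides one. */
static void nc_set_poll(struct nc_sketch *nc, bool enable)
{
    if (nc->info->poll) {
        nc->info->poll(nc, enable);
    }
}

/* Example callback that just records that it was called. */
static void counting_poll(struct nc_sketch *nc, bool enable)
{
    (void)enable;
    nc->poll_calls++;
}
```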

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
---
 hw/net/vhost_net.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index a1de2f4..2fa872b 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -166,7 +166,10 @@ static int vhost_net_start_one(struct vhost_net *net,
         goto fail_start;
     }
 
-    net->nc->info->poll(net->nc, false);
+    if (net->nc->info->poll) {
+        net->nc->info->poll(net->nc, false);
+    }
+
     qemu_set_fd_handler(net->backend, NULL, NULL, NULL);
     file.fd = net->backend;
     for (file.index = 0; file.index < net->dev.nvqs; ++file.index) {
@@ -183,7 +186,9 @@ fail:
         int r = ioctl(net->dev.control, VHOST_NET_SET_BACKEND, &file);
         assert(r >= 0);
     }
-    net->nc->info->poll(net->nc, true);
+    if (net->nc->info->poll) {
+        net->nc->info->poll(net->nc, true);
+    }
     vhost_dev_stop(&net->dev, dev);
 fail_start:
     vhost_dev_disable_notifiers(&net->dev, dev);
@@ -204,7 +209,9 @@ static void vhost_net_stop_one(struct vhost_net *net,
         int r = ioctl(net->dev.control, VHOST_NET_SET_BACKEND, &file);
         assert(r >= 0);
     }
-    net->nc->info->poll(net->nc, true);
+    if (net->nc->info->poll) {
+        net->nc->info->poll(net->nc, true);
+    }
     vhost_dev_stop(&net->dev, dev);
     vhost_dev_disable_notifiers(&net->dev, dev);
 }
-- 
1.8.3.2

* [Qemu-devel] [PATCH v9 08/20] Refactor virtio-net to use generic get_vhost_net
  2014-03-04 18:22 [Qemu-devel] [PATCH v9 00/20] Vhost and vhost-net support for userspace based backends Antonios Motakis
                   ` (6 preceding siblings ...)
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 07/20] vhost_net should call the poll callback only when it is set Antonios Motakis
@ 2014-03-04 18:22 ` Antonios Motakis
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 09/20] Add new virtio API virtio_queue_get_avail_idx Antonios Motakis
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 33+ messages in thread
From: Antonios Motakis @ 2014-03-04 18:22 UTC (permalink / raw)
  To: qemu-devel, snabb-devel
  Cc: mst, n.nikolaev, Anthony Liguori, Paolo Bonzini, lukego,
	Antonios Motakis, tech

This decouples virtio-net from the TAP netdev backend and allows support
for other backends to be implemented.
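
The shape of such a generic lookup can be sketched as below. This is a
simplified illustration with hypothetical types (peer_sketch,
vhost_state_sketch, peer_kind), not the QEMU definitions; the point is that
callers no longer check the backend type themselves, and a NULL peer or an
unsupported type simply yields NULL.

```c
#include <stddef.h>

enum peer_kind { PEER_KIND_TAP, PEER_KIND_OTHER };

struct vhost_state_sketch {
    int id;
};

struct peer_sketch {
    enum peer_kind kind;
    struct vhost_state_sketch *vhost; /* only meaningful for TAP here */
};

/* Single entry point: dispatch on the peer type, NULL if unsupported. */
static struct vhost_state_sketch *lookup_vhost(struct peer_sketch *peer)
{
    if (!peer) {
        return NULL;
    }

    switch (peer->kind) {
    case PEER_KIND_TAP:
        return peer->vhost;
    default:
        return NULL; /* new backend types would add a case above */
    }
}
```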

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
---
 hw/net/vhost_net.c      | 30 +++++++++++++++++++++++++++---
 hw/net/virtio-net.c     | 29 ++++++++---------------------
 include/net/vhost_net.h |  1 +
 3 files changed, 36 insertions(+), 24 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 2fa872b..2944ff1 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -231,7 +231,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
     }
 
     for (i = 0; i < total_queues; i++) {
-        r = vhost_net_start_one(tap_get_vhost_net(ncs[i].peer), dev, i * 2);
+        r = vhost_net_start_one(get_vhost_net(ncs[i].peer), dev, i * 2);
 
         if (r < 0) {
             goto err;
@@ -248,7 +248,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
 
 err:
     while (--i >= 0) {
-        vhost_net_stop_one(tap_get_vhost_net(ncs[i].peer), dev);
+        vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
     }
     return r;
 }
@@ -269,7 +269,7 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState *ncs,
     assert(r >= 0);
 
     for (i = 0; i < total_queues; i++) {
-        vhost_net_stop_one(tap_get_vhost_net(ncs[i].peer), dev);
+        vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
     }
 }
 
@@ -289,6 +289,25 @@ void vhost_net_virtqueue_mask(VHostNetState *net, VirtIODevice *dev,
 {
     vhost_virtqueue_mask(&net->dev, dev, idx, mask);
 }
+
+VHostNetState *get_vhost_net(NetClientState *nc)
+{
+    VHostNetState *vhost_net = 0;
+
+    if (!nc) {
+        return 0;
+    }
+
+    switch (nc->info->type) {
+    case NET_CLIENT_OPTIONS_KIND_TAP:
+        vhost_net = tap_get_vhost_net(nc);
+        break;
+    default:
+        break;
+    }
+
+    return vhost_net;
+}
 #else
 struct vhost_net *vhost_net_init(NetClientState *backend, int devfd,
                                  bool force)
@@ -335,4 +354,9 @@ void vhost_net_virtqueue_mask(VHostNetState *net, VirtIODevice *dev,
                               int idx, bool mask)
 {
 }
+
+VHostNetState *get_vhost_net(NetClientState *nc)
+{
+    return 0;
+}
 #endif
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 3c0342e..addee58 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -105,14 +105,7 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
     NetClientState *nc = qemu_get_queue(n->nic);
     int queues = n->multiqueue ? n->max_queues : 1;
 
-    if (!nc->peer) {
-        return;
-    }
-    if (nc->peer->info->type != NET_CLIENT_OPTIONS_KIND_TAP) {
-        return;
-    }
-
-    if (!tap_get_vhost_net(nc->peer)) {
+    if (!get_vhost_net(nc->peer)) {
         return;
     }
 
@@ -122,7 +115,7 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
     }
     if (!n->vhost_started) {
         int r;
-        if (!vhost_net_query(tap_get_vhost_net(nc->peer), vdev)) {
+        if (!vhost_net_query(get_vhost_net(nc->peer), vdev)) {
             return;
         }
         n->vhost_started = 1;
@@ -433,13 +426,10 @@ static uint32_t virtio_net_get_features(VirtIODevice *vdev, uint32_t features)
         features &= ~(0x1 << VIRTIO_NET_F_HOST_UFO);
     }
 
-    if (!nc->peer || nc->peer->info->type != NET_CLIENT_OPTIONS_KIND_TAP) {
-        return features;
-    }
-    if (!tap_get_vhost_net(nc->peer)) {
+    if (!get_vhost_net(nc->peer)) {
         return features;
     }
-    return vhost_net_get_features(tap_get_vhost_net(nc->peer), features);
+    return vhost_net_get_features(get_vhost_net(nc->peer), features);
 }
 
 static uint32_t virtio_net_bad_features(VirtIODevice *vdev)
@@ -503,13 +493,10 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint32_t features)
     for (i = 0;  i < n->max_queues; i++) {
         NetClientState *nc = qemu_get_subqueue(n->nic, i);
 
-        if (!nc->peer || nc->peer->info->type != NET_CLIENT_OPTIONS_KIND_TAP) {
-            continue;
-        }
-        if (!tap_get_vhost_net(nc->peer)) {
+        if (!get_vhost_net(nc->peer)) {
             continue;
         }
-        vhost_net_ack_features(tap_get_vhost_net(nc->peer), features);
+        vhost_net_ack_features(get_vhost_net(nc->peer), features);
     }
 }
 
@@ -1439,7 +1426,7 @@ static bool virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx)
     VirtIONet *n = VIRTIO_NET(vdev);
     NetClientState *nc = qemu_get_subqueue(n->nic, vq2q(idx));
     assert(n->vhost_started);
-    return vhost_net_virtqueue_pending(tap_get_vhost_net(nc->peer), idx);
+    return vhost_net_virtqueue_pending(get_vhost_net(nc->peer), idx);
 }
 
 static void virtio_net_guest_notifier_mask(VirtIODevice *vdev, int idx,
@@ -1448,7 +1435,7 @@ static void virtio_net_guest_notifier_mask(VirtIODevice *vdev, int idx,
     VirtIONet *n = VIRTIO_NET(vdev);
     NetClientState *nc = qemu_get_subqueue(n->nic, vq2q(idx));
     assert(n->vhost_started);
-    vhost_net_virtqueue_mask(tap_get_vhost_net(nc->peer),
+    vhost_net_virtqueue_mask(get_vhost_net(nc->peer),
                              vdev, idx, mask);
 }
 
diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h
index 2d936bb..e2bd61c 100644
--- a/include/net/vhost_net.h
+++ b/include/net/vhost_net.h
@@ -20,4 +20,5 @@ void vhost_net_ack_features(VHostNetState *net, unsigned features);
 bool vhost_net_virtqueue_pending(VHostNetState *net, int n);
 void vhost_net_virtqueue_mask(VHostNetState *net, VirtIODevice *dev,
                               int idx, bool mask);
+VHostNetState *get_vhost_net(NetClientState *nc);
 #endif
-- 
1.8.3.2

* [Qemu-devel] [PATCH v9 09/20] Add new virtio API virtio_queue_get_avail_idx
  2014-03-04 18:22 [Qemu-devel] [PATCH v9 00/20] Vhost and vhost-net support for userspace based backends Antonios Motakis
                   ` (7 preceding siblings ...)
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 08/20] Refactor virtio-net to use generic get_vhost_net Antonios Motakis
@ 2014-03-04 18:22 ` Antonios Motakis
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 10/20] Gracefully handle ioctl failure in vhost_virtqueue_stop Antonios Motakis
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 33+ messages in thread
From: Antonios Motakis @ 2014-03-04 18:22 UTC (permalink / raw)
  To: qemu-devel, snabb-devel
  Cc: Peter Maydell, mst, n.nikolaev, Anthony Liguori, Paolo Bonzini,
	lukego, Antonios Motakis, tech, Andreas Färber,
	KONRAD Frederic

This function returns the current available ring index.

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
---
 hw/virtio/virtio.c         | 5 +++++
 include/hw/virtio/virtio.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index aeabf3a..3c46e86 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1058,6 +1058,11 @@ hwaddr virtio_queue_get_ring_size(VirtIODevice *vdev, int n)
 	    virtio_queue_get_used_size(vdev, n);
 }
 
+uint16_t virtio_queue_get_avail_idx(VirtIODevice *vdev, int n)
+{
+    return vring_avail_idx(&vdev->vq[n]);
+}
+
 uint16_t virtio_queue_get_last_avail_idx(VirtIODevice *vdev, int n)
 {
     return vdev->vq[n].last_avail_idx;
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 3e54e90..3bb0db2 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -239,6 +239,7 @@ hwaddr virtio_queue_get_desc_size(VirtIODevice *vdev, int n);
 hwaddr virtio_queue_get_avail_size(VirtIODevice *vdev, int n);
 hwaddr virtio_queue_get_used_size(VirtIODevice *vdev, int n);
 hwaddr virtio_queue_get_ring_size(VirtIODevice *vdev, int n);
+uint16_t virtio_queue_get_avail_idx(VirtIODevice *vdev, int n);
 uint16_t virtio_queue_get_last_avail_idx(VirtIODevice *vdev, int n);
 void virtio_queue_set_last_avail_idx(VirtIODevice *vdev, int n, uint16_t idx);
 void virtio_queue_invalidate_signalled_used(VirtIODevice *vdev, int n);
-- 
1.8.3.2

* [Qemu-devel] [PATCH v9 10/20] Gracefully handle ioctl failure in vhost_virtqueue_stop
  2014-03-04 18:22 [Qemu-devel] [PATCH v9 00/20] Vhost and vhost-net support for userspace based backends Antonios Motakis
                   ` (8 preceding siblings ...)
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 09/20] Add new virtio API virtio_queue_get_avail_idx Antonios Motakis
@ 2014-03-04 18:22 ` Antonios Motakis
  2014-03-04 18:45   ` Michael S. Tsirkin
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 11/20] vhost_net_init will use VhostNetOptions to get all its arguments Antonios Motakis
                   ` (10 subsequent siblings)
  20 siblings, 1 reply; 33+ messages in thread
From: Antonios Motakis @ 2014-03-04 18:22 UTC (permalink / raw)
  To: qemu-devel, snabb-devel; +Cc: lukego, Antonios Motakis, tech, n.nikolaev, mst

On stopping the vhost, a call to VHOST_GET_VRING_BASE is issued. The
received value is stored as last_avail_idx, so the virtqueue can continue
operating if the connection is resumed. If this call fails, fall back to
the current avail_idx. Some packets from the avail ring may be skipped,
but we keep a sane value and can continue on reconnect.
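
The fallback logic can be sketched in isolation as follows. All names here
(get_base_fn, resolve_last_avail_idx, backend_ok, backend_gone) are
hypothetical; the backend query stands in for the VHOST_GET_VRING_BASE call
and the third argument for the guest-visible avail index.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical backend query: returns < 0 on failure, else fills *base. */
typedef int (*get_base_fn)(void *ctx, uint16_t *base);

/* Prefer the index reported by the backend; fall back to the avail idx. */
static uint16_t resolve_last_avail_idx(get_base_fn ask_backend, void *ctx,
                                       uint16_t guest_avail_idx)
{
    uint16_t base;

    if (ask_backend(ctx, &base) < 0) {
        fprintf(stderr, "ring restore failed, using avail idx %u\n",
                (unsigned)guest_avail_idx);
        return guest_avail_idx;
    }
    return base;
}

/* Two toy backends: one answers, one has gone away. */
static int backend_ok(void *ctx, uint16_t *base)
{
    (void)ctx;
    *base = 7;
    return 0;
}

static int backend_gone(void *ctx, uint16_t *base)
{
    (void)ctx;
    (void)base;
    return -1;
}
```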

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
---
 hw/virtio/vhost.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 9e336ad..322e2c0 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -758,12 +758,13 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev,
     assert(idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs);
     r = ioctl(dev->control, VHOST_GET_VRING_BASE, &state);
     if (r < 0) {
+        state.num = virtio_queue_get_avail_idx(vdev, idx);
         fprintf(stderr, "vhost VQ %d ring restore failed: %d\n", idx, r);
         fflush(stderr);
     }
     virtio_queue_set_last_avail_idx(vdev, idx, state.num);
     virtio_queue_invalidate_signalled_used(vdev, idx);
-    assert (r >= 0);
+
     cpu_physical_memory_unmap(vq->ring, virtio_queue_get_ring_size(vdev, idx),
                               0, virtio_queue_get_ring_size(vdev, idx));
     cpu_physical_memory_unmap(vq->used, virtio_queue_get_used_size(vdev, idx),
-- 
1.8.3.2

* [Qemu-devel] [PATCH v9 11/20] vhost_net_init will use VhostNetOptions to get all its arguments
  2014-03-04 18:22 [Qemu-devel] [PATCH v9 00/20] Vhost and vhost-net support for userspace based backends Antonios Motakis
                   ` (9 preceding siblings ...)
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 10/20] Gracefully handle ioctl failure in vhost_virtqueue_stop Antonios Motakis
@ 2014-03-04 18:22 ` Antonios Motakis
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 12/20] Add vhost_ops to vhost_dev struct and replace all relevant ioctls Antonios Motakis
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 33+ messages in thread
From: Antonios Motakis @ 2014-03-04 18:22 UTC (permalink / raw)
  To: qemu-devel, snabb-devel
  Cc: Stefan Hajnoczi, mst, n.nikolaev, Nicholas Bellinger,
	Anthony Liguori, Paolo Bonzini, lukego, Antonios Motakis, tech

vhost_dev_init will replace devfd and devpath with a single opaque argument.
This is initialised with a file descriptor. When TAP is used (through
vhost_net), open /dev/vhost-net and pass the fd as an opaque parameter in
VhostNetOptions. The same applies to vhost-scsi - open /dev/vhost-scsi and
pass the fd.
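
Carrying a file descriptor in a void * goes through uintptr_t in both
directions, as the casts in this patch do. A minimal sketch with hypothetical
helper names (fd_to_opaque, opaque_to_fd):

```c
#include <stdint.h>

/* Pack a file descriptor into an opaque pointer-sized value. */
static void *fd_to_opaque(int fd)
{
    return (void *)(uintptr_t)fd;
}

/* Recover the file descriptor from the opaque value. */
static int opaque_to_fd(void *opaque)
{
    return (int)(uintptr_t)opaque;
}
```

One consequence worth keeping in mind: on common platforms fd 0 packs to a
null pointer, so a NULL opaque cannot double as a "not set" marker.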

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
---
 hw/net/vhost_net.c        | 23 ++++++++++++-----------
 hw/scsi/vhost-scsi.c      | 10 +++++++++-
 hw/virtio/vhost.c         | 12 +++---------
 include/hw/virtio/vhost.h |  2 +-
 include/net/vhost_net.h   |  8 +++++++-
 net/tap.c                 | 17 +++++++++++++----
 6 files changed, 45 insertions(+), 27 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 2944ff1..d4fb53f 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -92,32 +92,34 @@ static int vhost_net_get_fd(NetClientState *backend)
     }
 }
 
-struct vhost_net *vhost_net_init(NetClientState *backend, int devfd,
-                                 bool force)
+struct vhost_net *vhost_net_init(VhostNetOptions *options)
 {
     int r;
     struct vhost_net *net = g_malloc(sizeof *net);
-    if (!backend) {
-        fprintf(stderr, "vhost-net requires backend to be setup\n");
+
+    if (!options->net_backend) {
+        fprintf(stderr, "vhost-net requires net backend to be setup\n");
         goto fail;
     }
-    r = vhost_net_get_fd(backend);
+
+    r = vhost_net_get_fd(options->net_backend);
     if (r < 0) {
         goto fail;
     }
-    net->nc = backend;
-    net->dev.backend_features = qemu_has_vnet_hdr(backend) ? 0 :
+    net->nc = options->net_backend;
+    net->dev.backend_features = qemu_has_vnet_hdr(options->net_backend) ? 0 :
         (1 << VHOST_NET_F_VIRTIO_NET_HDR);
     net->backend = r;
 
     net->dev.nvqs = 2;
     net->dev.vqs = net->vqs;
 
-    r = vhost_dev_init(&net->dev, devfd, "/dev/vhost-net", force);
+    r = vhost_dev_init(&net->dev, options->opaque,
+                       options->force);
     if (r < 0) {
         goto fail;
     }
-    if (!qemu_has_vnet_hdr_len(backend,
+    if (!qemu_has_vnet_hdr_len(options->net_backend,
                                sizeof(struct virtio_net_hdr_mrg_rxbuf))) {
         net->dev.features &= ~(1 << VIRTIO_NET_F_MRG_RXBUF);
     }
@@ -309,8 +311,7 @@ VHostNetState *get_vhost_net(NetClientState *nc)
     return vhost_net;
 }
 #else
-struct vhost_net *vhost_net_init(NetClientState *backend, int devfd,
-                                 bool force)
+struct vhost_net *vhost_net_init(VhostNetOptions *options)
 {
     error_report("vhost-net support is not compiled in");
     return NULL;
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index 3983a5b..9b03fb6 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -215,6 +215,13 @@ static void vhost_scsi_realize(DeviceState *dev, Error **errp)
             error_setg(errp, "vhost-scsi: unable to parse vhostfd");
             return;
         }
+    } else {
+        vhostfd = open("/dev/vhost-scsi", O_RDWR);
+        if (vhostfd < 0) {
+            error_setg(errp, "vhost-scsi: open vhost char device failed: %s",
+                       strerror(errno));
+            return;
+        }
     }
 
     virtio_scsi_common_realize(dev, &err);
@@ -227,7 +234,8 @@ static void vhost_scsi_realize(DeviceState *dev, Error **errp)
     s->dev.vqs = g_new(struct vhost_virtqueue, s->dev.nvqs);
     s->dev.vq_index = 0;
 
-    ret = vhost_dev_init(&s->dev, vhostfd, "/dev/vhost-scsi", true);
+    ret = vhost_dev_init(&s->dev, (void *)(uintptr_t)vhostfd,
+                         true);
     if (ret < 0) {
         error_setg(errp, "vhost-scsi: vhost initialization failed: %s",
                    strerror(-ret));
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 322e2c0..adef689 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -815,19 +815,13 @@ static void vhost_virtqueue_cleanup(struct vhost_virtqueue *vq)
     event_notifier_cleanup(&vq->masked_notifier);
 }
 
-int vhost_dev_init(struct vhost_dev *hdev, int devfd, const char *devpath,
+int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
                    bool force)
 {
     uint64_t features;
     int i, r;
-    if (devfd >= 0) {
-        hdev->control = devfd;
-    } else {
-        hdev->control = open(devpath, O_RDWR);
-        if (hdev->control < 0) {
-            return -errno;
-        }
-    }
+    hdev->control = (uintptr_t) opaque;
+
     r = ioctl(hdev->control, VHOST_SET_OWNER, NULL);
     if (r < 0) {
         goto fail;
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index de24746..eb25ffa 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -50,7 +50,7 @@ struct vhost_dev {
     hwaddr mem_changed_end_addr;
 };
 
-int vhost_dev_init(struct vhost_dev *hdev, int devfd, const char *devpath,
+int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
                    bool force);
 void vhost_dev_cleanup(struct vhost_dev *hdev);
 bool vhost_dev_query(struct vhost_dev *hdev, VirtIODevice *vdev);
diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h
index e2bd61c..2067ee2 100644
--- a/include/net/vhost_net.h
+++ b/include/net/vhost_net.h
@@ -6,7 +6,13 @@
 struct vhost_net;
 typedef struct vhost_net VHostNetState;
 
-VHostNetState *vhost_net_init(NetClientState *backend, int devfd, bool force);
+typedef struct VhostNetOptions {
+    NetClientState *net_backend;
+    void *opaque;
+    bool force;
+} VhostNetOptions;
+
+struct vhost_net *vhost_net_init(VhostNetOptions *options);
 
 bool vhost_net_query(VHostNetState *net, VirtIODevice *dev);
 int vhost_net_start(VirtIODevice *dev, NetClientState *ncs, int total_queues);
diff --git a/net/tap.c b/net/tap.c
index 2d5099b..fb50106 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -597,6 +597,7 @@ static int net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
                             int vnet_hdr, int fd)
 {
     TAPState *s;
+    int vhostfd;
 
     s = net_tap_fd_init(peer, model, name, fd, vnet_hdr);
     if (!s) {
@@ -627,7 +628,10 @@ static int net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
 
     if (tap->has_vhost ? tap->vhost :
         vhostfdname || (tap->has_vhostforce && tap->vhostforce)) {
-        int vhostfd;
+        VhostNetOptions options;
+
+        options.net_backend = &s->nc;
+        options.force = tap->has_vhostforce && tap->vhostforce;
 
         if (tap->has_vhostfd || tap->has_vhostfds) {
             vhostfd = monitor_handle_fd_param(cur_mon, vhostfdname);
@@ -635,11 +639,16 @@ static int net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
                 return -1;
             }
         } else {
-            vhostfd = -1;
+            vhostfd = open("/dev/vhost-net", O_RDWR);
+            if (vhostfd < 0) {
+                error_report("tap: open vhost char device failed: %s",
+                           strerror(errno));
+                return -1;
+            }
         }
+        options.opaque = (void *)(uintptr_t)vhostfd;
 
-        s->vhost_net = vhost_net_init(&s->nc, vhostfd,
-                                      tap->has_vhostforce && tap->vhostforce);
+        s->vhost_net = vhost_net_init(&options);
         if (!s->vhost_net) {
             error_report("vhost-net requested but could not be initialized");
             return -1;
-- 
1.8.3.2

* [Qemu-devel] [PATCH v9 12/20] Add vhost_ops to vhost_dev struct and replace all relevant ioctls
  2014-03-04 18:22 [Qemu-devel] [PATCH v9 00/20] Vhost and vhost-net support for userspace based backends Antonios Motakis
                   ` (10 preceding siblings ...)
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 11/20] vhost_net_init will use VhostNetOptions to get all its arguments Antonios Motakis
@ 2014-03-04 18:22 ` Antonios Motakis
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 13/20] Add mandatory_features to vhost_dev Antonios Motakis
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 33+ messages in thread
From: Antonios Motakis @ 2014-03-04 18:22 UTC (permalink / raw)
  To: qemu-devel, snabb-devel
  Cc: mst, n.nikolaev, Nicholas Bellinger, Paolo Bonzini, lukego,
	Antonios Motakis, tech

Decouple vhost from the Linux kernel by introducing vhost_ops. The
intention is to provide different backends: a 'kernel' backend based on
the ioctl interface, and a 'user' backend based on a UNIX domain socket
and shared memory interface.
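
To make the indirection concrete, here is a minimal standalone sketch of
the same pattern: a table of function pointers hung off the device, with
callers dispatching through it instead of calling ioctl() directly. All
names in it (DemoOps, demo_dev, mock_ops, DEMO_SET_FEATURES) are
hypothetical and exist only for illustration; the mock backend merely
records the request and is not the kernel or user backend of this series.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SET_FEATURES 2UL   /* hypothetical request code */

struct demo_dev;

/* Table of backend operations, analogous in shape to VhostOps. */
typedef struct DemoOps {
    int (*call)(struct demo_dev *dev, unsigned long request, void *arg);
    int (*init)(struct demo_dev *dev, void *opaque);
    int (*cleanup)(struct demo_dev *dev);
} DemoOps;

struct demo_dev {
    const DemoOps *ops;          /* selected backend */
    void *opaque;                /* backend-private handle */
    unsigned long last_request;  /* recorded by the mock backend */
};

/* Mock backend: remembers the request instead of issuing an ioctl. */
static int mock_call(struct demo_dev *dev, unsigned long request, void *arg)
{
    (void)arg;
    dev->last_request = request;
    return 0;
}

static int mock_init(struct demo_dev *dev, void *opaque)
{
    dev->opaque = opaque;
    return 0;
}

static int mock_cleanup(struct demo_dev *dev)
{
    dev->opaque = NULL;
    return 0;
}

static const DemoOps mock_ops = {
    .call = mock_call,
    .init = mock_init,
    .cleanup = mock_cleanup,
};

/* Generic code only ever goes through dev->ops. */
static int demo_set_features(struct demo_dev *dev, unsigned long long *features)
{
    return dev->ops->call(dev, DEMO_SET_FEATURES, features);
}
```

A caller would point dev.ops at a backend table once, call init, and
from then on issue every request through demo_set_features()-style
helpers, so swapping the backend does not touch the generic code.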

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
---
 hw/net/vhost_net.c                | 10 ++++++----
 hw/scsi/vhost-scsi.c              | 10 +++++++---
 hw/virtio/vhost.c                 | 41 ++++++++++++++++++++-------------------
 include/hw/virtio/vhost-backend.h | 27 ++++++++++++++++++++++++++
 include/hw/virtio/vhost.h         |  2 ++
 5 files changed, 63 insertions(+), 27 deletions(-)
 create mode 100644 include/hw/virtio/vhost-backend.h

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index d4fb53f..0fb4fa5 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -27,7 +27,6 @@
 #include <sys/socket.h>
 #include <linux/kvm.h>
 #include <fcntl.h>
-#include <sys/ioctl.h>
 #include <linux/virtio_ring.h>
 #include <netpacket/packet.h>
 #include <net/ethernet.h>
@@ -175,7 +174,8 @@ static int vhost_net_start_one(struct vhost_net *net,
     qemu_set_fd_handler(net->backend, NULL, NULL, NULL);
     file.fd = net->backend;
     for (file.index = 0; file.index < net->dev.nvqs; ++file.index) {
-        r = ioctl(net->dev.control, VHOST_NET_SET_BACKEND, &file);
+        const VhostOps *vhost_ops = net->dev.vhost_ops;
+        r = vhost_ops->vhost_call(&net->dev, VHOST_NET_SET_BACKEND, &file);
         if (r < 0) {
             r = -errno;
             goto fail;
@@ -185,7 +185,8 @@ static int vhost_net_start_one(struct vhost_net *net,
 fail:
     file.fd = -1;
     while (file.index-- > 0) {
-        int r = ioctl(net->dev.control, VHOST_NET_SET_BACKEND, &file);
+        const VhostOps *vhost_ops = net->dev.vhost_ops;
+        int r = vhost_ops->vhost_call(&net->dev, VHOST_NET_SET_BACKEND, &file);
         assert(r >= 0);
     }
     if (net->nc->info->poll) {
@@ -208,7 +209,8 @@ static void vhost_net_stop_one(struct vhost_net *net,
     }
 
     for (file.index = 0; file.index < net->dev.nvqs; ++file.index) {
-        int r = ioctl(net->dev.control, VHOST_NET_SET_BACKEND, &file);
+        const VhostOps *vhost_ops = net->dev.vhost_ops;
+        int r = vhost_ops->vhost_call(&net->dev, VHOST_NET_SET_BACKEND, &file);
         assert(r >= 0);
     }
     if (net->nc->info->poll) {
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index 9b03fb6..48a9ced 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -27,12 +27,13 @@
 static int vhost_scsi_set_endpoint(VHostSCSI *s)
 {
     VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s);
+    const VhostOps *vhost_ops = s->dev.vhost_ops;
     struct vhost_scsi_target backend;
     int ret;
 
     memset(&backend, 0, sizeof(backend));
     pstrcpy(backend.vhost_wwpn, sizeof(backend.vhost_wwpn), vs->conf.wwpn);
-    ret = ioctl(s->dev.control, VHOST_SCSI_SET_ENDPOINT, &backend);
+    ret = vhost_ops->vhost_call(&s->dev, VHOST_SCSI_SET_ENDPOINT, &backend);
     if (ret < 0) {
         return -errno;
     }
@@ -43,10 +44,11 @@ static void vhost_scsi_clear_endpoint(VHostSCSI *s)
 {
     VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s);
     struct vhost_scsi_target backend;
+    const VhostOps *vhost_ops = s->dev.vhost_ops;
 
     memset(&backend, 0, sizeof(backend));
     pstrcpy(backend.vhost_wwpn, sizeof(backend.vhost_wwpn), vs->conf.wwpn);
-    ioctl(s->dev.control, VHOST_SCSI_CLEAR_ENDPOINT, &backend);
+    vhost_ops->vhost_call(&s->dev, VHOST_SCSI_CLEAR_ENDPOINT, &backend);
 }
 
 static int vhost_scsi_start(VHostSCSI *s)
@@ -55,13 +57,15 @@ static int vhost_scsi_start(VHostSCSI *s)
     VirtIODevice *vdev = VIRTIO_DEVICE(s);
     BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
     VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
+    const VhostOps *vhost_ops = s->dev.vhost_ops;
 
     if (!k->set_guest_notifiers) {
         error_report("binding does not support guest notifiers");
         return -ENOSYS;
     }
 
-    ret = ioctl(s->dev.control, VHOST_SCSI_GET_ABI_VERSION, &abi_version);
+    ret = vhost_ops->vhost_call(&s->dev,
+                                VHOST_SCSI_GET_ABI_VERSION, &abi_version);
     if (ret < 0) {
         return -errno;
     }
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index adef689..346193f 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -13,7 +13,6 @@
  * GNU GPL, version 2 or (at your option) any later version.
  */
 
-#include <sys/ioctl.h>
 #include "hw/virtio/vhost.h"
 #include "hw/hw.h"
 #include "qemu/atomic.h"
@@ -291,7 +290,7 @@ static inline void vhost_dev_log_resize(struct vhost_dev* dev, uint64_t size)
 
     log = g_malloc0(size * sizeof *log);
     log_base = (uint64_t)(unsigned long)log;
-    r = ioctl(dev->control, VHOST_SET_LOG_BASE, &log_base);
+    r = dev->vhost_ops->vhost_call(dev, VHOST_SET_LOG_BASE, &log_base);
     assert(r >= 0);
     /* Sync only the range covered by the old log */
     if (dev->log_size) {
@@ -460,7 +459,7 @@ static void vhost_commit(MemoryListener *listener)
     }
 
     if (!dev->log_enabled) {
-        r = ioctl(dev->control, VHOST_SET_MEM_TABLE, dev->mem);
+        r = dev->vhost_ops->vhost_call(dev, VHOST_SET_MEM_TABLE, dev->mem);
         assert(r >= 0);
         dev->memory_changed = false;
         return;
@@ -473,7 +472,7 @@ static void vhost_commit(MemoryListener *listener)
     if (dev->log_size < log_size) {
         vhost_dev_log_resize(dev, log_size + VHOST_LOG_BUFFER);
     }
-    r = ioctl(dev->control, VHOST_SET_MEM_TABLE, dev->mem);
+    r = dev->vhost_ops->vhost_call(dev, VHOST_SET_MEM_TABLE, dev->mem);
     assert(r >= 0);
     /* To log less, can only decrease log size after table update. */
     if (dev->log_size > log_size + VHOST_LOG_BUFFER) {
@@ -541,7 +540,7 @@ static int vhost_virtqueue_set_addr(struct vhost_dev *dev,
         .log_guest_addr = vq->used_phys,
         .flags = enable_log ? (1 << VHOST_VRING_F_LOG) : 0,
     };
-    int r = ioctl(dev->control, VHOST_SET_VRING_ADDR, &addr);
+    int r = dev->vhost_ops->vhost_call(dev, VHOST_SET_VRING_ADDR, &addr);
     if (r < 0) {
         return -errno;
     }
@@ -555,7 +554,7 @@ static int vhost_dev_set_features(struct vhost_dev *dev, bool enable_log)
     if (enable_log) {
         features |= 0x1 << VHOST_F_LOG_ALL;
     }
-    r = ioctl(dev->control, VHOST_SET_FEATURES, &features);
+    r = dev->vhost_ops->vhost_call(dev, VHOST_SET_FEATURES, &features);
     return r < 0 ? -errno : 0;
 }
 
@@ -670,13 +669,13 @@ static int vhost_virtqueue_start(struct vhost_dev *dev,
     assert(idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs);
 
     vq->num = state.num = virtio_queue_get_num(vdev, idx);
-    r = ioctl(dev->control, VHOST_SET_VRING_NUM, &state);
+    r = dev->vhost_ops->vhost_call(dev, VHOST_SET_VRING_NUM, &state);
     if (r) {
         return -errno;
     }
 
     state.num = virtio_queue_get_last_avail_idx(vdev, idx);
-    r = ioctl(dev->control, VHOST_SET_VRING_BASE, &state);
+    r = dev->vhost_ops->vhost_call(dev, VHOST_SET_VRING_BASE, &state);
     if (r) {
         return -errno;
     }
@@ -718,7 +717,7 @@ static int vhost_virtqueue_start(struct vhost_dev *dev,
     }
 
     file.fd = event_notifier_get_fd(virtio_queue_get_host_notifier(vvq));
-    r = ioctl(dev->control, VHOST_SET_VRING_KICK, &file);
+    r = dev->vhost_ops->vhost_call(dev, VHOST_SET_VRING_KICK, &file);
     if (r) {
         r = -errno;
         goto fail_kick;
@@ -756,7 +755,7 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev,
     };
     int r;
     assert(idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs);
-    r = ioctl(dev->control, VHOST_GET_VRING_BASE, &state);
+    r = dev->vhost_ops->vhost_call(dev, VHOST_GET_VRING_BASE, &state);
     if (r < 0) {
         state.num = virtio_queue_get_avail_idx(vdev, idx);
         fprintf(stderr, "vhost VQ %d ring restore failed: %d\n", idx, r);
@@ -799,7 +798,7 @@ static int vhost_virtqueue_init(struct vhost_dev *dev,
     }
 
     file.fd = event_notifier_get_fd(&vq->masked_notifier);
-    r = ioctl(dev->control, VHOST_SET_VRING_CALL, &file);
+    r = dev->vhost_ops->vhost_call(dev, VHOST_SET_VRING_CALL, &file);
     if (r) {
         r = -errno;
         goto fail_call;
@@ -820,14 +819,17 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
 {
     uint64_t features;
     int i, r;
-    hdev->control = (uintptr_t) opaque;;
 
-    r = ioctl(hdev->control, VHOST_SET_OWNER, NULL);
+    if (hdev->vhost_ops->vhost_backend_init(hdev, opaque) < 0) {
+        return -errno;
+    }
+
+    r = hdev->vhost_ops->vhost_call(hdev, VHOST_SET_OWNER, NULL);
     if (r < 0) {
         goto fail;
     }
 
-    r = ioctl(hdev->control, VHOST_GET_FEATURES, &features);
+    r = hdev->vhost_ops->vhost_call(hdev, VHOST_GET_FEATURES, &features);
     if (r < 0) {
         goto fail;
     }
@@ -872,7 +874,7 @@ fail_vq:
     }
 fail:
     r = -errno;
-    close(hdev->control);
+    hdev->vhost_ops->vhost_backend_cleanup(hdev);
     return r;
 }
 
@@ -885,7 +887,7 @@ void vhost_dev_cleanup(struct vhost_dev *hdev)
     memory_listener_unregister(&hdev->memory_listener);
     g_free(hdev->mem);
     g_free(hdev->mem_sections);
-    close(hdev->control);
+    hdev->vhost_ops->vhost_backend_cleanup(hdev);
 }
 
 bool vhost_dev_query(struct vhost_dev *hdev, VirtIODevice *vdev)
@@ -987,7 +989,7 @@ void vhost_virtqueue_mask(struct vhost_dev *hdev, VirtIODevice *vdev, int n,
     } else {
         file.fd = event_notifier_get_fd(virtio_queue_get_guest_notifier(vvq));
     }
-    r = ioctl(hdev->control, VHOST_SET_VRING_CALL, &file);
+    r = hdev->vhost_ops->vhost_call(hdev, VHOST_SET_VRING_CALL, &file);
     assert(r >= 0);
 }
 
@@ -1002,7 +1004,7 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
     if (r < 0) {
         goto fail_features;
     }
-    r = ioctl(hdev->control, VHOST_SET_MEM_TABLE, hdev->mem);
+    r = hdev->vhost_ops->vhost_call(hdev, VHOST_SET_MEM_TABLE, hdev->mem);
     if (r < 0) {
         r = -errno;
         goto fail_mem;
@@ -1021,8 +1023,7 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
         hdev->log_size = vhost_get_log_size(hdev);
         hdev->log = hdev->log_size ?
             g_malloc0(hdev->log_size * sizeof *hdev->log) : NULL;
-        r = ioctl(hdev->control, VHOST_SET_LOG_BASE,
-                  (uint64_t)(unsigned long)hdev->log);
+        r = hdev->vhost_ops->vhost_call(hdev, VHOST_SET_LOG_BASE, hdev->log);
         if (r < 0) {
             r = -errno;
             goto fail_log;
diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
new file mode 100644
index 0000000..14e5878
--- /dev/null
+++ b/include/hw/virtio/vhost-backend.h
@@ -0,0 +1,27 @@
+/*
+ * vhost-backend
+ *
+ * Copyright (c) 2013 Virtual Open Systems Sarl.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef VHOST_BACKEND_H_
+#define VHOST_BACKEND_H_
+
+struct vhost_dev;
+
+typedef int (*vhost_call)(struct vhost_dev *dev, unsigned long int request,
+             void *arg);
+typedef int (*vhost_backend_init)(struct vhost_dev *dev, void *opaque);
+typedef int (*vhost_backend_cleanup)(struct vhost_dev *dev);
+
+typedef struct VhostOps {
+    vhost_call vhost_call;
+    vhost_backend_init vhost_backend_init;
+    vhost_backend_cleanup vhost_backend_cleanup;
+} VhostOps;
+
+#endif /* VHOST_BACKEND_H_ */
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index eb25ffa..97641b6 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -2,6 +2,7 @@
 #define VHOST_H
 
 #include "hw/hw.h"
+#include "hw/virtio/vhost-backend.h"
 #include "hw/virtio/virtio.h"
 #include "exec/memory.h"
 
@@ -48,6 +49,7 @@ struct vhost_dev {
     bool memory_changed;
     hwaddr mem_changed_start_addr;
     hwaddr mem_changed_end_addr;
+    const VhostOps *vhost_ops;
 };
 
 int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
-- 
1.8.3.2


* [Qemu-devel] [PATCH v9 13/20] Add mandatory_features to vhost_dev
  2014-03-04 18:22 [Qemu-devel] [PATCH v9 00/20] Vhost and vhost-net support for userspace based backends Antonios Motakis
                   ` (11 preceding siblings ...)
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 12/20] Add vhost_ops to vhost_dev struct and replace all relevant ioctls Antonios Motakis
@ 2014-03-04 18:22 ` Antonios Motakis
  2014-03-04 18:38   ` Michael S. Tsirkin
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 14/20] Add vhost-backend and VhostBackendType Antonios Motakis
                   ` (7 subsequent siblings)
  20 siblings, 1 reply; 33+ messages in thread
From: Antonios Motakis @ 2014-03-04 18:22 UTC (permalink / raw)
  To: qemu-devel, snabb-devel
  Cc: mst, n.nikolaev, Nicholas Bellinger, Paolo Bonzini, lukego,
	Antonios Motakis, tech

This will be used in a following patch to ensure that a vhost-user
client reconnecting to QEMU supports the features that were exposed
by the first client that initiated the virtio-net session.
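
As a rough illustration of the kind of check this field enables, the
sketch below treats the mandatory features as a bitmask that a
reconnecting client must fully cover. The helper name
demo_features_compatible and the exact subset rule are assumptions made
for this example; the actual comparison is introduced by the following
patch and may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when every bit set in 'mandatory'
 * is also set in 'offered', i.e. the reconnecting client offers at
 * least the features already exposed to the guest.
 */
static bool demo_features_compatible(uint64_t offered, uint64_t mandatory)
{
    return (offered & mandatory) == mandatory;
}
```

Under this rule a client offering extra features is still accepted,
while one missing any mandatory bit is rejected.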

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
---
 hw/net/vhost_net.c        | 10 ++++++++++
 include/hw/virtio/vhost.h |  1 +
 include/net/vhost_net.h   |  2 ++
 3 files changed, 13 insertions(+)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 0fb4fa5..38e1e8a 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -80,6 +80,11 @@ void vhost_net_ack_features(struct vhost_net *net, unsigned features)
     }
 }
 
+unsigned long long vhost_net_features(VHostNetState *net)
+{
+    return net->dev.features;
+}
+
 static int vhost_net_get_fd(NetClientState *backend)
 {
     switch (backend->info->type) {
@@ -112,6 +117,7 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options)
 
     net->dev.nvqs = 2;
     net->dev.vqs = net->vqs;
+    net->dev.mandatory_features = options->mandatory_features;
 
     r = vhost_dev_init(&net->dev, options->opaque,
                        options->force);
@@ -347,6 +353,10 @@ unsigned vhost_net_get_features(struct vhost_net *net, unsigned features)
 void vhost_net_ack_features(struct vhost_net *net, unsigned features)
 {
 }
+unsigned long long vhost_net_features(struct vhost_net *net)
+{
+    return 0;
+}
 
 bool vhost_net_virtqueue_pending(VHostNetState *net, int idx)
 {
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 97641b6..0068d40 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -41,6 +41,7 @@ struct vhost_dev {
     unsigned long long features;
     unsigned long long acked_features;
     unsigned long long backend_features;
+    unsigned long long mandatory_features;
     bool started;
     bool log_enabled;
     vhost_log_chunk_t *log;
diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h
index 2067ee2..b39bb45 100644
--- a/include/net/vhost_net.h
+++ b/include/net/vhost_net.h
@@ -10,6 +10,7 @@ typedef struct VhostNetOptions {
     NetClientState *net_backend;
     void *opaque;
     bool force;
+    unsigned long long mandatory_features;
 } VhostNetOptions;
 
 struct vhost_net *vhost_net_init(VhostNetOptions *options);
@@ -22,6 +23,7 @@ void vhost_net_cleanup(VHostNetState *net);
 
 unsigned vhost_net_get_features(VHostNetState *net, unsigned features);
 void vhost_net_ack_features(VHostNetState *net, unsigned features);
+unsigned long long vhost_net_features(VHostNetState *net);
 
 bool vhost_net_virtqueue_pending(VHostNetState *net, int n);
 void vhost_net_virtqueue_mask(VHostNetState *net, VirtIODevice *dev,
-- 
1.8.3.2


* [Qemu-devel] [PATCH v9 14/20] Add vhost-backend and VhostBackendType
  2014-03-04 18:22 [Qemu-devel] [PATCH v9 00/20] Vhost and vhost-net support for userspace based backends Antonios Motakis
                   ` (12 preceding siblings ...)
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 13/20] Add mandatory_features to vhost_dev Antonios Motakis
@ 2014-03-04 18:22 ` Antonios Motakis
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 15/20] Add vhost-user as a vhost backend Antonios Motakis
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 33+ messages in thread
From: Antonios Motakis @ 2014-03-04 18:22 UTC (permalink / raw)
  To: qemu-devel, snabb-devel
  Cc: Peter Maydell, Stefan Hajnoczi, mst, n.nikolaev,
	Nicholas Bellinger, Anthony Liguori, Paolo Bonzini, lukego,
	Antonios Motakis, tech, KONRAD Frederic

Use vhost_set_backend_type to initialise a proper vhost_ops structure.
In vhost_net_init and vhost_net_start_one, call the TAP-related
initialisation conditionally, depending on the vhost backend type.

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
---
 hw/net/vhost_net.c                | 67 +++++++++++++++++++++++----------------
 hw/scsi/vhost-scsi.c              |  2 +-
 hw/virtio/Makefile.objs           |  2 +-
 hw/virtio/vhost-backend.c         | 66 ++++++++++++++++++++++++++++++++++++++
 hw/virtio/vhost.c                 |  6 +++-
 include/hw/virtio/vhost-backend.h | 11 +++++++
 include/hw/virtio/vhost.h         |  4 +--
 include/net/vhost_net.h           |  2 ++
 net/tap.c                         |  1 +
 9 files changed, 129 insertions(+), 32 deletions(-)
 create mode 100644 hw/virtio/vhost-backend.c

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 38e1e8a..48cfda7 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -98,7 +98,7 @@ static int vhost_net_get_fd(NetClientState *backend)
 
 struct vhost_net *vhost_net_init(VhostNetOptions *options)
 {
-    int r;
+    int r = -1;
     struct vhost_net *net = g_malloc(sizeof *net);
 
     if (!options->net_backend) {
@@ -106,9 +106,11 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options)
         goto fail;
     }
 
-    r = vhost_net_get_fd(options->net_backend);
-    if (r < 0) {
-        goto fail;
+    if (options->backend_type == VHOST_BACKEND_TYPE_KERNEL) {
+        r = vhost_net_get_fd(options->net_backend);
+        if (r < 0) {
+            goto fail;
+        }
     }
     net->nc = options->net_backend;
     net->dev.backend_features = qemu_has_vnet_hdr(options->net_backend) ? 0 :
@@ -120,7 +122,7 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options)
     net->dev.mandatory_features = options->mandatory_features;
 
     r = vhost_dev_init(&net->dev, options->opaque,
-                       options->force);
+                       options->backend_type, options->force);
     if (r < 0) {
         goto fail;
     }
@@ -128,13 +130,15 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options)
                                sizeof(struct virtio_net_hdr_mrg_rxbuf))) {
         net->dev.features &= ~(1 << VIRTIO_NET_F_MRG_RXBUF);
     }
-    if (~net->dev.features & net->dev.backend_features) {
-        fprintf(stderr, "vhost lacks feature mask %" PRIu64 " for backend\n",
-                (uint64_t)(~net->dev.features & net->dev.backend_features));
-        vhost_dev_cleanup(&net->dev);
-        goto fail;
+    if (options->backend_type == VHOST_BACKEND_TYPE_KERNEL) {
+        if (~net->dev.features & net->dev.backend_features) {
+            fprintf(stderr, "vhost lacks feature mask %" PRIu64
+                   " for backend\n",
+                   (uint64_t)(~net->dev.features & net->dev.backend_features));
+            vhost_dev_cleanup(&net->dev);
+            goto fail;
+        }
     }
-
     /* Set sane init value. Override when guest acks. */
     vhost_net_ack_features(net, 0);
     return net;
@@ -177,23 +181,29 @@ static int vhost_net_start_one(struct vhost_net *net,
         net->nc->info->poll(net->nc, false);
     }
 
-    qemu_set_fd_handler(net->backend, NULL, NULL, NULL);
-    file.fd = net->backend;
-    for (file.index = 0; file.index < net->dev.nvqs; ++file.index) {
-        const VhostOps *vhost_ops = net->dev.vhost_ops;
-        r = vhost_ops->vhost_call(&net->dev, VHOST_NET_SET_BACKEND, &file);
-        if (r < 0) {
-            r = -errno;
-            goto fail;
+    if (net->nc->info->type == NET_CLIENT_OPTIONS_KIND_TAP) {
+        qemu_set_fd_handler(net->backend, NULL, NULL, NULL);
+        file.fd = net->backend;
+        for (file.index = 0; file.index < net->dev.nvqs; ++file.index) {
+            const VhostOps *vhost_ops = net->dev.vhost_ops;
+            r = vhost_ops->vhost_call(&net->dev, VHOST_NET_SET_BACKEND,
+                                      &file);
+            if (r < 0) {
+                r = -errno;
+                goto fail;
+            }
         }
     }
     return 0;
 fail:
     file.fd = -1;
-    while (file.index-- > 0) {
-        const VhostOps *vhost_ops = net->dev.vhost_ops;
-        int r = vhost_ops->vhost_call(&net->dev, VHOST_NET_SET_BACKEND, &file);
-        assert(r >= 0);
+    if (net->nc->info->type == NET_CLIENT_OPTIONS_KIND_TAP) {
+        while (file.index-- > 0) {
+            const VhostOps *vhost_ops = net->dev.vhost_ops;
+            int r = vhost_ops->vhost_call(&net->dev, VHOST_NET_SET_BACKEND,
+                                          &file);
+            assert(r >= 0);
+        }
     }
     if (net->nc->info->poll) {
         net->nc->info->poll(net->nc, true);
@@ -214,10 +224,13 @@ static void vhost_net_stop_one(struct vhost_net *net,
         return;
     }
 
-    for (file.index = 0; file.index < net->dev.nvqs; ++file.index) {
-        const VhostOps *vhost_ops = net->dev.vhost_ops;
-        int r = vhost_ops->vhost_call(&net->dev, VHOST_NET_SET_BACKEND, &file);
-        assert(r >= 0);
+    if (net->nc->info->type == NET_CLIENT_OPTIONS_KIND_TAP) {
+        for (file.index = 0; file.index < net->dev.nvqs; ++file.index) {
+            const VhostOps *vhost_ops = net->dev.vhost_ops;
+            int r = vhost_ops->vhost_call(&net->dev, VHOST_NET_SET_BACKEND,
+                                          &file);
+            assert(r >= 0);
+        }
     }
     if (net->nc->info->poll) {
         net->nc->info->poll(net->nc, true);
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index 48a9ced..c099fb6 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -239,7 +239,7 @@ static void vhost_scsi_realize(DeviceState *dev, Error **errp)
     s->dev.vq_index = 0;
 
     ret = vhost_dev_init(&s->dev, (void *)(uintptr_t)vhostfd,
-                         true);
+                         VHOST_BACKEND_TYPE_KERNEL, true);
     if (ret < 0) {
         error_setg(errp, "vhost-scsi: vhost initialization failed: %s",
                    strerror(-ret));
diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
index 1ba53d9..51e5bdb 100644
--- a/hw/virtio/Makefile.objs
+++ b/hw/virtio/Makefile.objs
@@ -5,4 +5,4 @@ common-obj-y += virtio-mmio.o
 common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += dataplane/
 
 obj-y += virtio.o virtio-balloon.o 
-obj-$(CONFIG_LINUX) += vhost.o
+obj-$(CONFIG_LINUX) += vhost.o vhost-backend.o
diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
new file mode 100644
index 0000000..509e103
--- /dev/null
+++ b/hw/virtio/vhost-backend.c
@@ -0,0 +1,66 @@
+/*
+ * vhost-backend
+ *
+ * Copyright (c) 2013 Virtual Open Systems Sarl.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "hw/virtio/vhost.h"
+#include "hw/virtio/vhost-backend.h"
+#include "qemu/error-report.h"
+
+#include <sys/ioctl.h>
+
+static int vhost_kernel_call(struct vhost_dev *dev, unsigned long int request,
+                             void *arg)
+{
+    int fd = (uintptr_t) dev->opaque;
+
+    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_KERNEL);
+
+    return ioctl(fd, request, arg);
+}
+
+static int vhost_kernel_init(struct vhost_dev *dev, void *opaque)
+{
+    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_KERNEL);
+
+    dev->opaque = opaque;
+
+    return 0;
+}
+
+static int vhost_kernel_cleanup(struct vhost_dev *dev)
+{
+    int fd = (uintptr_t) dev->opaque;
+
+    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_KERNEL);
+
+    return close(fd);
+}
+
+static const VhostOps kernel_ops = {
+        .backend_type = VHOST_BACKEND_TYPE_KERNEL,
+        .vhost_call = vhost_kernel_call,
+        .vhost_backend_init = vhost_kernel_init,
+        .vhost_backend_cleanup = vhost_kernel_cleanup
+};
+
+int vhost_set_backend_type(struct vhost_dev *dev, VhostBackendType backend_type)
+{
+    int r = 0;
+
+    switch (backend_type) {
+    case VHOST_BACKEND_TYPE_KERNEL:
+        dev->vhost_ops = &kernel_ops;
+        break;
+    default:
+        error_report("Unknown vhost backend type\n");
+        r = -1;
+    }
+
+    return r;
+}
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 346193f..6dacee2 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -815,11 +815,15 @@ static void vhost_virtqueue_cleanup(struct vhost_virtqueue *vq)
 }
 
 int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
-                   bool force)
+                   VhostBackendType backend_type, bool force)
 {
     uint64_t features;
     int i, r;
 
+    if (vhost_set_backend_type(hdev, backend_type) < 0) {
+        return -1;
+    }
+
     if (hdev->vhost_ops->vhost_backend_init(hdev, opaque) < 0) {
         return -errno;
     }
diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index 14e5878..d31768a 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -11,6 +11,13 @@
 #ifndef VHOST_BACKEND_H_
 #define VHOST_BACKEND_H_
 
+typedef enum VhostBackendType {
+    VHOST_BACKEND_TYPE_NONE = 0,
+    VHOST_BACKEND_TYPE_KERNEL = 1,
+    VHOST_BACKEND_TYPE_USER = 2,
+    VHOST_BACKEND_TYPE_MAX = 3,
+} VhostBackendType;
+
 struct vhost_dev;
 
 typedef int (*vhost_call)(struct vhost_dev *dev, unsigned long int request,
@@ -19,9 +26,13 @@ typedef int (*vhost_backend_init)(struct vhost_dev *dev, void *opaque);
 typedef int (*vhost_backend_cleanup)(struct vhost_dev *dev);
 
 typedef struct VhostOps {
+    VhostBackendType backend_type;
     vhost_call vhost_call;
     vhost_backend_init vhost_backend_init;
     vhost_backend_cleanup vhost_backend_cleanup;
 } VhostOps;
 
+int vhost_set_backend_type(struct vhost_dev *dev,
+                           VhostBackendType backend_type);
+
 #endif /* VHOST_BACKEND_H_ */
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 0068d40..e5718da 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -30,7 +30,6 @@ typedef unsigned long vhost_log_chunk_t;
 struct vhost_memory;
 struct vhost_dev {
     MemoryListener memory_listener;
-    int control;
     struct vhost_memory *mem;
     int n_mem_sections;
     MemoryRegionSection *mem_sections;
@@ -51,10 +50,11 @@ struct vhost_dev {
     hwaddr mem_changed_start_addr;
     hwaddr mem_changed_end_addr;
     const VhostOps *vhost_ops;
+    void *opaque;
 };
 
 int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
-                   bool force);
+                   VhostBackendType backend_type, bool force);
 void vhost_dev_cleanup(struct vhost_dev *hdev);
 bool vhost_dev_query(struct vhost_dev *hdev, VirtIODevice *vdev);
 int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h
index b39bb45..5d99b37 100644
--- a/include/net/vhost_net.h
+++ b/include/net/vhost_net.h
@@ -2,11 +2,13 @@
 #define VHOST_NET_H
 
 #include "net/net.h"
+#include "hw/virtio/vhost-backend.h"
 
 struct vhost_net;
 typedef struct vhost_net VHostNetState;
 
 typedef struct VhostNetOptions {
+    VhostBackendType backend_type;
     NetClientState *net_backend;
     void *opaque;
     bool force;
diff --git a/net/tap.c b/net/tap.c
index fb50106..4311f88 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -630,6 +630,7 @@ static int net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
         vhostfdname || (tap->has_vhostforce && tap->vhostforce)) {
         VhostNetOptions options;
 
+        options.backend_type = VHOST_BACKEND_TYPE_KERNEL;
         options.net_backend = &s->nc;
         options.force = tap->has_vhostforce && tap->vhostforce;
 
-- 
1.8.3.2


* [Qemu-devel] [PATCH v9 15/20] Add vhost-user as a vhost backend.
  2014-03-04 18:22 [Qemu-devel] [PATCH v9 00/20] Vhost and vhost-net support for userspace based backends Antonios Motakis
                   ` (13 preceding siblings ...)
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 14/20] Add vhost-backend and VhostBackendType Antonios Motakis
@ 2014-03-04 18:22 ` Antonios Motakis
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 16/20] Add new vhost-user netdev backend Antonios Motakis
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 33+ messages in thread
From: Antonios Motakis @ 2014-03-04 18:22 UTC (permalink / raw)
  To: qemu-devel, snabb-devel
  Cc: Peter Maydell, mst, n.nikolaev, Paolo Bonzini, lukego,
	Antonios Motakis, tech, KONRAD Frederic

The initialization takes a chardev backed by a unix domain socket.
The chardev should implement qemu_fe_set_msgfds in order to be able to
pass file descriptors to the remote process.

Each ioctl request of vhost-kernel has a vhost-user message equivalent,
which is sent over the control socket.

The general approach is to copy the data from the supplied argument
pointer to a designated field in the message. If a file descriptor is
to be passed it will be placed in the fds array for inclusion in
the sendmsg control header.
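
The mechanism described above can be sketched with plain POSIX calls:
the payload is copied into a fixed message struct, and an optional file
descriptor travels as SCM_RIGHTS ancillary data of the same sendmsg().
This is a standalone illustration only. DemoMsg, demo_send and demo_recv
are hypothetical names, the message layout is not the VhostUserMsg
layout of this patch, and the patch itself goes through the chardev
layer rather than calling sendmsg() directly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical wire format: request code, payload size, one u64 payload. */
typedef struct DemoMsg {
    uint32_t request;
    uint32_t size;
    uint64_t u64;
} DemoMsg;

/* Control buffer sized and aligned for exactly one file descriptor. */
typedef union DemoCtl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
} DemoCtl;

/* Send msg; if fd >= 0, attach it as SCM_RIGHTS ancillary data. */
static int demo_send(int sock, const DemoMsg *msg, int fd)
{
    struct iovec iov = { .iov_base = (void *)msg, .iov_len = sizeof(*msg) };
    struct msghdr mh;
    DemoCtl ctl;

    memset(&mh, 0, sizeof(mh));
    memset(&ctl, 0, sizeof(ctl));
    mh.msg_iov = &iov;
    mh.msg_iovlen = 1;
    if (fd >= 0) {
        struct cmsghdr *c;

        mh.msg_control = ctl.buf;
        mh.msg_controllen = sizeof(ctl.buf);
        c = CMSG_FIRSTHDR(&mh);
        c->cmsg_level = SOL_SOCKET;
        c->cmsg_type = SCM_RIGHTS;
        c->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(c), &fd, sizeof(int));
    }
    return sendmsg(sock, &mh, 0) == (ssize_t)sizeof(*msg) ? 0 : -1;
}

/* Receive msg; *fd is set to the passed descriptor, or -1 if none came. */
static int demo_recv(int sock, DemoMsg *msg, int *fd)
{
    struct iovec iov = { .iov_base = msg, .iov_len = sizeof(*msg) };
    struct msghdr mh;
    struct cmsghdr *c;
    DemoCtl ctl;

    memset(&mh, 0, sizeof(mh));
    memset(&ctl, 0, sizeof(ctl));
    mh.msg_iov = &iov;
    mh.msg_iovlen = 1;
    mh.msg_control = ctl.buf;
    mh.msg_controllen = sizeof(ctl.buf);
    *fd = -1;
    if (recvmsg(sock, &mh, 0) != (ssize_t)sizeof(*msg)) {
        return -1;
    }
    for (c = CMSG_FIRSTHDR(&mh); c != NULL; c = CMSG_NXTHDR(&mh, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_RIGHTS) {
            memcpy(fd, CMSG_DATA(c), sizeof(int));
        }
    }
    return 0;
}
```

The receiver ends up with its own descriptor number referring to the
same open file, which is what lets a kick or call eventfd be shared
between QEMU and the remote process.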

VHOST_SET_MEM_TABLE ignores the supplied vhost_memory structure and scans
the global ram_list for ram blocks with a valid fd field set. This would
be set when the -mem-path option with shared=on property is used.

Upon receiving the VHOST_USER_GET_FEATURES reply, the new features value
is compared to the mandatory features in the vhost_dev.

The VHOST_USER_GET_VRING_BASE message gets special handling in case of
I/O failure: it returns -1 to indicate to the upper layer that it failed.

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
---
 hw/virtio/Makefile.objs   |   2 +-
 hw/virtio/vhost-backend.c |   5 +
 hw/virtio/vhost-user.c    | 356 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 362 insertions(+), 1 deletion(-)
 create mode 100644 hw/virtio/vhost-user.c

diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
index 51e5bdb..ec9e855 100644
--- a/hw/virtio/Makefile.objs
+++ b/hw/virtio/Makefile.objs
@@ -5,4 +5,4 @@ common-obj-y += virtio-mmio.o
 common-obj-$(CONFIG_VIRTIO_BLK_DATA_PLANE) += dataplane/
 
 obj-y += virtio.o virtio-balloon.o 
-obj-$(CONFIG_LINUX) += vhost.o vhost-backend.o
+obj-$(CONFIG_LINUX) += vhost.o vhost-backend.o vhost-user.o
diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
index 509e103..35316c4 100644
--- a/hw/virtio/vhost-backend.c
+++ b/hw/virtio/vhost-backend.c
@@ -14,6 +14,8 @@
 
 #include <sys/ioctl.h>
 
+extern const VhostOps user_ops;
+
 static int vhost_kernel_call(struct vhost_dev *dev, unsigned long int request,
                              void *arg)
 {
@@ -57,6 +59,9 @@ int vhost_set_backend_type(struct vhost_dev *dev, VhostBackendType backend_type)
     case VHOST_BACKEND_TYPE_KERNEL:
         dev->vhost_ops = &kernel_ops;
         break;
+    case VHOST_BACKEND_TYPE_USER:
+        dev->vhost_ops = &user_ops;
+        break;
     default:
         error_report("Unknown vhost backend type\n");
         r = -1;
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
new file mode 100644
index 0000000..47ffd13
--- /dev/null
+++ b/hw/virtio/vhost-user.c
@@ -0,0 +1,356 @@
+/*
+ * vhost-user
+ *
+ * Copyright (c) 2013 Virtual Open Systems Sarl.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "hw/virtio/vhost.h"
+#include "hw/virtio/vhost-backend.h"
+#include "sysemu/char.h"
+#include "sysemu/kvm.h"
+#include "qemu/error-report.h"
+#include "qemu/sockets.h"
+
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <linux/vhost.h>
+
+#define VHOST_MEMORY_MAX_NREGIONS    8
+
+typedef enum VhostUserRequest {
+    VHOST_USER_NONE = 0,
+    VHOST_USER_GET_FEATURES = 1,
+    VHOST_USER_SET_FEATURES = 2,
+    VHOST_USER_SET_OWNER = 3,
+    VHOST_USER_RESET_OWNER = 4,
+    VHOST_USER_SET_MEM_TABLE = 5,
+    VHOST_USER_SET_LOG_BASE = 6,
+    VHOST_USER_SET_LOG_FD = 7,
+    VHOST_USER_SET_VRING_NUM = 8,
+    VHOST_USER_SET_VRING_ADDR = 9,
+    VHOST_USER_SET_VRING_BASE = 10,
+    VHOST_USER_GET_VRING_BASE = 11,
+    VHOST_USER_SET_VRING_KICK = 12,
+    VHOST_USER_SET_VRING_CALL = 13,
+    VHOST_USER_SET_VRING_ERR = 14,
+    VHOST_USER_MAX
+} VhostUserRequest;
+
+typedef struct VhostUserMemoryRegion {
+    uint64_t guest_phys_addr;
+    uint64_t memory_size;
+    uint64_t userspace_addr;
+} VhostUserMemoryRegion;
+
+typedef struct VhostUserMemory {
+    uint32_t nregions;
+    uint32_t padding;
+    VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS];
+} VhostUserMemory;
+
+typedef struct VhostUserMsg {
+    VhostUserRequest request;
+
+#define VHOST_USER_VERSION_MASK     (0x3)
+#define VHOST_USER_REPLY_MASK       (0x1<<2)
+    uint32_t flags;
+    uint32_t size; /* the following payload size */
+    union {
+#define VHOST_USER_VRING_IDX_MASK   (0xff)
+#define VHOST_USER_VRING_NOFD_MASK  (0x1<<8)
+        uint64_t u64;
+        struct vhost_vring_state state;
+        struct vhost_vring_addr addr;
+        VhostUserMemory memory;
+    };
+} QEMU_PACKED VhostUserMsg;
+
+static VhostUserMsg m __attribute__ ((unused));
+#define VHOST_USER_HDR_SIZE (sizeof(m.request) \
+                            + sizeof(m.flags) \
+                            + sizeof(m.size))
+
+#define VHOST_USER_PAYLOAD_SIZE (sizeof(m) - VHOST_USER_HDR_SIZE)
+
+/* The version of the protocol we support */
+#define VHOST_USER_VERSION    (0x1)
+
+static bool ioeventfd_enabled(void)
+{
+    return kvm_enabled() && kvm_eventfds_enabled();
+}
+
+static unsigned long int ioctl_to_vhost_user_request[VHOST_USER_MAX] = {
+    -1,                     /* VHOST_USER_NONE */
+    VHOST_GET_FEATURES,     /* VHOST_USER_GET_FEATURES */
+    VHOST_SET_FEATURES,     /* VHOST_USER_SET_FEATURES */
+    VHOST_SET_OWNER,        /* VHOST_USER_SET_OWNER */
+    VHOST_RESET_OWNER,      /* VHOST_USER_RESET_OWNER */
+    VHOST_SET_MEM_TABLE,    /* VHOST_USER_SET_MEM_TABLE */
+    VHOST_SET_LOG_BASE,     /* VHOST_USER_SET_LOG_BASE */
+    VHOST_SET_LOG_FD,       /* VHOST_USER_SET_LOG_FD */
+    VHOST_SET_VRING_NUM,    /* VHOST_USER_SET_VRING_NUM */
+    VHOST_SET_VRING_ADDR,   /* VHOST_USER_SET_VRING_ADDR */
+    VHOST_SET_VRING_BASE,   /* VHOST_USER_SET_VRING_BASE */
+    VHOST_GET_VRING_BASE,   /* VHOST_USER_GET_VRING_BASE */
+    VHOST_SET_VRING_KICK,   /* VHOST_USER_SET_VRING_KICK */
+    VHOST_SET_VRING_CALL,   /* VHOST_USER_SET_VRING_CALL */
+    VHOST_SET_VRING_ERR     /* VHOST_USER_SET_VRING_ERR */
+};
+
+static VhostUserRequest vhost_user_request_translate(unsigned long int request)
+{
+    VhostUserRequest idx;
+
+    for (idx = 0; idx < VHOST_USER_MAX; idx++) {
+        if (ioctl_to_vhost_user_request[idx] == request) {
+            break;
+        }
+    }
+
+    return (idx == VHOST_USER_MAX) ? VHOST_USER_NONE : idx;
+}
+
+static int vhost_user_read(struct vhost_dev *dev, VhostUserMsg *msg)
+{
+    CharDriverState *chr = dev->opaque;
+    uint8_t *p = (uint8_t *) msg;
+    int r, size = VHOST_USER_HDR_SIZE;
+
+    r = qemu_chr_fe_read_all(chr, p, size);
+    if (r != size) {
+        error_report("Failed to read msg header. Read %d instead of %d.\n", r,
+                size);
+        goto fail;
+    }
+
+    /* validate received flags */
+    if (msg->flags != (VHOST_USER_REPLY_MASK | VHOST_USER_VERSION)) {
+        error_report("Failed to read msg header."
+                " Flags 0x%x instead of 0x%x.\n", msg->flags,
+                VHOST_USER_REPLY_MASK | VHOST_USER_VERSION);
+        goto fail;
+    }
+
+    /* validate message size is sane */
+    if (msg->size > VHOST_USER_PAYLOAD_SIZE) {
+        error_report("Failed to read msg header."
+                " Size %d exceeds the maximum %zu.\n", msg->size,
+                VHOST_USER_PAYLOAD_SIZE);
+        goto fail;
+    }
+
+    if (msg->size) {
+        p += VHOST_USER_HDR_SIZE;
+        size = msg->size;
+        r = qemu_chr_fe_read_all(chr, p, size);
+        if (r != size) {
+            error_report("Failed to read msg payload."
+                         " Read %d instead of %d.\n", r, msg->size);
+            goto fail;
+        }
+    }
+
+    return 0;
+
+fail:
+    return -1;
+}
+
+static int vhost_user_write(struct vhost_dev *dev, VhostUserMsg *msg,
+                            int *fds, int fd_num)
+{
+    CharDriverState *chr = dev->opaque;
+    int size = VHOST_USER_HDR_SIZE + msg->size;
+
+    if (fd_num) {
+        qemu_chr_fe_set_msgfds(chr, fds, fd_num);
+    }
+
+    return qemu_chr_fe_write_all(chr, (const uint8_t *) msg, size) == size ?
+            0 : -1;
+}
+
+static int vhost_user_call(struct vhost_dev *dev, unsigned long int request,
+        void *arg)
+{
+    VhostUserMsg msg;
+    VhostUserRequest msg_request;
+    RAMBlock *block = 0;
+    struct vhost_vring_file *file = 0;
+    int need_reply = 0;
+    int fds[VHOST_MEMORY_MAX_NREGIONS];
+    size_t fd_num = 0;
+
+    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
+
+    msg_request = vhost_user_request_translate(request);
+    msg.request = msg_request;
+    msg.flags = VHOST_USER_VERSION;
+    msg.size = 0;
+
+    switch (request) {
+    case VHOST_GET_FEATURES:
+        need_reply = 1;
+        break;
+
+    case VHOST_SET_FEATURES:
+    case VHOST_SET_LOG_BASE:
+        msg.u64 = *((__u64 *) arg);
+        msg.size = sizeof(m.u64);
+        break;
+
+    case VHOST_SET_OWNER:
+    case VHOST_RESET_OWNER:
+        break;
+
+    case VHOST_SET_MEM_TABLE:
+        QTAILQ_FOREACH(block, &ram_list.blocks, next)
+        {
+            if (block->fd > 0) {
+                msg.memory.regions[fd_num].userspace_addr =
+                    (uintptr_t) block->host;
+                msg.memory.regions[fd_num].memory_size = block->length;
+                msg.memory.regions[fd_num].guest_phys_addr = block->offset;
+                fds[fd_num++] = block->fd;
+            }
+        }
+
+        msg.memory.nregions = fd_num;
+
+        if (!fd_num) {
+            error_report("Failed initializing vhost-user memory map\n"
+                    "consider using -mem-path option\n");
+            return -1;
+        }
+
+        msg.size = sizeof(m.memory.nregions);
+        msg.size += sizeof(m.memory.padding);
+        msg.size += fd_num * sizeof(VhostUserMemoryRegion);
+
+        break;
+
+    case VHOST_SET_LOG_FD:
+        fds[fd_num++] = *((int *) arg);
+        break;
+
+    case VHOST_SET_VRING_NUM:
+    case VHOST_SET_VRING_BASE:
+        memcpy(&msg.state, arg, sizeof(struct vhost_vring_state));
+        msg.size = sizeof(m.state);
+        break;
+
+    case VHOST_GET_VRING_BASE:
+        memcpy(&msg.state, arg, sizeof(struct vhost_vring_state));
+        msg.size = sizeof(m.state);
+        need_reply = 1;
+        break;
+
+    case VHOST_SET_VRING_ADDR:
+        memcpy(&msg.addr, arg, sizeof(struct vhost_vring_addr));
+        msg.size = sizeof(m.addr);
+        break;
+
+    case VHOST_SET_VRING_KICK:
+    case VHOST_SET_VRING_CALL:
+    case VHOST_SET_VRING_ERR:
+        file = arg;
+        msg.u64 = file->index & VHOST_USER_VRING_IDX_MASK;
+        msg.size = sizeof(m.u64);
+        if (ioeventfd_enabled() && file->fd > 0) {
+            fds[fd_num++] = file->fd;
+        } else {
+            msg.u64 |= VHOST_USER_VRING_NOFD_MASK;
+        }
+        break;
+    default:
+        error_report("vhost-user trying to send unhandled ioctl\n");
+        return -1;
+        break;
+    }
+
+    if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
+        goto iofail;
+    }
+
+    if (need_reply) {
+        if (vhost_user_read(dev, &msg) < 0) {
+            goto iofail;
+        }
+
+        if (msg_request != msg.request) {
+            error_report("Received unexpected msg type."
+                    " Expected %d received %d\n", msg_request, msg.request);
+            return -1;
+        }
+
+        switch (msg_request) {
+        case VHOST_USER_GET_FEATURES:
+            if (msg.size != sizeof(m.u64)) {
+                error_report("Received bad msg size.\n");
+                return -1;
+            }
+            /* check if mandatory features are met */
+            if ((dev->mandatory_features & msg.u64)
+                    != dev->mandatory_features) {
+                error_report("Received features do not match mandatory.\n");
+                return -1;
+            }
+            *((__u64 *) arg) = msg.u64;
+            break;
+        case VHOST_USER_GET_VRING_BASE:
+            if (msg.size != sizeof(m.state)) {
+                error_report("Received bad msg size.\n");
+                return -1;
+            }
+            memcpy(arg, &msg.state, sizeof(struct vhost_vring_state));
+            break;
+        default:
+            error_report("Received unexpected msg type.\n");
+            return -1;
+            break;
+        }
+    }
+
+    return 0;
+
+iofail:
+
+    if (msg_request == VHOST_USER_GET_VRING_BASE) {
+        return -1;
+    }
+
+    return 0;
+}
+
+static int vhost_user_init(struct vhost_dev *dev, void *opaque)
+{
+    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
+
+    dev->opaque = opaque;
+
+    return 0;
+}
+
+static int vhost_user_cleanup(struct vhost_dev *dev)
+{
+    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
+
+    dev->opaque = 0;
+
+    return 0;
+}
+
+const VhostOps user_ops = {
+        .backend_type = VHOST_BACKEND_TYPE_USER,
+        .vhost_call = vhost_user_call,
+        .vhost_backend_init = vhost_user_init,
+        .vhost_backend_cleanup = vhost_user_cleanup
+        };
-- 
1.8.3.2


* [Qemu-devel] [PATCH v9 16/20] Add new vhost-user netdev backend
  2014-03-04 18:22 [Qemu-devel] [PATCH v9 00/20] Vhost and vhost-net support for userspace based backends Antonios Motakis
                   ` (14 preceding siblings ...)
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 15/20] Add vhost-user as a vhost backend Antonios Motakis
@ 2014-03-04 18:22 ` Antonios Motakis
  2014-03-04 18:23 ` [Qemu-devel] [PATCH v9 17/20] Add the vhost-user netdev backend to the command line Antonios Motakis
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 33+ messages in thread
From: Antonios Motakis @ 2014-03-04 18:22 UTC (permalink / raw)
  To: qemu-devel, snabb-devel
  Cc: Stefan Hajnoczi, mst, n.nikolaev, Anthony Liguori, lukego,
	Antonios Motakis, tech

Add a new QEMU netdev backend that is intended to invoke vhost_net with the
vhost-user backend. It uses a Unix socket chardev to communicate with the
'slave' (both client and server modes are supported).

At runtime the netdev will handle OPEN/CLOSE events from the chardev. Upon
disconnection it will set link_down accordingly and notify virtio-net; the
virtio-net interface will go down.

On reconnection, the new vhost-net instance is initialised with the features
saved from the first session as its 'mandatory features'. This ensures that
the newly connected slave is compatible with the first one.
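
The connect/disconnect behaviour described above can be modelled roughly as
below. This is a simplified state sketch, not the netdev code: the Sketch*
names are hypothetical, the offered feature word stands in for what
vhost_net_features() would report, and keeping the link down when the start
fails is a choice made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct SketchNetdev {
    bool running;            /* a vhost-net instance exists */
    bool link_down;          /* link state shown to virtio-net */
    uint64_t saved_features; /* 0 until the first successful connect */
} SketchNetdev;

/* Chardev OPENED: start vhost-net and bring the link up. */
static int sketch_on_open(SketchNetdev *s, uint64_t offered)
{
    if (s->running) {
        return 0;
    }

    /* On reconnect, the saved features act as mandatory features. */
    if ((s->saved_features & offered) != s->saved_features) {
        return -1;
    }

    /* First session: remember the features for later reconnects. */
    if (!s->saved_features) {
        s->saved_features = offered;
    }

    s->running = true;
    s->link_down = false;
    return 0;
}

/* Chardev CLOSED: take the link down and stop vhost-net. */
static void sketch_on_close(SketchNetdev *s)
{
    s->link_down = true;
    s->running = false;
}
```

A second slave offering fewer features than the first is rejected, while one
offering the same set or a superset is accepted.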

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
---
 include/net/vhost-user.h |  17 ++++++
 net/Makefile.objs        |   2 +-
 net/clients.h            |   3 +
 net/vhost-user.c         | 150 +++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 171 insertions(+), 1 deletion(-)
 create mode 100644 include/net/vhost-user.h
 create mode 100644 net/vhost-user.c

diff --git a/include/net/vhost-user.h b/include/net/vhost-user.h
new file mode 100644
index 0000000..85109f6
--- /dev/null
+++ b/include/net/vhost-user.h
@@ -0,0 +1,17 @@
+/*
+ * vhost-user.h
+ *
+ * Copyright (c) 2013 Virtual Open Systems Sarl.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef VHOST_USER_H_
+#define VHOST_USER_H_
+
+struct vhost_net;
+struct vhost_net *vhost_user_get_vhost_net(NetClientState *nc);
+
+#endif /* VHOST_USER_H_ */
diff --git a/net/Makefile.objs b/net/Makefile.objs
index c25fe69..301f6b6 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -2,7 +2,7 @@ common-obj-y = net.o queue.o checksum.o util.o hub.o
 common-obj-y += socket.o
 common-obj-y += dump.o
 common-obj-y += eth.o
-common-obj-$(CONFIG_POSIX) += tap.o
+common-obj-$(CONFIG_POSIX) += tap.o vhost-user.o
 common-obj-$(CONFIG_LINUX) += tap-linux.o
 common-obj-$(CONFIG_WIN32) += tap-win32.o
 common-obj-$(CONFIG_BSD) += tap-bsd.o
diff --git a/net/clients.h b/net/clients.h
index 7322ff5..7f3d4ae 100644
--- a/net/clients.h
+++ b/net/clients.h
@@ -57,4 +57,7 @@ int net_init_netmap(const NetClientOptions *opts, const char *name,
                     NetClientState *peer);
 #endif
 
+int net_init_vhost_user(const NetClientOptions *opts, const char *name,
+                        NetClientState *peer);
+
 #endif /* QEMU_NET_CLIENTS_H */
diff --git a/net/vhost-user.c b/net/vhost-user.c
new file mode 100644
index 0000000..292be41
--- /dev/null
+++ b/net/vhost-user.c
@@ -0,0 +1,150 @@
+/*
+ * vhost-user.c
+ *
+ * Copyright (c) 2013 Virtual Open Systems Sarl.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "clients.h"
+#include "net/vhost_net.h"
+#include "net/vhost-user.h"
+#include "sysemu/char.h"
+#include "qemu/error-report.h"
+
+typedef struct VhostUserState {
+    NetClientState nc;
+    CharDriverState *chr;
+    bool vhostforce;
+    VHostNetState *vhost_net;
+    unsigned long long features;
+} VhostUserState;
+
+VHostNetState *vhost_user_get_vhost_net(NetClientState *nc)
+{
+    VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
+    return s->vhost_net;
+}
+
+static int vhost_user_running(VhostUserState *s)
+{
+    return (s->vhost_net) ? 1 : 0;
+}
+
+static int vhost_user_start(VhostUserState *s)
+{
+    VhostNetOptions options;
+
+    if (vhost_user_running(s)) {
+        return 0;
+    }
+
+    options.backend_type = VHOST_BACKEND_TYPE_USER;
+    options.net_backend = &s->nc;
+    options.opaque = s->chr;
+    options.force = s->vhostforce;
+    if (s->features) {
+        options.mandatory_features = s->features;
+    } else {
+        options.mandatory_features = 0;
+    }
+
+    s->vhost_net = vhost_net_init(&options);
+
+    /* store the negotiated features in case of reconnection */
+    if (vhost_user_running(s) && !s->features) {
+        s->features = vhost_net_features(s->vhost_net);
+    }
+
+    return vhost_user_running(s) ? 0 : -1;
+}
+
+static void vhost_user_stop(VhostUserState *s)
+{
+    if (vhost_user_running(s)) {
+        vhost_net_cleanup(s->vhost_net);
+    }
+
+    s->vhost_net = 0;
+}
+
+static void vhost_user_cleanup(NetClientState *nc)
+{
+    VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
+
+    vhost_user_stop(s);
+    qemu_purge_queued_packets(nc);
+}
+
+static NetClientInfo net_vhost_user_info = {
+        .type = 0,
+        .size = sizeof(VhostUserState),
+        .cleanup = vhost_user_cleanup,
+};
+
+static void net_vhost_link_down(VhostUserState *s, bool link_down)
+{
+    s->nc.link_down = link_down;
+
+    if (s->nc.peer) {
+        s->nc.peer->link_down = link_down;
+    }
+
+    if (s->nc.info->link_status_changed) {
+        s->nc.info->link_status_changed(&s->nc);
+    }
+
+    if (s->nc.peer && s->nc.peer->info->link_status_changed) {
+        s->nc.peer->info->link_status_changed(s->nc.peer);
+    }
+}
+
+static void net_vhost_user_event(void *opaque, int event)
+{
+    VhostUserState *s = opaque;
+
+    switch (event) {
+    case CHR_EVENT_OPENED:
+        vhost_user_start(s);
+        net_vhost_link_down(s, false);
+        error_report("chardev \"%s\" went up\n", s->chr->label);
+        break;
+    case CHR_EVENT_CLOSED:
+        net_vhost_link_down(s, true);
+        vhost_user_stop(s);
+        error_report("chardev \"%s\" went down\n", s->chr->label);
+        break;
+    }
+}
+
+static int net_vhost_user_init(NetClientState *peer, const char *device,
+                               const char *name, CharDriverState *chr,
+                               bool vhostforce)
+{
+    NetClientState *nc;
+    VhostUserState *s;
+
+    nc = qemu_new_net_client(&net_vhost_user_info, peer, device, name);
+
+    snprintf(nc->info_str, sizeof(nc->info_str), "vhost-user to %s",
+             chr->label);
+
+    s = DO_UPCAST(VhostUserState, nc, nc);
+
+    /* We don't provide a receive callback */
+    s->nc.receive_disabled = 1;
+    s->chr = chr;
+    s->vhostforce = vhostforce;
+
+    qemu_chr_add_handlers(s->chr, NULL, NULL, net_vhost_user_event, s);
+
+    return 0;
+}
+
+int net_init_vhost_user(const NetClientOptions *opts, const char *name,
+                   NetClientState *peer)
+{
+    return net_vhost_user_init(peer, "vhost_user", 0, 0, 0);
+}
-- 
1.8.3.2


* [Qemu-devel] [PATCH v9 17/20] Add the vhost-user netdev backend to the command line
  2014-03-04 18:22 [Qemu-devel] [PATCH v9 00/20] Vhost and vhost-net support for userspace based backends Antonios Motakis
                   ` (15 preceding siblings ...)
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 16/20] Add new vhost-user netdev backend Antonios Motakis
@ 2014-03-04 18:23 ` Antonios Motakis
  2014-03-04 18:23 ` [Qemu-devel] [PATCH v9 18/20] Add vhost-user protocol documentation Antonios Motakis
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 33+ messages in thread
From: Antonios Motakis @ 2014-03-04 18:23 UTC (permalink / raw)
  To: qemu-devel, snabb-devel
  Cc: Stefan Hajnoczi, mst, Corey Bryant, Michael Tokarev,
	Markus Armbruster, n.nikolaev, Luiz Capitulino, Anthony Liguori,
	Paolo Bonzini, lukego, Antonios Motakis, tech

The supplied chardev id will be inspected for supported options. Only
a socket backend, with a set path (i.e. a Unix socket) and optionally
the server parameter set, will be allowed. Other options (nowait, telnet)
will make the chardev unusable and the netdev will not be initialised.

Additional checks for validity:
  - requires `-mem-path ...,share=on`
  - requires `-device virtio-net-*`
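
The chardev and frontend checks listed above boil down to string tests on
option names and on the driver prefix. A standalone sketch, using a
hypothetical name/value array in place of QemuOpts:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct SketchOpt {
    const char *name;
    const char *value;
} SketchOpt;

/* Accept only a socket backend with a path, optionally in server mode.
 * Any other option makes the chardev unusable for vhost-user. */
static bool sketch_chardev_ok(const SketchOpt *opts, size_t n)
{
    bool is_socket = false, is_unix = false;
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(opts[i].name, "backend") == 0
                && strcmp(opts[i].value, "socket") == 0) {
            is_socket = true;
        } else if (strcmp(opts[i].name, "path") == 0) {
            is_unix = true;
        } else if (strcmp(opts[i].name, "server") != 0) {
            return false; /* unsupported option, e.g. nowait or telnet */
        }
    }

    return is_socket && is_unix;
}

/* The frontend driver name must start with "virtio-net-". */
static bool sketch_is_virtio_net(const char *driver)
{
    static const char prefix[] = "virtio-net-";

    return strncmp(driver, prefix, strlen(prefix)) == 0;
}
```

A TCP socket chardev fails the first check because it carries a host option
rather than a path.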

The `vhostforce` option forces the use of vhost-net with non-MSIX guests.

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
---
 hmp-commands.hx    |   4 +-
 hw/net/vhost_net.c |   4 ++
 net/hub.c          |   1 +
 net/net.c          |  25 ++++++-----
 net/vhost-user.c   | 127 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 qapi-schema.json   |  19 +++++++-
 qemu-options.hx    |  17 +++++++
 7 files changed, 181 insertions(+), 16 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index f3fc514..68128c1 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1195,7 +1195,7 @@ ETEXI
     {
         .name       = "host_net_add",
         .args_type  = "device:s,opts:s?",
-        .params     = "tap|user|socket|vde|netmap|dump [options]",
+        .params     = "tap|user|socket|vde|netmap|vhost-user|dump [options]",
         .help       = "add host VLAN client",
         .mhandler.cmd = net_host_device_add,
     },
@@ -1223,7 +1223,7 @@ ETEXI
     {
         .name       = "netdev_add",
         .args_type  = "netdev:O",
-        .params     = "[user|tap|socket|hubport|netmap],id=str[,prop=value][,...]",
+        .params     = "[user|tap|socket|hubport|netmap|vhost-user],id=str[,prop=value][,...]",
         .help       = "add host network device",
         .mhandler.cmd = hmp_netdev_add,
     },
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 48cfda7..1c0ffd5 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -15,6 +15,7 @@
 
 #include "net/net.h"
 #include "net/tap.h"
+#include "net/vhost-user.h"
 
 #include "hw/virtio/virtio-net.h"
 #include "net/vhost_net.h"
@@ -325,6 +326,9 @@ VHostNetState *get_vhost_net(NetClientState *nc)
     case NET_CLIENT_OPTIONS_KIND_TAP:
         vhost_net = tap_get_vhost_net(nc);
         break;
+    case NET_CLIENT_OPTIONS_KIND_VHOST_USER:
+        vhost_net = vhost_user_get_vhost_net(nc);
+        break;
     default:
         break;
     }
diff --git a/net/hub.c b/net/hub.c
index 33a99c9..7e0f2d6 100644
--- a/net/hub.c
+++ b/net/hub.c
@@ -322,6 +322,7 @@ void net_hub_check_clients(void)
             case NET_CLIENT_OPTIONS_KIND_TAP:
             case NET_CLIENT_OPTIONS_KIND_SOCKET:
             case NET_CLIENT_OPTIONS_KIND_VDE:
+            case NET_CLIENT_OPTIONS_KIND_VHOST_USER:
                 has_host_dev = 1;
                 break;
             default:
diff --git a/net/net.c b/net/net.c
index e3ef1e4..872ede4 100644
--- a/net/net.c
+++ b/net/net.c
@@ -769,23 +769,24 @@ static int (* const net_client_init_fun[NET_CLIENT_OPTIONS_KIND_MAX])(
     const NetClientOptions *opts,
     const char *name,
     NetClientState *peer) = {
-        [NET_CLIENT_OPTIONS_KIND_NIC]       = net_init_nic,
+        [NET_CLIENT_OPTIONS_KIND_NIC]           = net_init_nic,
 #ifdef CONFIG_SLIRP
-        [NET_CLIENT_OPTIONS_KIND_USER]      = net_init_slirp,
+        [NET_CLIENT_OPTIONS_KIND_USER]          = net_init_slirp,
 #endif
-        [NET_CLIENT_OPTIONS_KIND_TAP]       = net_init_tap,
-        [NET_CLIENT_OPTIONS_KIND_SOCKET]    = net_init_socket,
+        [NET_CLIENT_OPTIONS_KIND_TAP]           = net_init_tap,
+        [NET_CLIENT_OPTIONS_KIND_SOCKET]        = net_init_socket,
 #ifdef CONFIG_VDE
-        [NET_CLIENT_OPTIONS_KIND_VDE]       = net_init_vde,
+        [NET_CLIENT_OPTIONS_KIND_VDE]           = net_init_vde,
 #endif
 #ifdef CONFIG_NETMAP
-        [NET_CLIENT_OPTIONS_KIND_NETMAP]    = net_init_netmap,
+        [NET_CLIENT_OPTIONS_KIND_NETMAP]        = net_init_netmap,
 #endif
-        [NET_CLIENT_OPTIONS_KIND_DUMP]      = net_init_dump,
+        [NET_CLIENT_OPTIONS_KIND_DUMP]          = net_init_dump,
 #ifdef CONFIG_NET_BRIDGE
-        [NET_CLIENT_OPTIONS_KIND_BRIDGE]    = net_init_bridge,
+        [NET_CLIENT_OPTIONS_KIND_BRIDGE]        = net_init_bridge,
 #endif
-        [NET_CLIENT_OPTIONS_KIND_HUBPORT]   = net_init_hubport,
+        [NET_CLIENT_OPTIONS_KIND_HUBPORT]       = net_init_hubport,
+        [NET_CLIENT_OPTIONS_KIND_VHOST_USER]    = net_init_vhost_user,
 };
 
 
@@ -819,6 +820,7 @@ static int net_client_init1(const void *object, int is_netdev, Error **errp)
         case NET_CLIENT_OPTIONS_KIND_BRIDGE:
 #endif
         case NET_CLIENT_OPTIONS_KIND_HUBPORT:
+        case NET_CLIENT_OPTIONS_KIND_VHOST_USER:
             break;
 
         default:
@@ -902,11 +904,12 @@ static int net_host_check_device(const char *device)
                                        , "bridge"
 #endif
 #ifdef CONFIG_SLIRP
-                                       ,"user"
+                                       , "user"
 #endif
 #ifdef CONFIG_VDE
-                                       ,"vde"
+                                       , "vde"
 #endif
+                                       , "vhost-user"
     };
     for (i = 0; i < ARRAY_SIZE(valid_param_list); i++) {
         if (!strncmp(valid_param_list[i], device,
diff --git a/net/vhost-user.c b/net/vhost-user.c
index 292be41..600f31f 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -12,6 +12,7 @@
 #include "net/vhost_net.h"
 #include "net/vhost-user.h"
 #include "sysemu/char.h"
+#include "qemu/config-file.h"
 #include "qemu/error-report.h"
 
 typedef struct VhostUserState {
@@ -22,9 +23,17 @@ typedef struct VhostUserState {
     unsigned long long features;
 } VhostUserState;
 
+typedef struct VhostUserChardevProps {
+    bool is_socket;
+    bool is_unix;
+    bool is_server;
+    bool has_unsupported;
+} VhostUserChardevProps;
+
 VHostNetState *vhost_user_get_vhost_net(NetClientState *nc)
 {
     VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
+    assert(nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER);
     return s->vhost_net;
 }
 
@@ -79,7 +88,7 @@ static void vhost_user_cleanup(NetClientState *nc)
 }
 
 static NetClientInfo net_vhost_user_info = {
-        .type = 0,
+        .type = NET_CLIENT_OPTIONS_KIND_VHOST_USER,
         .size = sizeof(VhostUserState),
         .cleanup = vhost_user_cleanup,
 };
@@ -143,8 +152,122 @@ static int net_vhost_user_init(NetClientState *peer, const char *device,
     return 0;
 }
 
+static int net_vhost_chardev_opts(const char *name, const char *value,
+        void *opaque)
+{
+    VhostUserChardevProps *props = opaque;
+
+    if (strcmp(name, "backend") == 0 && strcmp(value, "socket") == 0) {
+        props->is_socket = 1;
+    } else if (strcmp(name, "path") == 0) {
+        props->is_unix = 1;
+    } else if (strcmp(name, "server") == 0) {
+        props->is_server = 1;
+    } else {
+        error_report("vhost-user does not support a chardev"
+                     " with the following option:\n %s = %s",
+                     name, value);
+        props->has_unsupported = 1;
+        return -1;
+    }
+    return 0;
+}
+
+static CharDriverState *net_vhost_parse_chardev(
+        const NetdevVhostUserOptions *opts)
+{
+    CharDriverState *chr = qemu_chr_find(opts->chardev);
+    VhostUserChardevProps props;
+
+    if (chr == NULL) {
+        error_report("chardev \"%s\" not found\n", opts->chardev);
+        return 0;
+    }
+
+    /* inspect chardev opts */
+    memset(&props, 0, sizeof(props));
+    qemu_opt_foreach(chr->opts, net_vhost_chardev_opts, &props, false);
+
+    if (!props.is_socket || !props.is_unix) {
+        error_report("chardev \"%s\" is not a unix socket\n",
+                     opts->chardev);
+        return 0;
+    }
+
+    if (props.has_unsupported) {
+        error_report("chardev \"%s\" has an unsupported option\n",
+                opts->chardev);
+        return 0;
+    }
+
+    qemu_chr_fe_claim_no_fail(chr);
+
+    return chr;
+}
+
+static int net_vhost_check_net(QemuOpts *opts, void *opaque)
+{
+    const char *name = opaque;
+    const char *driver, *netdev;
+    const char virtio_name[] = "virtio-net-";
+
+    driver = qemu_opt_get(opts, "driver");
+    netdev = qemu_opt_get(opts, "netdev");
+
+    if (!driver || !netdev) {
+        return 0;
+    }
+
+    if ((strcmp(netdev, name) == 0)
+            && (strncmp(driver, virtio_name, strlen(virtio_name)) != 0)) {
+        error_report("vhost-user requires frontend driver virtio-net-*");
+        return -1;
+    }
+
+    return 0;
+}
+
 int net_init_vhost_user(const NetClientOptions *opts, const char *name,
                    NetClientState *peer)
 {
-    return net_vhost_user_init(peer, "vhost_user", 0, 0, 0);
+    const NetdevVhostUserOptions *vhost_user_opts;
+    CharDriverState *chr;
+    bool vhostforce;
+    QemuOpts *mem_opts;
+    unsigned int mem_share = 0;
+
+    assert(opts->kind == NET_CLIENT_OPTIONS_KIND_VHOST_USER);
+    vhost_user_opts = opts->vhost_user;
+
+    chr = net_vhost_parse_chardev(vhost_user_opts);
+    if (!chr) {
+        error_report("No suitable chardev found\n");
+        return -1;
+    }
+
+    /* verify mem-path is set and shared */
+    mem_opts = qemu_opts_find(qemu_find_opts("mem-path"), NULL);
+    if (mem_opts) {
+        mem_share = qemu_opt_get_bool(mem_opts, "share", 0);
+    }
+
+    if (!mem_share) {
+        error_report("vhost-user requires -mem-path /path,share=on");
+        return -1;
+    }
+
+    /* verify net frontend */
+    if (qemu_opts_foreach(qemu_find_opts("device"), net_vhost_check_net,
+                          (void *)name, true) == -1) {
+        return -1;
+    }
+
+    /* vhostforce for non-MSIX */
+    if (vhost_user_opts->has_vhostforce) {
+        vhostforce = vhost_user_opts->vhostforce;
+    } else {
+        vhostforce = false;
+    }
+
+    return net_vhost_user_init(peer, "vhost_user", name, chr, vhostforce);
 }
diff --git a/qapi-schema.json b/qapi-schema.json
index ac8ad24..960a853 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3183,6 +3183,22 @@
     '*devname':    'str' } }
 
 ##
+# @NetdevVhostUserOptions
+#
+# Vhost-user network backend
+#
+# @chardev: id of the Unix socket chardev used as the control socket
+#
+# @vhostforce: #optional vhost on for non-MSIX virtio guests (default: false).
+#
+# Since 2.0
+##
+{ 'type': 'NetdevVhostUserOptions',
+  'data': {
+    'chardev':        'str',
+    '*vhostforce':    'bool' } }
+
+##
 # @NetClientOptions
 #
 # A discriminated record of network device traits.
@@ -3200,7 +3216,8 @@
     'dump':     'NetdevDumpOptions',
     'bridge':   'NetdevBridgeOptions',
     'hubport':  'NetdevHubPortOptions',
-    'netmap':   'NetdevNetmapOptions' } }
+    'netmap':   'NetdevNetmapOptions',
+    'vhost-user': 'NetdevVhostUserOptions' } }
 
 ##
 # @NetLegacy
diff --git a/qemu-options.hx b/qemu-options.hx
index f9f42a0..42a51ae 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1434,6 +1434,7 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
 #ifdef CONFIG_NETMAP
     "netmap|"
 #endif
+    "vhost-user|"
     "socket|"
     "hubport],id=str[,option][,option][,...]\n", QEMU_ARCH_ALL)
 STEXI
@@ -1765,6 +1766,22 @@ The hubport netdev lets you connect a NIC to a QEMU "vlan" instead of a single
 netdev.  @code{-net} and @code{-device} with parameter @option{vlan} create the
 required hub automatically.
 
+@item -netdev vhost-user,chardev=@var{id}[,vhostforce=on|off]
+
+Establish a vhost-user netdev, backed by a chardev @var{id}. The chardev must
+be backed by a Unix domain socket. vhost-user uses its own protocol to pass
+messages that replace the vhost ioctls to an application on the other end of
+the socket. On non-MSIX guests, the feature can be forced with
+@var{vhostforce}.
+
+Example:
+@example
+qemu -m 1024 -mem-path /hugetlbfs,prealloc=on,share=on \
+     -chardev socket,id=chr0,path=/path/to/socket \
+     -netdev type=vhost-user,id=net0,chardev=chr0 \
+     -device virtio-net-pci,netdev=net0
+@end example
+
 @item -net dump[,vlan=@var{n}][,file=@var{file}][,len=@var{len}]
 Dump network traffic on VLAN @var{n} to file @var{file} (@file{qemu-vlan0.pcap} by default).
 At most @var{len} bytes (64k by default) per packet are stored. The file format is
-- 
1.8.3.2


* [Qemu-devel] [PATCH v9 18/20] Add vhost-user protocol documentation
  2014-03-04 18:22 [Qemu-devel] [PATCH v9 00/20] Vhost and vhost-net support for userspace based backends Antonios Motakis
                   ` (16 preceding siblings ...)
  2014-03-04 18:23 ` [Qemu-devel] [PATCH v9 17/20] Add the vhost-user netdev backend to the command line Antonios Motakis
@ 2014-03-04 18:23 ` Antonios Motakis
  2014-03-04 18:23 ` [Qemu-devel] [PATCH v9 19/20] libqemustub: add stubs to be able to use qemu-char.c Antonios Motakis
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 33+ messages in thread
From: Antonios Motakis @ 2014-03-04 18:23 UTC (permalink / raw)
  To: qemu-devel, snabb-devel; +Cc: lukego, Antonios Motakis, tech, n.nikolaev, mst

This document describes the basic message format used by vhost-user
for communication over a unix domain socket. The protocol is based
on the existing ioctl interface used for the kernel version of vhost.

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
---
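Reviewer note: the following is a minimal sketch of how the flags word of the
message header described in this document could be built and checked. The mask
values mirror VHOST_USER_VERSION_MASK, VHOST_USER_REPLY_MASK and
VHOST_USER_VERSION as used in this series; the helper names (vu_make_flags,
vu_flags_version, vu_flags_is_reply, vu_reply_header_ok) are hypothetical and
exist only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as used in this series: version in bits 0-1, reply flag in bit 2. */
#define VHOST_USER_VERSION_MASK  (0x3)
#define VHOST_USER_REPLY_MASK    (0x1 << 2)
#define VHOST_USER_VERSION       (0x1)

/* Hypothetical helper: build the flags word for a request or a reply. */
static inline uint32_t vu_make_flags(bool is_reply)
{
    uint32_t flags = VHOST_USER_VERSION & VHOST_USER_VERSION_MASK;

    if (is_reply) {
        flags |= VHOST_USER_REPLY_MASK;
    }
    return flags;
}

/* Extract the protocol version from the lower 2 bits. */
static inline uint32_t vu_flags_version(uint32_t flags)
{
    return flags & VHOST_USER_VERSION_MASK;
}

/* True if bit 2 (the reply flag) is set. */
static inline bool vu_flags_is_reply(uint32_t flags)
{
    return (flags & VHOST_USER_REPLY_MASK) != 0;
}

/*
 * Hypothetical check a master could apply to a received reply header:
 * the version must match, the reply bit must be set, and the announced
 * payload size must fit the receive buffer.
 */
static inline bool vu_reply_header_ok(uint32_t flags, uint32_t size,
                                      uint32_t max_payload)
{
    return vu_flags_version(flags) == VHOST_USER_VERSION
        && vu_flags_is_reply(flags)
        && size <= max_payload;
}
```

With these definitions a request carries flags 0x1 and a reply carries 0x5.
Rejecting a reply whose size exceeds the payload buffer before reading the
payload is one way to realize the "wrong reply" handling mentioned in the
Communication section.
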
 docs/specs/vhost-user.txt | 261 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 261 insertions(+)
 create mode 100644 docs/specs/vhost-user.txt
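
Reviewer note: the u64 payload of VHOST_USER_SET_VRING_KICK, _CALL and _ERR
packs the vring index into bits 0-7 and uses bit 8 as the "invalid FD" flag.
A minimal sketch of that encoding follows. VHOST_USER_VRING_NOFD_MASK is the
flag named in the v9 changelog; the name VHOST_USER_VRING_IDX_MASK and the
three helper functions are assumptions made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define VHOST_USER_VRING_IDX_MASK   (0xff)      /* assumed name: bits 0-7 */
#define VHOST_USER_VRING_NOFD_MASK  (0x1 << 8)  /* bit 8: no fd attached  */

/* Hypothetical helper: build the u64 payload sent by the master. */
static inline uint64_t vu_vring_fd_payload(uint8_t index, bool have_fd)
{
    uint64_t u64 = index & VHOST_USER_VRING_IDX_MASK;

    if (!have_fd) {
        /* No descriptor in the ancillary data. */
        u64 |= VHOST_USER_VRING_NOFD_MASK;
    }
    return u64;
}

/* Hypothetical helper: vring index as seen by the slave. */
static inline unsigned int vu_vring_fd_index(uint64_t u64)
{
    return (unsigned int)(u64 & VHOST_USER_VRING_IDX_MASK);
}

/* Hypothetical helper: true if a descriptor accompanies the message. */
static inline bool vu_vring_fd_present(uint64_t u64)
{
    return (u64 & VHOST_USER_VRING_NOFD_MASK) == 0;
}
```

A slave would typically test vu_vring_fd_present() before trying to take a
descriptor from the ancillary data, and fall back to polling when it returns
false.
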

diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
new file mode 100644
index 0000000..5e1fe58
--- /dev/null
+++ b/docs/specs/vhost-user.txt
@@ -0,0 +1,261 @@
+Vhost-user Protocol
+===================
+
+This protocol aims to complement the ioctl interface used to control the
+vhost implementation in the Linux kernel. It implements the control plane needed
+to establish virtqueue sharing with a user space process on the same host. It
+uses communication over a Unix domain socket to share file descriptors in the
+ancillary data of the message.
+
+The protocol defines 2 sides of the communication, master and slave. Master is
+the application that shares its virtqueues, in our case QEMU. Slave is the
+consumer of the virtqueues.
+
+In the current implementation QEMU is the Master, and the Slave is intended to
+be a software ethernet switch running in user space, such as Snabbswitch.
+
+Master and slave can be either a client (i.e. connecting) or server (listening)
+in the socket communication.
+
+Message Specification
+---------------------
+
+Note that all numbers are in the machine native byte order. A vhost-user message
+consists of 3 header fields and a payload:
+
+------------------------------------
+| request | flags | size | payload |
+------------------------------------
+
+ * Request: 32-bit type of the request
+ * Flags: 32-bit bit field:
+   - Lower 2 bits are the version (currently 0x01)
+   - Bit 2 is the reply flag; it must be set on each reply from the slave
+ * Size: 32-bit size of the payload
+
+
+Depending on the request type, payload can be:
+
+ * A single 64-bit integer
+   -------
+   | u64 |
+   -------
+
+   u64: a 64-bit unsigned integer
+
+ * A vring state description
+   ---------------
+   | index | num |
+   ---------------
+
+   Index: a 32-bit index
+   Num: a 32-bit number
+
+ * A vring address description
+   -------------------------------------------------------
+   | index | flags | descriptor | used | available | log |
+   -------------------------------------------------------
+
+   Index: a 32-bit vring index
+   Flags: a 32-bit vring flags
+   Descriptor: a 64-bit user address of the vring descriptor table
+   Used: a 64-bit user address of the vring used ring
+   Available: a 64-bit user address of the vring available ring
+   Log: a 64-bit guest address for logging
+
+ * Memory regions description
+   ---------------------------------------------------
+   | num regions | padding | region0 | ... | region7 |
+   ---------------------------------------------------
+
+   Num regions: a 32-bit number of regions
+   Padding: 32-bit
+
+   A region is:
+   ---------------------------------------
+   | guest address | size | user address |
+   ---------------------------------------
+
+   Guest address: a 64-bit guest address of the region
+   Size: a 64-bit size
+   User address: a 64-bit user address
+
+
+In QEMU the vhost-user message is implemented with the following struct:
+
+typedef struct VhostUserMsg {
+    VhostUserRequest request;
+    uint32_t flags;
+    uint32_t size;
+    union {
+        uint64_t u64;
+        struct vhost_vring_state state;
+        struct vhost_vring_addr addr;
+        VhostUserMemory memory;
+    };
+} QEMU_PACKED VhostUserMsg;
+
+Communication
+-------------
+
+The protocol for vhost-user is based on the existing implementation of vhost
+for the Linux Kernel. Most messages that can be sent via the Unix domain socket
+implementing vhost-user have an equivalent ioctl to the kernel implementation.
+
+The communication consists of master sending message requests and slave sending
+message replies. Most of the requests don't require replies. Here is a list of
+the ones that do:
+
+ * VHOST_GET_FEATURES
+ * VHOST_GET_VRING_BASE
+
+There are several messages that the master sends with file descriptors passed
+in the ancillary data:
+
+ * VHOST_SET_MEM_TABLE
+ * VHOST_SET_LOG_FD
+ * VHOST_SET_VRING_KICK
+ * VHOST_SET_VRING_CALL
+ * VHOST_SET_VRING_ERR
+
+If Master is unable to send the full message or receives a wrong reply it will
+close the connection. An optional reconnection mechanism can be implemented.
+
+Message types
+-------------
+
+ * VHOST_USER_GET_FEATURES
+
+      Id: 1
+      Equivalent ioctl: VHOST_GET_FEATURES
+      Master payload: N/A
+      Slave payload: u64
+
+      Get the features bitmask from the underlying vhost implementation.
+
+ * VHOST_USER_SET_FEATURES
+
+      Id: 2
+      Equivalent ioctl: VHOST_SET_FEATURES
+      Master payload: u64
+
+      Enable features in the underlying vhost implementation using a bitmask.
+
+ * VHOST_USER_SET_OWNER
+
+      Id: 3
+      Equivalent ioctl: VHOST_SET_OWNER
+      Master payload: N/A
+
+      Issued when a new connection is established. It sets the current Master
+      as an owner of the session. This can be used on the Slave as a
+      "session start" flag.
+
+ * VHOST_USER_RESET_OWNER
+
+      Id: 4
+      Equivalent ioctl: VHOST_RESET_OWNER
+      Master payload: N/A
+
+      Issued when the connection is about to be closed. The Master will no
+      longer own this connection (and will usually close it).
+
+ * VHOST_USER_SET_MEM_TABLE
+
+      Id: 5
+      Equivalent ioctl: VHOST_SET_MEM_TABLE
+      Master payload: memory regions description
+
+      Sets the memory map regions on the slave so it can translate the vring
+      addresses. The ancillary data carries an array of file descriptors,
+      one for each memory mapped region. The size and ordering of the fds
+      matches the number and ordering of memory regions.
+
+ * VHOST_USER_SET_LOG_BASE
+
+      Id: 6
+      Equivalent ioctl: VHOST_SET_LOG_BASE
+      Master payload: u64
+
+      Sets the logging base address.
+
+ * VHOST_USER_SET_LOG_FD
+
+      Id: 7
+      Equivalent ioctl: VHOST_SET_LOG_FD
+      Master payload: N/A
+
+      Sets the logging file descriptor, which is passed as ancillary data.
+
+ * VHOST_USER_SET_VRING_NUM
+
+      Id: 8
+      Equivalent ioctl: VHOST_SET_VRING_NUM
+      Master payload: vring state description
+
+      Sets the size of the vring (number of descriptors).
+
+ * VHOST_USER_SET_VRING_ADDR
+
+      Id: 9
+      Equivalent ioctl: VHOST_SET_VRING_ADDR
+      Master payload: vring address description
+      Slave payload: N/A
+
+      Sets the addresses of the different aspects of the vring.
+
+ * VHOST_USER_SET_VRING_BASE
+
+      Id: 10
+      Equivalent ioctl: VHOST_SET_VRING_BASE
+      Master payload: vring state description
+
+      Sets the base index from which the slave looks for available descriptors.
+
+ * VHOST_USER_GET_VRING_BASE
+
+      Id: 11
+      Equivalent ioctl: VHOST_GET_VRING_BASE
+      Master payload: vring state description
+      Slave payload: vring state description
+
+      Get the current base index of the available descriptors.
+
+ * VHOST_USER_SET_VRING_KICK
+
+      Id: 12
+      Equivalent ioctl: VHOST_SET_VRING_KICK
+      Master payload: u64
+
+      Set the event file descriptor for adding buffers to the vring. It
+      is passed in the ancillary data.
+      Bits (0-7) of the payload contain the vring index. Bit 8 is the
+      invalid FD flag. This flag is set when there is no file descriptor
+      in the ancillary data. This signals that polling should be used
+      instead of waiting for a kick.
+
+ * VHOST_USER_SET_VRING_CALL
+
+      Id: 13
+      Equivalent ioctl: VHOST_SET_VRING_CALL
+      Master payload: u64
+
+      Set the event file descriptor to signal when buffers are used. It
+      is passed in the ancillary data.
+      Bits (0-7) of the payload contain the vring index. Bit 8 is the
+      invalid FD flag. This flag is set when there is no file descriptor
+      in the ancillary data. This signals that polling will be used
+      instead of waiting for the call.
+
+ * VHOST_USER_SET_VRING_ERR
+
+      Id: 14
+      Equivalent ioctl: VHOST_SET_VRING_ERR
+      Master payload: u64
+
+      Set the event file descriptor to signal when an error occurs. It
+      is passed in the ancillary data.
+      Bits (0-7) of the payload contain the vring index. Bit 8 is the
+      invalid FD flag. This flag is set when there is no file descriptor
+      in the ancillary data.
+
-- 
1.8.3.2

* [Qemu-devel] [PATCH v9 19/20] libqemustub: add stubs to be able to use qemu-char.c
  2014-03-04 18:22 [Qemu-devel] [PATCH v9 00/20] Vhost and vhost-net support for userspace based backends Antonios Motakis
                   ` (17 preceding siblings ...)
  2014-03-04 18:23 ` [Qemu-devel] [PATCH v9 18/20] Add vhost-user protocol documentation Antonios Motakis
@ 2014-03-04 18:23 ` Antonios Motakis
  2014-03-04 18:23 ` [Qemu-devel] [PATCH v9 20/20] Add qtest for vhost-user Antonios Motakis
  2014-03-04 18:29 ` [Qemu-devel] [PATCH v9 00/20] Vhost and vhost-net support for userspace based backends Paolo Bonzini
  20 siblings, 0 replies; 33+ messages in thread
From: Antonios Motakis @ 2014-03-04 18:23 UTC (permalink / raw)
  To: qemu-devel, snabb-devel
  Cc: Peter Maydell, mst, n.nikolaev, Igor Mammedov, Paolo Bonzini,
	lukego, Antonios Motakis, tech, Andreas Färber

The chardev code depends on many external symbols that are not all
needed to use, for example, the socket chardev. So add stubs for these
functions:

 - bdrv_commit_all
 - qemu_chr_open_msmouse
 - is_daemonized
 - qemu_add_machine_init_done_notifier
 - monitor_init
 - qemu_notify_event
 - vc_init

and this array:

 - serial_hds

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
---
 stubs/Makefile.objs       | 8 ++++++++
 stubs/bdrv-commit-all.c   | 7 +++++++
 stubs/chr-msmouse.c       | 7 +++++++
 stubs/get-next-serial.c   | 3 +++
 stubs/is-daemonized.c     | 7 +++++++
 stubs/machine-init-done.c | 6 ++++++
 stubs/monitor-init.c      | 6 ++++++
 stubs/notify-event.c      | 6 ++++++
 stubs/vc-init.c           | 7 +++++++
 9 files changed, 57 insertions(+)
 create mode 100644 stubs/bdrv-commit-all.c
 create mode 100644 stubs/chr-msmouse.c
 create mode 100644 stubs/get-next-serial.c
 create mode 100644 stubs/is-daemonized.c
 create mode 100644 stubs/machine-init-done.c
 create mode 100644 stubs/monitor-init.c
 create mode 100644 stubs/notify-event.c
 create mode 100644 stubs/vc-init.c

diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index df3aa7a..f213605 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -1,4 +1,6 @@
 stub-obj-y += arch-query-cpu-def.o
+stub-obj-y += bdrv-commit-all.o
+stub-obj-y += chr-msmouse.o
 stub-obj-y += clock-warp.o
 stub-obj-y += cpu-get-clock.o
 stub-obj-y += cpu-get-icount.o
@@ -9,20 +11,26 @@ stub-obj-y += fdset-get-fd.o
 stub-obj-y += fdset-remove-fd.o
 stub-obj-y += gdbstub.o
 stub-obj-y += get-fd.o
+stub-obj-y += get-next-serial.o
 stub-obj-y += get-vm-name.o
 stub-obj-y += iothread-lock.o
+stub-obj-y += is-daemonized.o
+stub-obj-y += machine-init-done.o
 stub-obj-y += migr-blocker.o
 stub-obj-y += mon-is-qmp.o
 stub-obj-y += mon-printf.o
 stub-obj-y += mon-print-filename.o
 stub-obj-y += mon-protocol-event.o
 stub-obj-y += mon-set-error.o
+stub-obj-y += monitor-init.o
+stub-obj-y += notify-event.o
 stub-obj-y += pci-drive-hot-add.o
 stub-obj-y += reset.o
 stub-obj-y += set-fd-handler.o
 stub-obj-y += slirp.o
 stub-obj-y += sysbus.o
 stub-obj-y += uuid.o
+stub-obj-y += vc-init.o
 stub-obj-y += vm-stop.o
 stub-obj-y += vmstate.o
 stub-obj-$(CONFIG_WIN32) += fd-register.o
diff --git a/stubs/bdrv-commit-all.c b/stubs/bdrv-commit-all.c
new file mode 100644
index 0000000..a8e0a95
--- /dev/null
+++ b/stubs/bdrv-commit-all.c
@@ -0,0 +1,7 @@
+#include "qemu-common.h"
+#include "block/block.h"
+
+int bdrv_commit_all(void)
+{
+    return 0;
+}
diff --git a/stubs/chr-msmouse.c b/stubs/chr-msmouse.c
new file mode 100644
index 0000000..812f8b0
--- /dev/null
+++ b/stubs/chr-msmouse.c
@@ -0,0 +1,7 @@
+#include "qemu-common.h"
+#include "sysemu/char.h"
+
+CharDriverState *qemu_chr_open_msmouse(void)
+{
+    return 0;
+}
diff --git a/stubs/get-next-serial.c b/stubs/get-next-serial.c
new file mode 100644
index 0000000..40c56d1
--- /dev/null
+++ b/stubs/get-next-serial.c
@@ -0,0 +1,3 @@
+#include "qemu-common.h"
+
+CharDriverState *serial_hds[0];
diff --git a/stubs/is-daemonized.c b/stubs/is-daemonized.c
new file mode 100644
index 0000000..16ce7c7
--- /dev/null
+++ b/stubs/is-daemonized.c
@@ -0,0 +1,7 @@
+#include "qemu-common.h"
+#include "sysemu/os-posix.h"
+
+bool is_daemonized(void)
+{
+    return true;
+}
diff --git a/stubs/machine-init-done.c b/stubs/machine-init-done.c
new file mode 100644
index 0000000..28a9255
--- /dev/null
+++ b/stubs/machine-init-done.c
@@ -0,0 +1,6 @@
+#include "qemu-common.h"
+#include "sysemu/sysemu.h"
+
+void qemu_add_machine_init_done_notifier(Notifier *notify)
+{
+}
diff --git a/stubs/monitor-init.c b/stubs/monitor-init.c
new file mode 100644
index 0000000..563902b
--- /dev/null
+++ b/stubs/monitor-init.c
@@ -0,0 +1,6 @@
+#include "qemu-common.h"
+#include "monitor/monitor.h"
+
+void monitor_init(CharDriverState *chr, int flags)
+{
+}
diff --git a/stubs/notify-event.c b/stubs/notify-event.c
new file mode 100644
index 0000000..32f7289
--- /dev/null
+++ b/stubs/notify-event.c
@@ -0,0 +1,6 @@
+#include "qemu-common.h"
+#include "qemu/main-loop.h"
+
+void qemu_notify_event(void)
+{
+}
diff --git a/stubs/vc-init.c b/stubs/vc-init.c
new file mode 100644
index 0000000..2af054f
--- /dev/null
+++ b/stubs/vc-init.c
@@ -0,0 +1,7 @@
+#include "qemu-common.h"
+#include "ui/console.h"
+
+CharDriverState *vc_init(ChardevVC *vc)
+{
+    return 0;
+}
-- 
1.8.3.2

* [Qemu-devel] [PATCH v9 20/20] Add qtest for vhost-user
  2014-03-04 18:22 [Qemu-devel] [PATCH v9 00/20] Vhost and vhost-net support for userspace based backends Antonios Motakis
                   ` (18 preceding siblings ...)
  2014-03-04 18:23 ` [Qemu-devel] [PATCH v9 19/20] libqemustub: add stubs to be able to use qemu-char.c Antonios Motakis
@ 2014-03-04 18:23 ` Antonios Motakis
  2014-03-04 18:39   ` Andreas Färber
  2014-03-04 18:29 ` [Qemu-devel] [PATCH v9 00/20] Vhost and vhost-net support for userspace based backends Paolo Bonzini
  20 siblings, 1 reply; 33+ messages in thread
From: Antonios Motakis @ 2014-03-04 18:23 UTC (permalink / raw)
  To: qemu-devel, snabb-devel
  Cc: Kevin Wolf, Stefan Hajnoczi, mst, Markus Armbruster, n.nikolaev,
	Anthony Liguori, lukego, Antonios Motakis, tech,
	Andreas Färber

This test creates a 'server' chardev to listen for vhost-user messages.
Once VHOST_USER_SET_MEM_TABLE is received, it mmaps the received region that
starts at guest address 0x0 and reads 1 KiB from it. The data read is
compared to data from readl.

The test requires hugetlbfs to be already mounted and writable. The mount
point defaults to '/hugetlbfs' and can be specified via the environment
variable QTEST_HUGETLBFS_PATH.

The ROM pc-bios/pxe-virtio.rom is used to instantiate a virtio PCI controller.

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
---
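Reviewer note: the test obtains the memory region descriptors through
qemu_chr_fe_get_msgfds. A slave that reads the Unix domain socket directly
would rely on the underlying POSIX mechanism, SCM_RIGHTS ancillary data. The
following is a minimal sketch that passes a single descriptor; send_fd and
recv_fd are hypothetical helper names, not functions from this series, and a
real slave would receive up to 8 descriptors together with a vhost-user
message rather than a dummy byte.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one dummy byte plus one descriptor as SCM_RIGHTS ancillary data.
 * Returns 0 on success, -1 on error. */
static int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;               /* forces correct alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } control;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&control, 0, sizeof(control));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one byte and return the descriptor attached to it,
 * or -1 if the read failed or no descriptor was attached. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } control;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&control, 0, sizeof(control));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    if (recvmsg(sock, &msg, 0) <= 0) {
        return -1;
    }

    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET &&
            cmsg->cmsg_type == SCM_RIGHTS &&
            cmsg->cmsg_len == CMSG_LEN(sizeof(int))) {
            memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
            break;
        }
    }
    return fd;
}
```

The received descriptor is a new reference to the same open file, so the
receiver can mmap it exactly as the test does with fds[i].
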
 tests/Makefile          |   4 +
 tests/vhost-user-test.c | 309 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 313 insertions(+)
 create mode 100644 tests/vhost-user-test.c
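
Reviewer note: the test only inspects the region whose guest physical address
is 0x0. A slave generally needs the opposite lookup: given a guest physical
address, find the region that covers it and turn it into a pointer in its own
mapping. The sketch below assumes that each descriptor was mapped from offset
0 for the full region size, as the test does; gpa_to_local and the mmap_base
array are hypothetical names introduced for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as the region description in this series. */
typedef struct VhostUserMemoryRegion {
    uint64_t guest_phys_addr;
    uint64_t memory_size;
    uint64_t userspace_addr;
} VhostUserMemoryRegion;

/*
 * Hypothetical helper: translate a guest physical address into a pointer
 * inside the slave's own mmap of the matching region. mmap_base[i] is the
 * address mmap() returned for regions[i]. Returns NULL when no region
 * fully covers [gpa, gpa + len).
 */
static void *gpa_to_local(const VhostUserMemoryRegion *regions,
                          void *const *mmap_base, uint32_t nregions,
                          uint64_t gpa, uint64_t len)
{
    uint32_t i;

    for (i = 0; i < nregions; i++) {
        const VhostUserMemoryRegion *r = &regions[i];

        /* Written to avoid overflow in gpa + len. */
        if (gpa >= r->guest_phys_addr &&
            len <= r->memory_size &&
            gpa - r->guest_phys_addr <= r->memory_size - len) {
            return (uint8_t *)mmap_base[i] + (gpa - r->guest_phys_addr);
        }
    }
    return NULL;
}
```

Checking the length as well as the start address keeps an access from running
past the end of a mapping when a buffer straddles a region boundary.
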

diff --git a/tests/Makefile b/tests/Makefile
index b17d41e..85bcae5 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -110,6 +110,7 @@ check-qtest-i386-y += tests/vmxnet3-test$(EXESUF)
 gcov-files-i386-y += hw/net/vmxnet3.c
 gcov-files-i386-y += hw/net/vmxnet_rx_pkt.c
 gcov-files-i386-y += hw/net/vmxnet_tx_pkt.c
+check-qtest-i386-y += tests/vhost-user-test$(EXESUF)
 check-qtest-x86_64-y = $(check-qtest-i386-y)
 gcov-files-i386-y += i386-softmmu/hw/timer/mc146818rtc.c
 gcov-files-x86_64-y = $(subst i386-softmmu/,x86_64-softmmu/,$(gcov-files-i386-y))
@@ -241,8 +242,11 @@ tests/ipoctal232-test$(EXESUF): tests/ipoctal232-test.o
 tests/qom-test$(EXESUF): tests/qom-test.o
 tests/blockdev-test$(EXESUF): tests/blockdev-test.o $(libqos-pc-obj-y)
 tests/qdev-monitor-test$(EXESUF): tests/qdev-monitor-test.o $(libqos-pc-obj-y)
+tests/vhost-user-test$(EXESUF): tests/vhost-user-test.o qemu-char.o qemu-timer.o libqemuutil.a libqemustub.a
 tests/qemu-iotests/socket_scm_helper$(EXESUF): tests/qemu-iotests/socket_scm_helper.o
 
+LIBS+= -lutil
+
 # QTest rules
 
 TARGETS=$(patsubst %-softmmu,%, $(filter %-softmmu,$(TARGET_DIRS)))
diff --git a/tests/vhost-user-test.c b/tests/vhost-user-test.c
new file mode 100644
index 0000000..a7160f8
--- /dev/null
+++ b/tests/vhost-user-test.c
@@ -0,0 +1,309 @@
+/*
+ * QTest testcase for the vhost-user
+ *
+ * Copyright (c) 2014 Virtual Open Systems Sarl.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "libqtest.h"
+#include "qemu/option.h"
+#include "sysemu/char.h"
+#include "sysemu/sysemu.h"
+
+#include <glib.h>
+#include <linux/vhost.h>
+#include <sys/mman.h>
+#include <sys/vfs.h>
+#include <qemu/sockets.h>
+
+#define QEMU_CMD_ACCEL  " -machine accel=tcg"
+#define QEMU_CMD_MEM    " -mem-path %s,share=on"
+#define QEMU_CMD_CHR    " -chardev socket,id=chr0,path=%s"
+#define QEMU_CMD_NETDEV " -netdev vhost-user,id=net0,chardev=chr0,vhostforce"
+#define QEMU_CMD_NET    " -device virtio-net-pci,netdev=net0 "
+#define QEMU_CMD_ROM    " -option-rom ../pc-bios/pxe-virtio.rom"
+
+#define QEMU_CMD        QEMU_CMD_ACCEL QEMU_CMD_MEM QEMU_CMD_CHR \
+                        QEMU_CMD_NETDEV QEMU_CMD_NET QEMU_CMD_ROM
+
+#define HUGETLBFS_MAGIC       0x958458f6
+
+/*********** FROM hw/virtio/vhost-user.c *************************************/
+
+#define VHOST_MEMORY_MAX_NREGIONS    8
+
+typedef enum VhostUserRequest {
+    VHOST_USER_NONE = 0,
+    VHOST_USER_GET_FEATURES = 1,
+    VHOST_USER_SET_FEATURES = 2,
+    VHOST_USER_SET_OWNER = 3,
+    VHOST_USER_RESET_OWNER = 4,
+    VHOST_USER_SET_MEM_TABLE = 5,
+    VHOST_USER_SET_LOG_BASE = 6,
+    VHOST_USER_SET_LOG_FD = 7,
+    VHOST_USER_SET_VRING_NUM = 8,
+    VHOST_USER_SET_VRING_ADDR = 9,
+    VHOST_USER_SET_VRING_BASE = 10,
+    VHOST_USER_GET_VRING_BASE = 11,
+    VHOST_USER_SET_VRING_KICK = 12,
+    VHOST_USER_SET_VRING_CALL = 13,
+    VHOST_USER_SET_VRING_ERR = 14,
+    VHOST_USER_MAX
+} VhostUserRequest;
+
+typedef struct VhostUserMemoryRegion {
+    uint64_t guest_phys_addr;
+    uint64_t memory_size;
+    uint64_t userspace_addr;
+} VhostUserMemoryRegion;
+
+typedef struct VhostUserMemory {
+    uint32_t nregions;
+    uint32_t padding;
+    VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS];
+} VhostUserMemory;
+
+typedef struct VhostUserMsg {
+    VhostUserRequest request;
+
+#define VHOST_USER_VERSION_MASK     (0x3)
+#define VHOST_USER_REPLY_MASK       (0x1<<2)
+    uint32_t flags;
+    uint32_t size; /* the following payload size */
+    union {
+        uint64_t u64;
+        struct vhost_vring_state state;
+        struct vhost_vring_addr addr;
+        VhostUserMemory memory;
+    };
+} QEMU_PACKED VhostUserMsg;
+
+static VhostUserMsg m __attribute__ ((unused));
+#define VHOST_USER_HDR_SIZE (sizeof(m.request) \
+                            + sizeof(m.flags) \
+                            + sizeof(m.size))
+
+#define VHOST_USER_PAYLOAD_SIZE (sizeof(m) - VHOST_USER_HDR_SIZE)
+
+/* The version of the protocol we support */
+#define VHOST_USER_VERSION    (0x1)
+/*****************************************************************************/
+
+int fds_num = 0, fds[VHOST_MEMORY_MAX_NREGIONS];
+static VhostUserMemory memory;
+static GMutex data_mutex;
+static GCond data_cond;
+
+static void read_guest_mem(void)
+{
+    uint32_t *guest_mem;
+    gint64 end_time;
+    int i, j;
+
+    g_mutex_lock(&data_mutex);
+
+    end_time = g_get_monotonic_time() + 5 * G_TIME_SPAN_SECOND;
+    while (!fds_num) {
+        if (!g_cond_wait_until(&data_cond, &data_mutex, end_time)) {
+            /* timeout has passed */
+            g_assert(fds_num);
+            break;
+        }
+    }
+
+    /* check for sanity */
+    g_assert_cmpint(fds_num, >, 0);
+    g_assert_cmpint(fds_num, ==, memory.nregions);
+
+    /* iterate all regions */
+    for (i = 0; i < fds_num; i++) {
+
+        /* We'll check only the region starting at 0x0, if there is one */
+        if (memory.regions[i].guest_phys_addr != 0x0) {
+            continue;
+        }
+
+        g_assert_cmpint(memory.regions[i].memory_size, >, 1024);
+
+        guest_mem = mmap(0, memory.regions[i].memory_size,
+                         PROT_READ | PROT_WRITE, MAP_SHARED, fds[i], 0);
+        g_assert(guest_mem != MAP_FAILED);
+        for (j = 0; j < 256; j++) {
+            uint32_t a = readl(memory.regions[i].guest_phys_addr + j*4);
+            uint32_t b = guest_mem[j];
+
+            g_assert_cmpint(a, ==, b);
+        }
+
+        munmap(guest_mem, memory.regions[i].memory_size);
+    }
+
+    g_assert_cmpint(1, ==, 1);
+    g_mutex_unlock(&data_mutex);
+}
+
+static void *thread_function(void *data)
+{
+    GMainLoop *loop;
+    loop = g_main_loop_new(NULL, FALSE);
+    g_main_loop_run(loop);
+    return NULL;
+}
+
+static int chr_can_read(void *opaque)
+{
+    return VHOST_USER_HDR_SIZE;
+}
+
+static void chr_read(void *opaque, const uint8_t *buf, int size)
+{
+    CharDriverState *chr = opaque;
+    VhostUserMsg msg;
+    uint8_t *p = (uint8_t *) &msg;
+    int fd;
+
+    if (size != VHOST_USER_HDR_SIZE) {
+        g_test_message("Wrong message size received %d\n", size);
+        return;
+    }
+
+    memcpy(p, buf, VHOST_USER_HDR_SIZE);
+
+    if (msg.size) {
+        p += VHOST_USER_HDR_SIZE;
+        qemu_chr_fe_read_all(chr, p, msg.size);
+    }
+
+    switch (msg.request) {
+    case VHOST_USER_GET_FEATURES:
+        /* send back features to qemu */
+        msg.flags |= VHOST_USER_REPLY_MASK;
+        msg.size = sizeof(m.u64);
+        msg.u64 = 0;
+        p = (uint8_t *) &msg;
+        qemu_chr_fe_write_all(chr, p, VHOST_USER_HDR_SIZE + msg.size);
+        break;
+
+    case VHOST_USER_GET_VRING_BASE:
+        /* send back vring base to qemu */
+        msg.flags |= VHOST_USER_REPLY_MASK;
+        msg.size = sizeof(m.state);
+        msg.state.num = 0;
+        p = (uint8_t *) &msg;
+        qemu_chr_fe_write_all(chr, p, VHOST_USER_HDR_SIZE + msg.size);
+        break;
+
+    case VHOST_USER_SET_MEM_TABLE:
+        /* received the mem table */
+        memcpy(&memory, &msg.memory, sizeof(msg.memory));
+        fds_num = qemu_chr_fe_get_msgfds(chr, fds, sizeof(fds) / sizeof(int));
+
+        /* signal the test that it can continue */
+        g_cond_signal(&data_cond);
+        g_mutex_unlock(&data_mutex);
+        break;
+
+    case VHOST_USER_SET_VRING_KICK:
+    case VHOST_USER_SET_VRING_CALL:
+        /* consume the fd */
+        qemu_chr_fe_get_msgfds(chr, &fd, 1);
+        /*
+         * This is a non-blocking eventfd.
+         * The receive function forces it to be blocking,
+         * so revert it back to non-blocking.
+         */
+        qemu_set_nonblock(fd);
+        break;
+    default:
+        break;
+    }
+}
+
+static const char *init_hugepagefs(void)
+{
+    const char *path;
+    struct statfs fs;
+    int ret;
+
+    path = getenv("QTEST_HUGETLBFS_PATH");
+    if (!path) {
+        path = "/hugetlbfs";
+    }
+
+    if (access(path, R_OK | W_OK | X_OK)) {
+        g_test_message("access on path (%s): %s\n", path, strerror(errno));
+        return 0;
+    }
+
+    do {
+        ret = statfs(path, &fs);
+    } while (ret != 0 && errno == EINTR);
+
+    if (ret != 0) {
+        g_test_message("statfs on path (%s): %s\n", path, strerror(errno));
+        return 0;
+    }
+
+    if (fs.f_type != HUGETLBFS_MAGIC) {
+        g_test_message("Warning: path not on HugeTLBFS: %s\n", path);
+        return 0;
+    }
+
+    return path;
+}
+
+int main(int argc, char **argv)
+{
+    QTestState *s = NULL;
+    CharDriverState *chr = NULL;
+    const char *hugefs = 0;
+    char *socket_path = 0;
+    char *qemu_cmd = 0;
+    char *chr_path = 0;
+    int ret;
+
+    g_test_init(&argc, &argv, NULL);
+
+    module_call_init(MODULE_INIT_QOM);
+
+    hugefs = init_hugepagefs();
+    g_assert(hugefs);
+
+    socket_path = g_strdup_printf("/tmp/vhost-%d.sock", getpid());
+
+    /* create char dev and add read handlers */
+    qemu_add_opts(&qemu_chardev_opts);
+    chr_path = g_strdup_printf("unix:%s,server,nowait", socket_path);
+    chr = qemu_chr_new("chr0", chr_path, NULL);
+    g_free(chr_path);
+    qemu_chr_add_handlers(chr, chr_can_read, chr_read, NULL, chr);
+
+    /* run the main loop thread so the chardev may operate */
+    g_mutex_init(&data_mutex);
+    g_cond_init(&data_cond);
+    g_mutex_lock(&data_mutex);
+    g_thread_new(NULL, thread_function, NULL);
+
+    qemu_cmd = g_strdup_printf(QEMU_CMD, hugefs, socket_path);
+    s = qtest_start(qemu_cmd);
+    g_free(qemu_cmd);
+
+    qtest_add_func("/vhost-user/read-guest-mem", read_guest_mem);
+
+    ret = g_test_run();
+
+    if (s) {
+        qtest_quit(s);
+    }
+
+    /* cleanup */
+    unlink(socket_path);
+    g_free(socket_path);
+    g_cond_clear(&data_cond);
+    g_mutex_clear(&data_mutex);
+
+    return ret;
+}
-- 
1.8.3.2

* Re: [Qemu-devel] [PATCH v9 00/20] Vhost and vhost-net support for userspace based backends
  2014-03-04 18:22 [Qemu-devel] [PATCH v9 00/20] Vhost and vhost-net support for userspace based backends Antonios Motakis
                   ` (19 preceding siblings ...)
  2014-03-04 18:23 ` [Qemu-devel] [PATCH v9 20/20] Add qtest for vhost-user Antonios Motakis
@ 2014-03-04 18:29 ` Paolo Bonzini
  2014-03-04 18:33   ` Antonios Motakis
  2014-05-20 11:22   ` Nikolay Nikolaev
  20 siblings, 2 replies; 33+ messages in thread
From: Paolo Bonzini @ 2014-03-04 18:29 UTC (permalink / raw)
  To: snabb-devel, qemu-devel; +Cc: lukego, Antonios Motakis, tech, n.nikolaev, mst

On 04/03/2014 19:22, Antonios Motakis wrote:
> In this patch series we would like to introduce our approach for putting a
> virtio-net backend in an external userspace process. Our eventual target is to
> run the network backend in the Snabbswitch ethernet switch, while receiving
> traffic from a guest inside QEMU/KVM which runs an unmodified virtio-net
> implementation.
>
> For this, we are working on extending vhost to allow equivalent functionality
> for userspace. Vhost already passes control of the data plane of virtio-net to
> the host kernel; we want to realize a similar model, but for userspace.
>
> In this patch series the concept of a vhost-backend is introduced.
>
> We define two vhost backend types - vhost-kernel and vhost-user. The former is
> the interface to the current kernel module implementation. Its control plane is
> ioctl based. The data plane is realized by the kernel directly accessing the
> QEMU allocated, guest memory.
>
> In the new vhost-user backend, the control plane is based on communication
> between QEMU and another userspace process using a unix domain socket. This
> allows to implement a virtio backend for a guest running in QEMU, inside the
> other userspace process. For this communication we use a chardev with a Unix
> domain socket backend. Vhost-user is client/server agnostic regarding the
> chardev, however it does not support the 'nowait' and 'telnet' options.
>
> A reconnection in 'server' mode is supported, but the backend's exposed virtio
> features need to be compatible with the first connected slave.
>
> We change -mem-path to QemuOpts and add prealloc and share as properties
> to it. HugeTLBFS is required for this option to work.
>
> The data path is realized by directly accessing the vrings and the buffer data
> off the guest's memory.
>
> The current user of vhost-user is only vhost-net. We add new netdev backend
> that is intended to initialize vhost-net with vhost-user backend.
>
> Example usage:
>
> qemu -m 1024 -mem-path /hugetlbfs,share=on \
>      -chardev socket,id=chr0,path=/path/to/socket \
>      -netdev type=vhost-user,id=net0,chardev=chr0 \
>      -device virtio-net-pci,netdev=net0
>
> On non-MSIX guests the vhost feature can be forced using a special option:
>
> ...
>      -netdev type=vhost-user,id=net0,chardev=chr0,vhostforce
> ...
>
> In order to use ioeventfds, kvm should be enabled.
>
> This code can be pulled from git@github.com:virtualopensystems/qemu.git vhost-user-v9
> A simple functional test is available in tests/vhost-user-test.c
>
> A reference vhost-user slave for testing is also available from git@github.com:virtualopensystems/vapp.git

Hi,

did you see the series I posted today?  It would be great if you tested 
vhost-user on top of it.  The replacement for the above -mem-path 
incantation should be the following:

    -object memory-file,id=mem,path=/hugetlbfs,share=on \
    -numa node,memdev=mem

Do the ideas behind vhost-user work if you have multiple hugetlbfs 
files, one per node?

Paolo

> Changes from v8:
>  - Removed prealloc property from the -mem-path refactoring
>  - Added and use new function - kvm_eventfds_enabled
>  - Add virtio_queue_get_avail_idx used in vhost_virtqueue_stop to
>    get a sane value in case of VHOST_GET_VRING_BASE failure
>  - vhost user uses kvm_eventfds_enabled to check whether the ioeventfd
>    capability of KVM is available
>  - Added flag VHOST_USER_VRING_NOFD_MASK to be set when KICK, CALL or ERR file
>    descriptor is invalid or ioeventfd is not available
>
> Changes from v7:
>  - Slave reconnection when using chardev in server mode
>  - qtest vhost-user-test added
>  - New qemu_chr_fe_get_msgfds for reading multiple fds from the chardev
>  - Mandatory features in vhost_dev, used on reconnect to verify for conflicts
>  - Add vhostforce parameter to -netdev vhost-user (for non-MSIX guests)
>  - Extend libqemustub.a to support qemu-char.c
>
> Changes from v6:
>  - Remove the 'unlink' property of '-mem-path'
>  - Extend qemu-char: blocking read, send fds, monitor for connection close
>  - Vhost-user uses chardev as a backend
>  - Poll and reconnect removed (no VHOST_USER_ECHO).
>  - Disconnect is detected by the chardev (G_IO_HUP event)
>  - vhost-backend.c split to vhost-user.c
>
> Changes from v5:
>  - Split -mem-path unlink option to a separate patch
>  - Fds are passed only in the ancillary data
>  - Stricter message size checks on receive/send
>  - Netdev vhost-user now includes path and poll_time options
>  - The connection probing interval is configurable
>
> Changes from v4:
>  - Use error_report for errors
>  - VhostUserMsg has new field `size` indicating the following payload length.
>    Field `flags` now has version and reply bits. The structure is packed.
>  - Send data is of variable length (`size` field in message)
>  - Receive in 2 steps, header and payload
>  - Add new message type VHOST_USER_ECHO, to check connection status
>
> Changes from v3:
>  - Convert -mem-path to QemuOpts with prealloc, share and unlink properties
>  - Set 1 sec timeout when read/write to the unix domain socket
>  - Fix file descriptor leak
>
> Changes from v2:
>  - Reconnect when the backend disappears
>
> Changes from v1:
>  - Implementation of vhost-user netdev backend
>  - Code improvements
>
> Antonios Motakis (20):
>   Convert -mem-path to QemuOpts and add share property
>   Add kvm_eventfds_enabled function
>   Add chardev API qemu_chr_fe_read_all
>   Add chardev API qemu_chr_fe_set_msgfds
>   Add chardev API qemu_chr_fe_get_msgfds
>   Add G_IO_HUP handler for socket chardev
>   vhost_net should call the poll callback only when it is set
>   Refactor virtio-net to use generic get_vhost_net
>   Add new virtio API virtio_queue_get_avail_idx
>   Gracefully handle ioctl failure in vhost_virtqueue_stop
>   vhost_net_init will use VhostNetOptions to get all its arguments
>   Add vhost_ops to vhost_dev struct and replace all relevant ioctls
>   Add mandatory_features to vhost_dev
>   Add vhost-backend and VhostBackendType
>   Add vhost-user as a vhost backend.
>   Add new vhost-user netdev backend
>   Add the vhost-user netdev backend to the command line
>   Add vhost-user protocol documentation
>   libqemustub: add stubs to be able to use qemu-char.c
>   Add qtest for vhost-user
>
>  docs/specs/vhost-user.txt         | 261 ++++++++++++++++++++++++++++
>  exec.c                            |  21 ++-
>  hmp-commands.hx                   |   4 +-
>  hw/net/vhost_net.c                | 141 ++++++++++-----
>  hw/net/virtio-net.c               |  29 +---
>  hw/scsi/vhost-scsi.c              |  20 ++-
>  hw/virtio/Makefile.objs           |   2 +-
>  hw/virtio/vhost-backend.c         |  71 ++++++++
>  hw/virtio/vhost-user.c            | 356 ++++++++++++++++++++++++++++++++++++++
>  hw/virtio/vhost.c                 |  58 +++----
>  hw/virtio/virtio.c                |   5 +
>  include/exec/cpu-all.h            |   1 -
>  include/hw/virtio/vhost-backend.h |  38 ++++
>  include/hw/virtio/vhost.h         |   9 +-
>  include/hw/virtio/virtio.h        |   1 +
>  include/net/vhost-user.h          |  17 ++
>  include/net/vhost_net.h           |  13 +-
>  include/sysemu/char.h             |  43 ++++-
>  include/sysemu/kvm.h              |  11 ++
>  kvm-all.c                         |   4 +
>  kvm-stub.c                        |   1 +
>  net/Makefile.objs                 |   2 +-
>  net/clients.h                     |   3 +
>  net/hub.c                         |   1 +
>  net/net.c                         |  25 +--
>  net/tap.c                         |  18 +-
>  net/vhost-user.c                  | 273 +++++++++++++++++++++++++++++
>  qapi-schema.json                  |  19 +-
>  qemu-char.c                       | 272 +++++++++++++++++++++++++----
>  qemu-options.hx                   |  25 ++-
>  stubs/Makefile.objs               |   8 +
>  stubs/bdrv-commit-all.c           |   7 +
>  stubs/chr-msmouse.c               |   7 +
>  stubs/get-next-serial.c           |   3 +
>  stubs/is-daemonized.c             |   7 +
>  stubs/machine-init-done.c         |   6 +
>  stubs/monitor-init.c              |   6 +
>  stubs/notify-event.c              |   6 +
>  stubs/vc-init.c                   |   7 +
>  tests/Makefile                    |   4 +
>  tests/vhost-user-test.c           | 309 +++++++++++++++++++++++++++++++++
>  vl.c                              |  23 ++-
>  42 files changed, 1979 insertions(+), 158 deletions(-)
>  create mode 100644 docs/specs/vhost-user.txt
>  create mode 100644 hw/virtio/vhost-backend.c
>  create mode 100644 hw/virtio/vhost-user.c
>  create mode 100644 include/hw/virtio/vhost-backend.h
>  create mode 100644 include/net/vhost-user.h
>  create mode 100644 net/vhost-user.c
>  create mode 100644 stubs/bdrv-commit-all.c
>  create mode 100644 stubs/chr-msmouse.c
>  create mode 100644 stubs/get-next-serial.c
>  create mode 100644 stubs/is-daemonized.c
>  create mode 100644 stubs/machine-init-done.c
>  create mode 100644 stubs/monitor-init.c
>  create mode 100644 stubs/notify-event.c
>  create mode 100644 stubs/vc-init.c
>  create mode 100644 tests/vhost-user-test.c
>

* Re: [Qemu-devel] [PATCH v9 00/20] Vhost and vhost-net support for userspace based backends
  2014-03-04 18:29 ` [Qemu-devel] [PATCH v9 00/20] Vhost and vhost-net support for userspace based backends Paolo Bonzini
@ 2014-03-04 18:33   ` Antonios Motakis
  2014-03-04 18:38     ` Paolo Bonzini
  2014-05-20 11:22   ` Nikolay Nikolaev
  1 sibling, 1 reply; 33+ messages in thread
From: Antonios Motakis @ 2014-03-04 18:33 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: snabb-devel@googlegroups.com, Michael S.Tsirkin,
	qemu-devel qemu-devel, Nikolay Nikolaev, Luke Gorrie,
	VirtualOpenSystems Technical Team

On Tue, Mar 4, 2014 at 7:29 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 04/03/2014 19:22, Antonios Motakis wrote:
>
>> In this patch series we would like to introduce our approach for putting a
>> virtio-net backend in an external userspace process. Our eventual target is to
>> run the network backend in the Snabbswitch ethernet switch, while receiving
>> traffic from a guest inside QEMU/KVM which runs an unmodified virtio-net
>> implementation.
>>
>> For this, we are working into extending vhost to allow equivalent functionality
>> for userspace. Vhost already passes control of the data plane of virtio-net to
>> the host kernel; we want to realize a similar model, but for userspace.
>>
>> In this patch series the concept of a vhost-backend is introduced.
>>
>> We define two vhost backend types - vhost-kernel and vhost-user. The former is
>> the interface to the current kernel module implementation. Its control plane is
>> ioctl based. The data plane is realized by the kernel directly accessing the
>> QEMU allocated, guest memory.
>>
>> In the new vhost-user backend, the control plane is based on communication
>> between QEMU and another userspace process using a unix domain socket. This
>> allows to implement a virtio backend for a guest running in QEMU, inside the
>> other userspace process. For this communication we use a chardev with a Unix
>> domain socket backend. Vhost-user is client/server agnostic regarding the
>> chardev, however it does not support the 'nowait' and 'telnet' options.
>>
>> A reconnection in 'server' mode is supported, but the backend's exposed virtio
>> features need to be compatible with the first connected slave.
>>
>> We change -mem-path to QemuOpts and add prealloc and share as properties
>> to it. HugeTLBFS is required for this option to work.
>>
>> The data path is realized by directly accessing the vrings and the buffer data
>> off the guest's memory.
>>
>> The current user of vhost-user is only vhost-net. We add new netdev backend
>> that is intended to initialize vhost-net with vhost-user backend.
>>
>> Example usage:
>>
>> qemu -m 1024 -mem-path /hugetlbfs,share=on \
>>      -chardev socket,id=chr0,path=/path/to/socket \
>>      -netdev type=vhost-user,id=net0,chardev=chr0 \
>>      -device virtio-net-pci,netdev=net0
>>
>> On non-MSIX guests the vhost feature can be forced using a special option:
>>
>> ...
>>      -netdev type=vhost-user,id=net0,chardev=chr0,vhostforce
>> ...
>>
>> In order to use ioeventfds, kvm should be enabled.
>>
>> This code can be pulled from git@github.com:virtualopensystems/qemu.git
>> vhost-user-v9
>> A simple functional test is available in tests/vhost-user-test.c
>>
>> A reference vhost-user slave for testing is also available from
>> git@github.com:virtualopensystems/vapp.git
>>
>
> Hi,
>
> did you see the series I posted today?  It would be great if you tested
> vhost-user on top of it.  The replacement for the above -mem-path
> incantation should be the following:
>
>    -object memory-file,id=mem,path=/hugetlbfs,share=on \
>    -numa node,memdev=mem
>

Hello Paolo,

Yes we saw your series today, and we plan to try it out. Any idea on when
your series will get in? Then we can probably remove our own implementation
for shared memory.


>
> Do the ideas behind vhost-user work if you have multiple hugetlbfs files,
> one per node?
>
>
Yes, they should work.

Best regards


>  Paolo
>
>
>  Changes from v8:
>>  - Removed prealloc property from the -mem-path refactoring
>>  - Added and use new function - kvm_eventfds_enabled
>>  - Add virtio_queue_get_avail_idx used in vhost_virtqueue_stop to
>>    get a sane value in case of VHOST_GET_VRING_BASE failure
>>  - vhost user uses kvm_eventfds_enabled to check whether the ioeventfd
>>    capability of KVM is available
>>  - Added flag VHOST_USER_VRING_NOFD_MASK to be set when KICK, CALL or ERR file
>>    descriptor is invalid or ioeventfd is not available
>>
>> Changes from v7:
>>  - Slave reconnection when using chardev in server mode
>>  - qtest vhost-user-test added
>>  - New qemu_chr_fe_get_msgfds for reading multiple fds from the chardev
>>  - Mandatory features in vhost_dev, used on reconnect to verify for conflicts
>>  - Add vhostforce parameter to -netdev vhost-user (for non-MSIX guests)
>>  - Extend libqemustub.a to support qemu-char.c
>>
>> Changes from v6:
>>  - Remove the 'unlink' property of '-mem-path'
>>  - Extend qemu-char: blocking read, send fds, monitor for connection close
>>  - Vhost-user uses chardev as a backend
>>  - Poll and reconnect removed (no VHOST_USER_ECHO).
>>  - Disconnect is detected by the chardev (G_IO_HUP event)
>>  - vhost-backend.c split to vhost-user.c
>>
>> Changes from v5:
>>  - Split -mem-path unlink option to a separate patch
>>  - Fds are passed only in the ancillary data
>>  - Stricter message size checks on receive/send
>>  - Netdev vhost-user now includes path and poll_time options
>>  - The connection probing interval is configurable
>>
>> Changes from v4:
>>  - Use error_report for errors
>>  - VhostUserMsg has new field `size` indicating the following payload length.
>>    Field `flags` now has version and reply bits. The structure is packed.
>>  - Send data is of variable length (`size` field in message)
>>  - Receive in 2 steps, header and payload
>>  - Add new message type VHOST_USER_ECHO, to check connection status
>>
>> Changes from v3:
>>  - Convert -mem-path to QemuOpts with prealloc, share and unlink properties
>>  - Set 1 sec timeout when read/write to the unix domain socket
>>  - Fix file descriptor leak
>>
>> Changes from v2:
>>  - Reconnect when the backend disappears
>>
>> Changes from v1:
>>  - Implementation of vhost-user netdev backend
>>  - Code improvements
>>
>> Antonios Motakis (20):
>>   Convert -mem-path to QemuOpts and add share property
>>   Add kvm_eventfds_enabled function
>>   Add chardev API qemu_chr_fe_read_all
>>   Add chardev API qemu_chr_fe_set_msgfds
>>   Add chardev API qemu_chr_fe_get_msgfds
>>   Add G_IO_HUP handler for socket chardev
>>   vhost_net should call the poll callback only when it is set
>>   Refactor virtio-net to use generic get_vhost_net
>>   Add new virtio API virtio_queue_get_avail_idx
>>   Gracefully handle ioctl failure in vhost_virtqueue_stop
>>   vhost_net_init will use VhostNetOptions to get all its arguments
>>   Add vhost_ops to vhost_dev struct and replace all relevant ioctls
>>   Add mandatory_features to vhost_dev
>>   Add vhost-backend and VhostBackendType
>>   Add vhost-user as a vhost backend.
>>   Add new vhost-user netdev backend
>>   Add the vhost-user netdev backend to the command line
>>   Add vhost-user protocol documentation
>>   libqemustub: add stubs to be able to use qemu-char.c
>>   Add qtest for vhost-user
>>
>>  docs/specs/vhost-user.txt         | 261 ++++++++++++++++++++++++++++
>>  exec.c                            |  21 ++-
>>  hmp-commands.hx                   |   4 +-
>>  hw/net/vhost_net.c                | 141 ++++++++++-----
>>  hw/net/virtio-net.c               |  29 +---
>>  hw/scsi/vhost-scsi.c              |  20 ++-
>>  hw/virtio/Makefile.objs           |   2 +-
>>  hw/virtio/vhost-backend.c         |  71 ++++++++
>>  hw/virtio/vhost-user.c            | 356 ++++++++++++++++++++++++++++++++++++++
>>  hw/virtio/vhost.c                 |  58 +++----
>>  hw/virtio/virtio.c                |   5 +
>>  include/exec/cpu-all.h            |   1 -
>>  include/hw/virtio/vhost-backend.h |  38 ++++
>>  include/hw/virtio/vhost.h         |   9 +-
>>  include/hw/virtio/virtio.h        |   1 +
>>  include/net/vhost-user.h          |  17 ++
>>  include/net/vhost_net.h           |  13 +-
>>  include/sysemu/char.h             |  43 ++++-
>>  include/sysemu/kvm.h              |  11 ++
>>  kvm-all.c                         |   4 +
>>  kvm-stub.c                        |   1 +
>>  net/Makefile.objs                 |   2 +-
>>  net/clients.h                     |   3 +
>>  net/hub.c                         |   1 +
>>  net/net.c                         |  25 +--
>>  net/tap.c                         |  18 +-
>>  net/vhost-user.c                  | 273 +++++++++++++++++++++++++++++
>>  qapi-schema.json                  |  19 +-
>>  qemu-char.c                       | 272 +++++++++++++++++++++++++----
>>  qemu-options.hx                   |  25 ++-
>>  stubs/Makefile.objs               |   8 +
>>  stubs/bdrv-commit-all.c           |   7 +
>>  stubs/chr-msmouse.c               |   7 +
>>  stubs/get-next-serial.c           |   3 +
>>  stubs/is-daemonized.c             |   7 +
>>  stubs/machine-init-done.c         |   6 +
>>  stubs/monitor-init.c              |   6 +
>>  stubs/notify-event.c              |   6 +
>>  stubs/vc-init.c                   |   7 +
>>  tests/Makefile                    |   4 +
>>  tests/vhost-user-test.c           | 309 +++++++++++++++++++++++++++++++++
>>  vl.c                              |  23 ++-
>>  42 files changed, 1979 insertions(+), 158 deletions(-)
>>  create mode 100644 docs/specs/vhost-user.txt
>>  create mode 100644 hw/virtio/vhost-backend.c
>>  create mode 100644 hw/virtio/vhost-user.c
>>  create mode 100644 include/hw/virtio/vhost-backend.h
>>  create mode 100644 include/net/vhost-user.h
>>  create mode 100644 net/vhost-user.c
>>  create mode 100644 stubs/bdrv-commit-all.c
>>  create mode 100644 stubs/chr-msmouse.c
>>  create mode 100644 stubs/get-next-serial.c
>>  create mode 100644 stubs/is-daemonized.c
>>  create mode 100644 stubs/machine-init-done.c
>>  create mode 100644 stubs/monitor-init.c
>>  create mode 100644 stubs/notify-event.c
>>  create mode 100644 stubs/vc-init.c
>>  create mode 100644 tests/vhost-user-test.c
>>
>>
>


-- 
Antonios Motakis
Virtual Open Systems


* Re: [Qemu-devel] [PATCH v9 00/20] Vhost and vhost-net support for userspace based backends
  2014-03-04 18:33   ` Antonios Motakis
@ 2014-03-04 18:38     ` Paolo Bonzini
  0 siblings, 0 replies; 33+ messages in thread
From: Paolo Bonzini @ 2014-03-04 18:38 UTC (permalink / raw)
  To: snabb-devel
  Cc: Luke Gorrie, VirtualOpenSystems Technical Team,
	qemu-devel qemu-devel, Nikolay Nikolaev, Michael S.Tsirkin

On 04/03/2014 19:33, Antonios Motakis wrote:
>
> Hello Paolo,
>
> Yes we saw your series today, and we plan to try it out. Any idea on
> when your series will get in? Then we can probably remove our own
> implementation for shared memory.

It's planned for 2.1, together with memory hotplug which is not in the 
series but based on it.

Paolo

>
>
>     Do the ideas behind vhost-user work if you have multiple hugetlbfs
>     files, one per node?
>
>
> Yes, they should work.

* Re: [Qemu-devel] [PATCH v9 13/20] Add mandatory_features to vhost_dev
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 13/20] Add mandatory_features to vhost_dev Antonios Motakis
@ 2014-03-04 18:38   ` Michael S. Tsirkin
  2014-03-05 13:40     ` Antonios Motakis
  0 siblings, 1 reply; 33+ messages in thread
From: Michael S. Tsirkin @ 2014-03-04 18:38 UTC (permalink / raw)
  To: Antonios Motakis
  Cc: snabb-devel, qemu-devel, n.nikolaev, Nicholas Bellinger, lukego,
	Paolo Bonzini, tech

On Tue, Mar 04, 2014 at 07:22:56PM +0100, Antonios Motakis wrote:
> This will be used in a following patch to ensure that a vhost-user
> client reconnecting to QEMU supports the features that were exposed
> by the first client that initiated the virtio-net session.
> 
> Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
> Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>

Why isn't checking the features or backend_features field sufficient?


> ---
>  hw/net/vhost_net.c        | 10 ++++++++++
>  include/hw/virtio/vhost.h |  1 +
>  include/net/vhost_net.h   |  2 ++
>  3 files changed, 13 insertions(+)
> 
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index 0fb4fa5..38e1e8a 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -80,6 +80,11 @@ void vhost_net_ack_features(struct vhost_net *net, unsigned features)
>      }
>  }
>  
> +unsigned long long vhost_net_features(VHostNetState *net)
> +{
> +    return net->dev.features;
> +}
> +
>  static int vhost_net_get_fd(NetClientState *backend)
>  {
>      switch (backend->info->type) {
> @@ -112,6 +117,7 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options)
>  
>      net->dev.nvqs = 2;
>      net->dev.vqs = net->vqs;
> +    net->dev.mandatory_features = options->mandatory_features;
>  
>      r = vhost_dev_init(&net->dev, options->opaque,
>                         options->force);
> @@ -347,6 +353,10 @@ unsigned vhost_net_get_features(struct vhost_net *net, unsigned features)
>  void vhost_net_ack_features(struct vhost_net *net, unsigned features)
>  {
>  }
> +unsigned long long vhost_net_features(struct vhost_net *net)
> +{
> +    return 0;
> +}
>  
>  bool vhost_net_virtqueue_pending(VHostNetState *net, int idx)
>  {
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index 97641b6..0068d40 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -41,6 +41,7 @@ struct vhost_dev {
>      unsigned long long features;
>      unsigned long long acked_features;
>      unsigned long long backend_features;
> +    unsigned long long mandatory_features;
>      bool started;
>      bool log_enabled;
>      vhost_log_chunk_t *log;
> diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h
> index 2067ee2..b39bb45 100644
> --- a/include/net/vhost_net.h
> +++ b/include/net/vhost_net.h
> @@ -10,6 +10,7 @@ typedef struct VhostNetOptions {
>      NetClientState *net_backend;
>      void *opaque;
>      bool force;
> +    unsigned long long mandatory_features;
>  } VhostNetOptions;
>  
>  struct vhost_net *vhost_net_init(VhostNetOptions *options);
> @@ -22,6 +23,7 @@ void vhost_net_cleanup(VHostNetState *net);
>  
>  unsigned vhost_net_get_features(VHostNetState *net, unsigned features);
>  void vhost_net_ack_features(VHostNetState *net, unsigned features);
> +unsigned long long vhost_net_features(VHostNetState *net);
>  
>  bool vhost_net_virtqueue_pending(VHostNetState *net, int n);
>  void vhost_net_virtqueue_mask(VHostNetState *net, VirtIODevice *dev,
> -- 
> 1.8.3.2


* Re: [Qemu-devel] [PATCH v9 20/20] Add qtest for vhost-user
  2014-03-04 18:23 ` [Qemu-devel] [PATCH v9 20/20] Add qtest for vhost-user Antonios Motakis
@ 2014-03-04 18:39   ` Andreas Färber
  2014-03-05 13:39     ` Antonios Motakis
  0 siblings, 1 reply; 33+ messages in thread
From: Andreas Färber @ 2014-03-04 18:39 UTC (permalink / raw)
  To: Antonios Motakis, qemu-devel, snabb-devel
  Cc: Kevin Wolf, Anthony Liguori, mst, Markus Armbruster, n.nikolaev,
	Stefan Hajnoczi, lukego, tech

On 04.03.2014 19:23, Antonios Motakis wrote:
> This test creates a 'server' chardev to listen for vhost-user messages.
> Once VHOST_USER_SET_MEM_TABLE is received it mmaps each received region,
> and reads 1k bytes from it. The read data is compared to data from readl.
> 
> The test requires hugetlbfs to be already mounted and writable. The mount
> point defaults to '/hugetlbfs' and can be specified via the environment
> variable QTEST_HUGETLBFS_PATH.
> 
> The rom pc-bios/pxe-virtio.rom is used to instantiate a virtio PCI controller.
> 
> Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
> Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
> ---
>  tests/Makefile          |   4 +
>  tests/vhost-user-test.c | 309 ++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 313 insertions(+)
>  create mode 100644 tests/vhost-user-test.c
> 
> diff --git a/tests/Makefile b/tests/Makefile
> index b17d41e..85bcae5 100644
> --- a/tests/Makefile
> +++ b/tests/Makefile
> @@ -110,6 +110,7 @@ check-qtest-i386-y += tests/vmxnet3-test$(EXESUF)
>  gcov-files-i386-y += hw/net/vmxnet3.c
>  gcov-files-i386-y += hw/net/vmxnet_rx_pkt.c
>  gcov-files-i386-y += hw/net/vmxnet_tx_pkt.c
> +check-qtest-i386-y += tests/vhost-user-test$(EXESUF)

Not sure if I've asked already, but doesn't this test depend on certain
Linux host support (hugetlbfs, vhost module)?

One more comment below:

>  check-qtest-x86_64-y = $(check-qtest-i386-y)
>  gcov-files-i386-y += i386-softmmu/hw/timer/mc146818rtc.c
>  gcov-files-x86_64-y = $(subst i386-softmmu/,x86_64-softmmu/,$(gcov-files-i386-y))
> @@ -241,8 +242,11 @@ tests/ipoctal232-test$(EXESUF): tests/ipoctal232-test.o
>  tests/qom-test$(EXESUF): tests/qom-test.o
>  tests/blockdev-test$(EXESUF): tests/blockdev-test.o $(libqos-pc-obj-y)
>  tests/qdev-monitor-test$(EXESUF): tests/qdev-monitor-test.o $(libqos-pc-obj-y)
> +tests/vhost-user-test$(EXESUF): tests/vhost-user-test.o qemu-char.o qemu-timer.o libqemuutil.a libqemustub.a
>  tests/qemu-iotests/socket_scm_helper$(EXESUF): tests/qemu-iotests/socket_scm_helper.o
>  
> +LIBS+= -lutil
> +
>  # QTest rules
>  
>  TARGETS=$(patsubst %-softmmu,%, $(filter %-softmmu,$(TARGET_DIRS)))
> diff --git a/tests/vhost-user-test.c b/tests/vhost-user-test.c
> new file mode 100644
> index 0000000..a7160f8
> --- /dev/null
> +++ b/tests/vhost-user-test.c
> @@ -0,0 +1,309 @@
> +/*
> + * QTest testcase for the vhost-user
> + *
> + * Copyright (c) 2014 Virtual Open Systems Sarl.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "libqtest.h"
> +#include "qemu/option.h"
> +#include "sysemu/char.h"
> +#include "sysemu/sysemu.h"
> +
> +#include <glib.h>
> +#include <linux/vhost.h>
> +#include <sys/mman.h>
> +#include <sys/vfs.h>
> +#include <qemu/sockets.h>
> +
> +#define QEMU_CMD_ACCEL  " -machine accel=tcg"
> +#define QEMU_CMD_MEM    " -mem-path %s,share=on"
> +#define QEMU_CMD_CHR    " -chardev socket,id=chr0,path=%s"
> +#define QEMU_CMD_NETDEV " -netdev vhost-user,id=net0,chardev=chr0,vhostforce"
> +#define QEMU_CMD_NET    " -device virtio-net-pci,netdev=net0 "
> +#define QEMU_CMD_ROM    " -option-rom ../pc-bios/pxe-virtio.rom"
> +
> +#define QEMU_CMD        QEMU_CMD_ACCEL QEMU_CMD_MEM QEMU_CMD_CHR \
> +                        QEMU_CMD_NETDEV QEMU_CMD_NET QEMU_CMD_ROM
> +
> +#define HUGETLBFS_MAGIC       0x958458f6
> +
> +/*********** FROM hw/virtio/vhost-user.c *************************************/
> +
> +#define VHOST_MEMORY_MAX_NREGIONS    8
> +
> +typedef enum VhostUserRequest {
> +    VHOST_USER_NONE = 0,
> +    VHOST_USER_GET_FEATURES = 1,
> +    VHOST_USER_SET_FEATURES = 2,
> +    VHOST_USER_SET_OWNER = 3,
> +    VHOST_USER_RESET_OWNER = 4,
> +    VHOST_USER_SET_MEM_TABLE = 5,
> +    VHOST_USER_SET_LOG_BASE = 6,
> +    VHOST_USER_SET_LOG_FD = 7,
> +    VHOST_USER_SET_VRING_NUM = 8,
> +    VHOST_USER_SET_VRING_ADDR = 9,
> +    VHOST_USER_SET_VRING_BASE = 10,
> +    VHOST_USER_GET_VRING_BASE = 11,
> +    VHOST_USER_SET_VRING_KICK = 12,
> +    VHOST_USER_SET_VRING_CALL = 13,
> +    VHOST_USER_SET_VRING_ERR = 14,
> +    VHOST_USER_MAX
> +} VhostUserRequest;
> +
> +typedef struct VhostUserMemoryRegion {
> +    uint64_t guest_phys_addr;
> +    uint64_t memory_size;
> +    uint64_t userspace_addr;
> +} VhostUserMemoryRegion;
> +
> +typedef struct VhostUserMemory {
> +    uint32_t nregions;
> +    uint32_t padding;
> +    VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS];
> +} VhostUserMemory;
> +
> +typedef struct VhostUserMsg {
> +    VhostUserRequest request;
> +
> +#define VHOST_USER_VERSION_MASK     (0x3)
> +#define VHOST_USER_REPLY_MASK       (0x1<<2)
> +    uint32_t flags;
> +    uint32_t size; /* the following payload size */
> +    union {
> +        uint64_t u64;
> +        struct vhost_vring_state state;
> +        struct vhost_vring_addr addr;
> +        VhostUserMemory memory;
> +    };
> +} QEMU_PACKED VhostUserMsg;
> +
> +static VhostUserMsg m __attribute__ ((unused));
> +#define VHOST_USER_HDR_SIZE (sizeof(m.request) \
> +                            + sizeof(m.flags) \
> +                            + sizeof(m.size))
> +
> +#define VHOST_USER_PAYLOAD_SIZE (sizeof(m) - VHOST_USER_HDR_SIZE)
> +
> +/* The version of the protocol we support */
> +#define VHOST_USER_VERSION    (0x1)
> +/*****************************************************************************/
> +
> +int fds_num = 0, fds[VHOST_MEMORY_MAX_NREGIONS];
> +static VhostUserMemory memory;
> +static GMutex data_mutex;
> +static GCond data_cond;
> +
> +static void read_guest_mem(void)
> +{
> +    uint32_t *guest_mem;
> +    gint64 end_time;
> +    int i, j;
> +
> +    g_mutex_lock(&data_mutex);
> +
> +    end_time = g_get_monotonic_time() + 5 * G_TIME_SPAN_SECOND;
> +    while (!fds_num) {
> +        if (!g_cond_wait_until(&data_cond, &data_mutex, end_time)) {
> +            /* timeout has passed */
> +            g_assert(fds_num);
> +            break;
> +        }
> +    }
> +
> +    /* check for sanity */
> +    g_assert_cmpint(fds_num, >, 0);
> +    g_assert_cmpint(fds_num, ==, memory.nregions);
> +
> +    /* iterate all regions */
> +    for (i = 0; i < fds_num; i++) {
> +
> +        /* We'll check only the region starting at 0x0, suppose it is */
> +        if (memory.regions[i].guest_phys_addr != 0x0) {
> +            continue;
> +        }
> +
> +        g_assert_cmpint(memory.regions[i].memory_size, >, 1024);
> +
> +        guest_mem = mmap(0, memory.regions[i].memory_size,
> +        PROT_READ | PROT_WRITE, MAP_SHARED, fds[i], 0);
> +
> +        for (j = 0; j < 256; j++) {
> +            uint32_t a = readl(memory.regions[i].guest_phys_addr + j*4);
> +            uint32_t b = guest_mem[j];
> +
> +            g_assert_cmpint(a, ==, b);
> +        }
> +
> +        munmap(guest_mem, memory.regions[i].memory_size);
> +    }
> +
> +    g_assert_cmpint(1, ==, 1);
> +    g_mutex_unlock(&data_mutex);
> +}
> +
> +static void *thread_function(void *data)
> +{
> +    GMainLoop *loop;
> +    loop = g_main_loop_new(NULL, FALSE);
> +    g_main_loop_run(loop);
> +    return NULL;
> +}
> +
> +static int chr_can_read(void *opaque)
> +{
> +    return VHOST_USER_HDR_SIZE;
> +}
> +
> +static void chr_read(void *opaque, const uint8_t *buf, int size)
> +{
> +    CharDriverState *chr = opaque;
> +    VhostUserMsg msg;
> +    uint8_t *p = (uint8_t *) &msg;
> +    int fd;
> +
> +    if (size != VHOST_USER_HDR_SIZE) {
> +        g_test_message("Wrong message size received %d\n", size);
> +        return;
> +    }
> +
> +    memcpy(p, buf, VHOST_USER_HDR_SIZE);
> +
> +    if (msg.size) {
> +        p += VHOST_USER_HDR_SIZE;
> +        qemu_chr_fe_read_all(chr, p, msg.size);
> +    }
> +
> +    switch (msg.request) {
> +    case VHOST_USER_GET_FEATURES:
> +        /* send back features to qemu */
> +        msg.flags |= VHOST_USER_REPLY_MASK;
> +        msg.size = sizeof(m.u64);
> +        msg.u64 = 0;
> +        p = (uint8_t *) &msg;
> +        qemu_chr_fe_write_all(chr, p, VHOST_USER_HDR_SIZE + msg.size);
> +        break;
> +
> +    case VHOST_USER_GET_VRING_BASE:
> +        /* send back vring base to qemu */
> +        msg.flags |= VHOST_USER_REPLY_MASK;
> +        msg.size = sizeof(m.state);
> +        msg.state.num = 0;
> +        p = (uint8_t *) &msg;
> +        qemu_chr_fe_write_all(chr, p, VHOST_USER_HDR_SIZE + msg.size);
> +        break;
> +
> +    case VHOST_USER_SET_MEM_TABLE:
> +        /* received the mem table */
> +        memcpy(&memory, &msg.memory, sizeof(msg.memory));
> +        fds_num = qemu_chr_fe_get_msgfds(chr, fds, sizeof(fds) / sizeof(int));
> +
> +        /* signal the test that it can continue */
> +        g_cond_signal(&data_cond);
> +        g_mutex_unlock(&data_mutex);
> +        break;
> +
> +    case VHOST_USER_SET_VRING_KICK:
> +    case VHOST_USER_SET_VRING_CALL:
> +        /* consume the fd */
> +        qemu_chr_fe_get_msgfds(chr, &fd, 1);
> +        /*
> +         * This is a non-blocking eventfd.
> +         * The receive function forces it to be blocking,
> +         * so revert it back to non-blocking.
> +         */
> +        qemu_set_nonblock(fd);
> +        break;
> +    default:
> +        break;
> +    }
> +}
> +
> +static const char *init_hugepagefs(void)
> +{
> +    const char *path;
> +    struct statfs fs;
> +    int ret;
> +
> +    path = getenv("QTEST_HUGETLBFS_PATH");
> +    if (!path) {
> +        path = "/hugetlbfs";
> +    }
> +
> +    if (access(path, R_OK | W_OK | X_OK)) {
> +        g_test_message("access on path (%s): %s\n", path, strerror(errno));
> +        return 0;
> +    }
> +
> +    do {
> +        ret = statfs(path, &fs);
> +    } while (ret != 0 && errno == EINTR);
> +
> +    if (ret != 0) {
> +        g_test_message("statfs on path (%s): %s\n", path, strerror(errno));
> +        return 0;
> +    }
> +
> +    if (fs.f_type != HUGETLBFS_MAGIC) {
> +        g_test_message("Warning: path not on HugeTLBFS: %s\n", path);
> +        return 0;
> +    }
> +
> +    return path;
> +}
> +
> +int main(int argc, char **argv)
> +{
> +    QTestState *s = NULL;
> +    CharDriverState *chr = NULL;
> +    const char *hugefs = 0;
> +    char *socket_path = 0;
> +    char *qemu_cmd = 0;
> +    char *chr_path = 0;
> +    int ret;
> +
> +    g_test_init(&argc, &argv, NULL);
> +
> +    module_call_init(MODULE_INIT_QOM);
> +
> +    hugefs = init_hugepagefs();
> +    g_assert(hugefs);
> +
> +    socket_path = g_strdup_printf("/tmp/vhost-%d.sock", getpid());
> +
> +    /* create char dev and add read handlers */
> +    qemu_add_opts(&qemu_chardev_opts);
> +    chr_path = g_strdup_printf("unix:%s,server,nowait", socket_path);
> +    chr = qemu_chr_new("chr0", chr_path, NULL);
> +    g_free(chr_path);
> +    qemu_chr_add_handlers(chr, chr_can_read, chr_read, NULL, chr);
> +
> +    /* run the main loop thread so the chardev may operate */
> +    g_mutex_init(&data_mutex);
> +    g_cond_init(&data_cond);
> +    g_mutex_lock(&data_mutex);
> +    g_thread_new(NULL, thread_function, NULL);
> +
> +    qemu_cmd = g_strdup_printf(QEMU_CMD, hugefs, socket_path);
> +    s = qtest_start(qemu_cmd);
> +    g_free(qemu_cmd);
> +
> +    qtest_add_func("/vhost-user/read-guest-mem", read_guest_mem);
> +
> +    ret = g_test_run();
> +
> +    if (s) {
> +        qtest_quit(s);
> +    }
> +
> +    /* cleanup */
> +    unlink(socket_path);
> +    g_free(socket_path);

I think this is probably too late to unlink in case the test fails,
should probably be moved to right after qtest_start()?
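
A standalone sketch of why unlinking early is safe, under the assumption
(not verified against this test) that the peer has already connected by the
time qtest_start() returns. unlink_early_demo() is a made-up name and none of
this is QEMU code; it only shows that an established AF_UNIX stream keeps
working after its path is removed:

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical demo, not QEMU code: returns 0 if data still flows over a
 * connected Unix stream socket after its filesystem path was unlinked,
 * -1 on any failure.
 */
static int unlink_early_demo(const char *path)
{
    struct sockaddr_un addr;
    int lfd = -1, cfd = -1, sfd = -1, ret = -1;
    char c = 0;

    assert(strlen(path) < sizeof(addr.sun_path));
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    unlink(path); /* drop a stale socket file, if any */
    lfd = socket(AF_UNIX, SOCK_STREAM, 0);
    cfd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (lfd < 0 || cfd < 0 ||
        bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        goto out;
    }
    sfd = accept(lfd, NULL, NULL);
    if (sfd < 0) {
        goto out;
    }

    /* The path is only needed to establish the connection. */
    if (unlink(path) < 0) {
        goto out;
    }

    /* The connection must still carry data with the path gone. */
    if (write(cfd, "x", 1) == 1 && read(sfd, &c, 1) == 1 && c == 'x') {
        ret = 0;
    }
out:
    unlink(path); /* no-op if the path was already removed */
    if (sfd >= 0) {
        close(sfd);
    }
    if (cfd >= 0) {
        close(cfd);
    }
    if (lfd >= 0) {
        close(lfd);
    }
    return ret;
}
```

With the unlink done up front, a failing assertion later in the test can no
longer leave the socket file behind.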

Thanks for your efforts on this test!

Regards,
Andreas

> +    g_cond_clear(&data_cond);
> +    g_mutex_clear(&data_mutex);
> +
> +    return ret;
> +}
> 


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg


* Re: [Qemu-devel] [PATCH v9 10/20] Gracefully handle ioctl failure in vhost_virtqueue_stop
  2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 10/20] Gracefully handle ioctl failure in vhost_virtqueue_stop Antonios Motakis
@ 2014-03-04 18:45   ` Michael S. Tsirkin
  2014-03-05 13:38     ` Antonios Motakis
  0 siblings, 1 reply; 33+ messages in thread
From: Michael S. Tsirkin @ 2014-03-04 18:45 UTC (permalink / raw)
  To: Antonios Motakis; +Cc: lukego, snabb-devel, n.nikolaev, qemu-devel, tech

On Tue, Mar 04, 2014 at 07:22:53PM +0100, Antonios Motakis wrote:
> On stopping the vhost, a call to VHOST_GET_VRING_BASE is issued. The
> received value is stored as last_avail_idx, so the virtqueue can continue
> operating if the connection is resumed. Handle the failure of this call
> and use the current avail_idx. Some packets from the avail ring may be
> omitted but still we keep a sane value and can continue on reconnect.

Omitted how?
Some guests crash if we never complete handling buffers,
or networking breaks, etc.

This would be a big problem for reconnect; some robust way to
communicate the avail ring state would need to be found.
Is reconnect really a mandatory feature for you?
I'd suggest you drop it from v1 and focus on basic functionality.
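
To make "omitted" concrete: both indices are free-running 16-bit counters,
and the buffers the guest has published but the backend has not consumed
yet number avail_idx - last_avail_idx (modulo 2^16). If last_avail_idx is
simply overwritten with avail_idx, that many buffers are skipped and never
completed. A toy sketch of the arithmetic, not QEMU code (pending_buffers
is a made-up name):

```c
#include <stdint.h>

/*
 * Toy model: number of buffers the guest has made available that the
 * backend has not consumed yet. The subtraction is done in 16 bits so
 * that it stays correct when avail_idx wraps around.
 */
static uint16_t pending_buffers(uint16_t avail_idx, uint16_t last_avail_idx)
{
    return (uint16_t)(avail_idx - last_avail_idx);
}
```

For example, with avail_idx = 10 and a backend that had consumed up to 7,
three buffers are pending; storing 10 as last_avail_idx makes the count
drop to zero without those three ever being handled.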

> 
> Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
> Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>

The problem is that a bunch of stuff breaks if vhost keeps
going when we ask it to stop.
In particular, it will keep looking at the ring
state after the guest asked it to stop doing this,
and this will corrupt guest memory.


> ---
>  hw/virtio/vhost.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 9e336ad..322e2c0 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -758,12 +758,13 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev,
>      assert(idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs);
>      r = ioctl(dev->control, VHOST_GET_VRING_BASE, &state);
>      if (r < 0) {
> +        state.num = virtio_queue_get_avail_idx(vdev, idx);
>          fprintf(stderr, "vhost VQ %d ring restore failed: %d\n", idx, r);
>          fflush(stderr);
>      }
>      virtio_queue_set_last_avail_idx(vdev, idx, state.num);
>      virtio_queue_invalidate_signalled_used(vdev, idx);
> -    assert (r >= 0);
> +
>      cpu_physical_memory_unmap(vq->ring, virtio_queue_get_ring_size(vdev, idx),
>                                0, virtio_queue_get_ring_size(vdev, idx));
>      cpu_physical_memory_unmap(vq->used, virtio_queue_get_used_size(vdev, idx),
> -- 
> 1.8.3.2

* Re: [Qemu-devel] [PATCH v9 10/20] Gracefully handle ioctl failure in vhost_virtqueue_stop
  2014-03-04 18:45   ` Michael S. Tsirkin
@ 2014-03-05 13:38     ` Antonios Motakis
  2014-03-05 13:47       ` Michael S. Tsirkin
  0 siblings, 1 reply; 33+ messages in thread
From: Antonios Motakis @ 2014-03-05 13:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Luke Gorrie, snabb-devel@googlegroups.com, Nikolay Nikolaev,
	qemu-devel qemu-devel, VirtualOpenSystems Technical Team

Hello,


On Tue, Mar 4, 2014 at 7:45 PM, Michael S. Tsirkin <mst@redhat.com> wrote:

> On Tue, Mar 04, 2014 at 07:22:53PM +0100, Antonios Motakis wrote:
> > On stopping the vhost, a call to VHOST_GET_VRING_BASE is issued. The
> > received value is stored as last_avail_idx, so the virtqueue can continue
> > operating if the connection is resumed. Handle the failure of this call
> > and use the current avail_idx. Some packets from the avail ring may be
> > omitted but still we keep a sane value and can continue on reconnect.
>
> omitted how?
> some guests crash if we never complete handling buffers,
> or networking breaks, etc ...
>
> This would be a big problem for reconnect, some robust way to
> communicate avail ring state would need to be found.
> Is reconnect really a mandatory feature for you?
> I'd suggest you drop it from v1, focus on basic functionality.
>
>
Reconnect would be a really useful feature for us, so we tried to keep it
in a reasonable way.

However, we didn't take into account that some guests might crash under
those assumptions. It looks like we have no option but to remove reconnect
altogether for now; maybe a future extension to the virtio-net spec will
allow us to do it cleanly, but I don't see an obvious workaround to keep
this in now.

Thanks for pointing this out.

Btw, since it looks like we are closing in on a final version of the
patches, what kind of timeframe should we aim for inclusion? Should we
already rebase on top of Paolo's NUMA patch series?

>
> > Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
> > Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
>
> Problem is, a bunch of stuff breaks if vhost keeps
> going when we ask it to stop.
> In particular it will keep looking at the ring
> state when guest asked it to stop doing this,
> this will corrupt guest memory.
>
>
> > ---
> >  hw/virtio/vhost.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > index 9e336ad..322e2c0 100644
> > --- a/hw/virtio/vhost.c
> > +++ b/hw/virtio/vhost.c
> > @@ -758,12 +758,13 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev,
> >      assert(idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs);
> >      r = ioctl(dev->control, VHOST_GET_VRING_BASE, &state);
> >      if (r < 0) {
> > +        state.num = virtio_queue_get_avail_idx(vdev, idx);
> >          fprintf(stderr, "vhost VQ %d ring restore failed: %d\n", idx, r);
> >          fflush(stderr);
> >      }
> >      virtio_queue_set_last_avail_idx(vdev, idx, state.num);
> >      virtio_queue_invalidate_signalled_used(vdev, idx);
> > -    assert (r >= 0);
> > +
> >      cpu_physical_memory_unmap(vq->ring, virtio_queue_get_ring_size(vdev, idx),
> >                                0, virtio_queue_get_ring_size(vdev, idx));
> >      cpu_physical_memory_unmap(vq->used, virtio_queue_get_used_size(vdev, idx),
> > --
> > 1.8.3.2
>



-- 
Antonios Motakis
Virtual Open Systems


* Re: [Qemu-devel] [PATCH v9 20/20] Add qtest for vhost-user
  2014-03-04 18:39   ` Andreas Färber
@ 2014-03-05 13:39     ` Antonios Motakis
  0 siblings, 0 replies; 33+ messages in thread
From: Antonios Motakis @ 2014-03-05 13:39 UTC (permalink / raw)
  To: Andreas Färber
  Cc: Kevin Wolf, snabb-devel@googlegroups.com, Stefan Hajnoczi,
	Michael S.Tsirkin, qemu-devel qemu-devel, Nikolay Nikolaev,
	Markus Armbruster, Anthony Liguori, Luke Gorrie,
	VirtualOpenSystems Technical Team

Hello Andreas,


On Tue, Mar 4, 2014 at 7:39 PM, Andreas Färber <afaerber@suse.de> wrote:

> Am 04.03.2014 19:23, schrieb Antonios Motakis:
> > This test creates a 'server' chardev to listen for vhost-user messages.
> > Once VHOST_USER_SET_MEM_TABLE is received it mmaps each received region,
> > and reads 1k bytes from it. The read data is compared to data from readl.
> >
> > The test requires hugetlbfs to be already mounted and writable. The mount
> > point defaults to '/hugetlbfs' and can be specified via the environment
> > variable QTEST_HUGETLBFS_PATH.
> >
> > The ROM pc-bios/pxe-virtio.rom is used to instantiate a virtio PCI
> > controller.
> >
> > Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
> > Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
> > ---
> >  tests/Makefile          |   4 +
> >  tests/vhost-user-test.c | 309 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 313 insertions(+)
> >  create mode 100644 tests/vhost-user-test.c
> >
> > diff --git a/tests/Makefile b/tests/Makefile
> > index b17d41e..85bcae5 100644
> > --- a/tests/Makefile
> > +++ b/tests/Makefile
> > @@ -110,6 +110,7 @@ check-qtest-i386-y += tests/vmxnet3-test$(EXESUF)
> >  gcov-files-i386-y += hw/net/vmxnet3.c
> >  gcov-files-i386-y += hw/net/vmxnet_rx_pkt.c
> >  gcov-files-i386-y += hw/net/vmxnet_tx_pkt.c
> > +check-qtest-i386-y += tests/vhost-user-test$(EXESUF)
>
> Not sure if I've asked already, but doesn't this test depend on certain
> Linux host support (hugetlbfs, vhost module)?
>
>
We depend on hugetlbfs being not only available, but also mounted and
writable by us. Otherwise the test will fail.

We don't depend on vhost being available in the kernel; we do our own thing
with a userspace vhost implementation, which is provided. So there is no
external dependency related to vhost. The only external dependency is
hugetlbfs (which we can't avoid, the feature we are testing relies on it).
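
For what it's worth, the dependency can be probed up front. The sketch
below is along the lines of init_hugepagefs() in the patch, reduced to a
yes/no answer that a caller could use to decide whether to run the test at
all. The function name hugetlbfs_usable is made up for illustration and the
code is Linux-specific:

```c
#include <errno.h>
#include <stdint.h>
#include <sys/vfs.h>
#include <unistd.h>

#ifndef HUGETLBFS_MAGIC
#define HUGETLBFS_MAGIC 0x958458f6
#endif

/*
 * Hypothetical probe: returns 1 if 'path' is accessible for reading,
 * writing and searching and lives on hugetlbfs, 0 otherwise.
 */
static int hugetlbfs_usable(const char *path)
{
    struct statfs fs;
    int ret;

    if (access(path, R_OK | W_OK | X_OK) != 0) {
        return 0;
    }

    do {
        ret = statfs(path, &fs);
    } while (ret != 0 && errno == EINTR);

    if (ret != 0) {
        return 0;
    }

    /* Compare in 32 bits so the check is independent of f_type's width. */
    return (uint32_t)fs.f_type == (uint32_t)HUGETLBFS_MAGIC;
}
```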


> One more comment below:
>
> >  check-qtest-x86_64-y = $(check-qtest-i386-y)
> >  gcov-files-i386-y += i386-softmmu/hw/timer/mc146818rtc.c
> >  gcov-files-x86_64-y = $(subst i386-softmmu/,x86_64-softmmu/,$(gcov-files-i386-y))
> > @@ -241,8 +242,11 @@ tests/ipoctal232-test$(EXESUF): tests/ipoctal232-test.o
> >  tests/qom-test$(EXESUF): tests/qom-test.o
> >  tests/blockdev-test$(EXESUF): tests/blockdev-test.o $(libqos-pc-obj-y)
> >  tests/qdev-monitor-test$(EXESUF): tests/qdev-monitor-test.o $(libqos-pc-obj-y)
> > +tests/vhost-user-test$(EXESUF): tests/vhost-user-test.o qemu-char.o qemu-timer.o libqemuutil.a libqemustub.a
> >  tests/qemu-iotests/socket_scm_helper$(EXESUF): tests/qemu-iotests/socket_scm_helper.o
> >
> > +LIBS+= -lutil
> > +
> >  # QTest rules
> >
> >  TARGETS=$(patsubst %-softmmu,%, $(filter %-softmmu,$(TARGET_DIRS)))
> > diff --git a/tests/vhost-user-test.c b/tests/vhost-user-test.c
> > new file mode 100644
> > index 0000000..a7160f8
> > --- /dev/null
> > +++ b/tests/vhost-user-test.c
> > @@ -0,0 +1,309 @@
> > +/*
> > + * QTest testcase for the vhost-user
> > + *
> > + * Copyright (c) 2014 Virtual Open Systems Sarl.
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#include "libqtest.h"
> > +#include "qemu/option.h"
> > +#include "sysemu/char.h"
> > +#include "sysemu/sysemu.h"
> > +
> > +#include <glib.h>
> > +#include <linux/vhost.h>
> > +#include <sys/mman.h>
> > +#include <sys/vfs.h>
> > +#include <qemu/sockets.h>
> > +
> > +#define QEMU_CMD_ACCEL  " -machine accel=tcg"
> > +#define QEMU_CMD_MEM    " -mem-path %s,share=on"
> > +#define QEMU_CMD_CHR    " -chardev socket,id=chr0,path=%s"
> > +#define QEMU_CMD_NETDEV " -netdev vhost-user,id=net0,chardev=chr0,vhostforce"
> > +#define QEMU_CMD_NET    " -device virtio-net-pci,netdev=net0 "
> > +#define QEMU_CMD_ROM    " -option-rom ../pc-bios/pxe-virtio.rom"
> > +
> > +#define QEMU_CMD        QEMU_CMD_ACCEL QEMU_CMD_MEM QEMU_CMD_CHR \
> > +                        QEMU_CMD_NETDEV QEMU_CMD_NET QEMU_CMD_ROM
> > +
> > +#define HUGETLBFS_MAGIC       0x958458f6
> > +
> > +/*********** FROM hw/virtio/vhost-user.c *************************************/
> > +
> > +#define VHOST_MEMORY_MAX_NREGIONS    8
> > +
> > +typedef enum VhostUserRequest {
> > +    VHOST_USER_NONE = 0,
> > +    VHOST_USER_GET_FEATURES = 1,
> > +    VHOST_USER_SET_FEATURES = 2,
> > +    VHOST_USER_SET_OWNER = 3,
> > +    VHOST_USER_RESET_OWNER = 4,
> > +    VHOST_USER_SET_MEM_TABLE = 5,
> > +    VHOST_USER_SET_LOG_BASE = 6,
> > +    VHOST_USER_SET_LOG_FD = 7,
> > +    VHOST_USER_SET_VRING_NUM = 8,
> > +    VHOST_USER_SET_VRING_ADDR = 9,
> > +    VHOST_USER_SET_VRING_BASE = 10,
> > +    VHOST_USER_GET_VRING_BASE = 11,
> > +    VHOST_USER_SET_VRING_KICK = 12,
> > +    VHOST_USER_SET_VRING_CALL = 13,
> > +    VHOST_USER_SET_VRING_ERR = 14,
> > +    VHOST_USER_MAX
> > +} VhostUserRequest;
> > +
> > +typedef struct VhostUserMemoryRegion {
> > +    uint64_t guest_phys_addr;
> > +    uint64_t memory_size;
> > +    uint64_t userspace_addr;
> > +} VhostUserMemoryRegion;
> > +
> > +typedef struct VhostUserMemory {
> > +    uint32_t nregions;
> > +    uint32_t padding;
> > +    VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS];
> > +} VhostUserMemory;
> > +
> > +typedef struct VhostUserMsg {
> > +    VhostUserRequest request;
> > +
> > +#define VHOST_USER_VERSION_MASK     (0x3)
> > +#define VHOST_USER_REPLY_MASK       (0x1<<2)
> > +    uint32_t flags;
> > +    uint32_t size; /* the following payload size */
> > +    union {
> > +        uint64_t u64;
> > +        struct vhost_vring_state state;
> > +        struct vhost_vring_addr addr;
> > +        VhostUserMemory memory;
> > +    };
> > +} QEMU_PACKED VhostUserMsg;
> > +
> > +static VhostUserMsg m __attribute__ ((unused));
> > +#define VHOST_USER_HDR_SIZE (sizeof(m.request) \
> > +                            + sizeof(m.flags) \
> > +                            + sizeof(m.size))
> > +
> > +#define VHOST_USER_PAYLOAD_SIZE (sizeof(m) - VHOST_USER_HDR_SIZE)
> > +
> > +/* The version of the protocol we support */
> > +#define VHOST_USER_VERSION    (0x1)
> > +/*****************************************************************************/
> > +
> > +int fds_num = 0, fds[VHOST_MEMORY_MAX_NREGIONS];
> > +static VhostUserMemory memory;
> > +static GMutex data_mutex;
> > +static GCond data_cond;
> > +
> > +static void read_guest_mem(void)
> > +{
> > +    uint32_t *guest_mem;
> > +    gint64 end_time;
> > +    int i, j;
> > +
> > +    g_mutex_lock(&data_mutex);
> > +
> > +    end_time = g_get_monotonic_time() + 5 * G_TIME_SPAN_SECOND;
> > +    while (!fds_num) {
> > +        if (!g_cond_wait_until(&data_cond, &data_mutex, end_time)) {
> > +            /* timeout has passed */
> > +            g_assert(fds_num);
> > +            break;
> > +        }
> > +    }
> > +
> > +    /* check for sanity */
> > +    g_assert_cmpint(fds_num, >, 0);
> > +    g_assert_cmpint(fds_num, ==, memory.nregions);
> > +
> > +    /* iterate all regions */
> > +    for (i = 0; i < fds_num; i++) {
> > +
> > +        /* We'll check only the region starting at 0x0, assuming there is one */
> > +        if (memory.regions[i].guest_phys_addr != 0x0) {
> > +            continue;
> > +        }
> > +
> > +        g_assert_cmpint(memory.regions[i].memory_size, >, 1024);
> > +
> > +        guest_mem = mmap(0, memory.regions[i].memory_size,
> > +        PROT_READ | PROT_WRITE, MAP_SHARED, fds[i], 0);
> > +
> > +        for (j = 0; j < 256; j++) {
> > +            uint32_t a = readl(memory.regions[i].guest_phys_addr + j*4);
> > +            uint32_t b = guest_mem[j];
> > +
> > +            g_assert_cmpint(a, ==, b);
> > +        }
> > +
> > +        munmap(guest_mem, memory.regions[i].memory_size);
> > +    }
> > +
> > +    g_assert_cmpint(1, ==, 1);
> > +    g_mutex_unlock(&data_mutex);
> > +}
> > +
> > +static void *thread_function(void *data)
> > +{
> > +    GMainLoop *loop;
> > +    loop = g_main_loop_new(NULL, FALSE);
> > +    g_main_loop_run(loop);
> > +    return NULL;
> > +}
> > +
> > +static int chr_can_read(void *opaque)
> > +{
> > +    return VHOST_USER_HDR_SIZE;
> > +}
> > +
> > +static void chr_read(void *opaque, const uint8_t *buf, int size)
> > +{
> > +    CharDriverState *chr = opaque;
> > +    VhostUserMsg msg;
> > +    uint8_t *p = (uint8_t *) &msg;
> > +    int fd;
> > +
> > +    if (size != VHOST_USER_HDR_SIZE) {
> > +        g_test_message("Wrong message size received %d\n", size);
> > +        return;
> > +    }
> > +
> > +    memcpy(p, buf, VHOST_USER_HDR_SIZE);
> > +
> > +    if (msg.size) {
> > +        p += VHOST_USER_HDR_SIZE;
> > +        qemu_chr_fe_read_all(chr, p, msg.size);
> > +    }
> > +
> > +    switch (msg.request) {
> > +    case VHOST_USER_GET_FEATURES:
> > +        /* send back features to qemu */
> > +        msg.flags |= VHOST_USER_REPLY_MASK;
> > +        msg.size = sizeof(m.u64);
> > +        msg.u64 = 0;
> > +        p = (uint8_t *) &msg;
> > +        qemu_chr_fe_write_all(chr, p, VHOST_USER_HDR_SIZE + msg.size);
> > +        break;
> > +
> > +    case VHOST_USER_GET_VRING_BASE:
> > +        /* send back vring base to qemu */
> > +        msg.flags |= VHOST_USER_REPLY_MASK;
> > +        msg.size = sizeof(m.state);
> > +        msg.state.num = 0;
> > +        p = (uint8_t *) &msg;
> > +        qemu_chr_fe_write_all(chr, p, VHOST_USER_HDR_SIZE + msg.size);
> > +        break;
> > +
> > +    case VHOST_USER_SET_MEM_TABLE:
> > +        /* received the mem table */
> > +        memcpy(&memory, &msg.memory, sizeof(msg.memory));
> > +        fds_num = qemu_chr_fe_get_msgfds(chr, fds, sizeof(fds) / sizeof(int));
> > +
> > +        /* signal the test that it can continue */
> > +        g_cond_signal(&data_cond);
> > +        g_mutex_unlock(&data_mutex);
> > +        break;
> > +
> > +    case VHOST_USER_SET_VRING_KICK:
> > +    case VHOST_USER_SET_VRING_CALL:
> > +        /* consume the fd */
> > +        qemu_chr_fe_get_msgfds(chr, &fd, 1);
> > +        /*
> > +         * This is a non-blocking eventfd.
> > +         * The receive function forces it to be blocking,
> > +         * so revert it back to non-blocking.
> > +         */
> > +        qemu_set_nonblock(fd);
> > +        break;
> > +    default:
> > +        break;
> > +    }
> > +}
> > +
> > +static const char *init_hugepagefs(void)
> > +{
> > +    const char *path;
> > +    struct statfs fs;
> > +    int ret;
> > +
> > +    path = getenv("QTEST_HUGETLBFS_PATH");
> > +    if (!path) {
> > +        path = "/hugetlbfs";
> > +    }
> > +
> > +    if (access(path, R_OK | W_OK | X_OK)) {
> > +        g_test_message("access on path (%s): %s\n", path, strerror(errno));
> > +        return 0;
> > +    }
> > +
> > +    do {
> > +        ret = statfs(path, &fs);
> > +    } while (ret != 0 && errno == EINTR);
> > +
> > +    if (ret != 0) {
> > +        g_test_message("statfs on path (%s): %s\n", path, strerror(errno));
> > +        return 0;
> > +    }
> > +
> > +    if (fs.f_type != HUGETLBFS_MAGIC) {
> > +        g_test_message("Warning: path not on HugeTLBFS: %s\n", path);
> > +        return 0;
> > +    }
> > +
> > +    return path;
> > +}
> > +
> > +int main(int argc, char **argv)
> > +{
> > +    QTestState *s = NULL;
> > +    CharDriverState *chr = NULL;
> > +    const char *hugefs = 0;
> > +    char *socket_path = 0;
> > +    char *qemu_cmd = 0;
> > +    char *chr_path = 0;
> > +    int ret;
> > +
> > +    g_test_init(&argc, &argv, NULL);
> > +
> > +    module_call_init(MODULE_INIT_QOM);
> > +
> > +    hugefs = init_hugepagefs();
> > +    g_assert(hugefs);
> > +
> > +    socket_path = g_strdup_printf("/tmp/vhost-%d.sock", getpid());
> > +
> > +    /* create char dev and add read handlers */
> > +    qemu_add_opts(&qemu_chardev_opts);
> > +    chr_path = g_strdup_printf("unix:%s,server,nowait", socket_path);
> > +    chr = qemu_chr_new("chr0", chr_path, NULL);
> > +    g_free(chr_path);
> > +    qemu_chr_add_handlers(chr, chr_can_read, chr_read, NULL, chr);
> > +
> > +    /* run the main loop thread so the chardev may operate */
> > +    g_mutex_init(&data_mutex);
> > +    g_cond_init(&data_cond);
> > +    g_mutex_lock(&data_mutex);
> > +    g_thread_new(NULL, thread_function, NULL);
> > +
> > +    qemu_cmd = g_strdup_printf(QEMU_CMD, hugefs, socket_path);
> > +    s = qtest_start(qemu_cmd);
> > +    g_free(qemu_cmd);
> > +
> > +    qtest_add_func("/vhost-user/read-guest-mem", read_guest_mem);
> > +
> > +    ret = g_test_run();
> > +
> > +    if (s) {
> > +        qtest_quit(s);
> > +    }
> > +
> > +    /* cleanup */
> > +    unlink(socket_path);
> > +    g_free(socket_path);
>
> I think this is probably too late to unlink in case the test fails,
> should probably be moved to right after qtest_start()?
>
>
Ack


> Thanks for your efforts on this test!
>
>
Thanks for your review!


> Regards,
> Andreas
>
> > +    g_cond_clear(&data_cond);
> > +    g_mutex_clear(&data_mutex);
> > +
> > +    return ret;
> > +}
> >
>
>
> --
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
> GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
>



-- 
Antonios Motakis
Virtual Open Systems


* Re: [Qemu-devel] [PATCH v9 13/20] Add mandatory_features to vhost_dev
  2014-03-04 18:38   ` Michael S. Tsirkin
@ 2014-03-05 13:40     ` Antonios Motakis
  0 siblings, 0 replies; 33+ messages in thread
From: Antonios Motakis @ 2014-03-05 13:40 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: snabb-devel@googlegroups.com, qemu-devel qemu-devel,
	Nikolay Nikolaev, Nicholas Bellinger, Luke Gorrie, Paolo Bonzini,
	VirtualOpenSystems Technical Team

On Tue, Mar 4, 2014 at 7:38 PM, Michael S. Tsirkin <mst@redhat.com> wrote:

> On Tue, Mar 04, 2014 at 07:22:56PM +0100, Antonios Motakis wrote:
> > This will be used in a following patch to ensure that a vhost-user
> > client reconnecting to QEMU supports the features that were exposed
> > by the first client that initiated the virtio-net session.
> >
> > Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
> > Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
>
> Why isn't checking features, or backend_features field sufficient?
>
>
Since we unfortunately need to remove reconnect support, this will go away
too.
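
For reference, the kind of check this field was meant to support can be
sketched as a simple bitmask test. This is an assumption about what the
follow-up patch does, since that patch is not quoted here, and
features_compatible is a made-up name:

```c
#include <stdint.h>

/*
 * Hypothetical check: a reconnecting backend is acceptable only if every
 * feature bit recorded as mandatory is also present in what it offers.
 */
static int features_compatible(uint64_t mandatory, uint64_t offered)
{
    return (mandatory & ~offered) == 0;
}
```

A backend offering a superset of the mandatory bits passes; one missing
any mandatory bit is rejected.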


>
> > ---
> >  hw/net/vhost_net.c        | 10 ++++++++++
> >  include/hw/virtio/vhost.h |  1 +
> >  include/net/vhost_net.h   |  2 ++
> >  3 files changed, 13 insertions(+)
> >
> > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> > index 0fb4fa5..38e1e8a 100644
> > --- a/hw/net/vhost_net.c
> > +++ b/hw/net/vhost_net.c
> > @@ -80,6 +80,11 @@ void vhost_net_ack_features(struct vhost_net *net, unsigned features)
> >      }
> >  }
> >
> > +unsigned long long vhost_net_features(VHostNetState *net)
> > +{
> > +    return net->dev.features;
> > +}
> > +
> >  static int vhost_net_get_fd(NetClientState *backend)
> >  {
> >      switch (backend->info->type) {
> > @@ -112,6 +117,7 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options)
> >
> >      net->dev.nvqs = 2;
> >      net->dev.vqs = net->vqs;
> > +    net->dev.mandatory_features = options->mandatory_features;
> >
> >      r = vhost_dev_init(&net->dev, options->opaque,
> >                         options->force);
> > @@ -347,6 +353,10 @@ unsigned vhost_net_get_features(struct vhost_net *net, unsigned features)
> >  void vhost_net_ack_features(struct vhost_net *net, unsigned features)
> >  {
> >  }
> > +unsigned long long vhost_net_features(struct vhost_net *net)
> > +{
> > +    return 0;
> > +}
> >
> >  bool vhost_net_virtqueue_pending(VHostNetState *net, int idx)
> >  {
> > diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> > index 97641b6..0068d40 100644
> > --- a/include/hw/virtio/vhost.h
> > +++ b/include/hw/virtio/vhost.h
> > @@ -41,6 +41,7 @@ struct vhost_dev {
> >      unsigned long long features;
> >      unsigned long long acked_features;
> >      unsigned long long backend_features;
> > +    unsigned long long mandatory_features;
> >      bool started;
> >      bool log_enabled;
> >      vhost_log_chunk_t *log;
> > diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h
> > index 2067ee2..b39bb45 100644
> > --- a/include/net/vhost_net.h
> > +++ b/include/net/vhost_net.h
> > @@ -10,6 +10,7 @@ typedef struct VhostNetOptions {
> >      NetClientState *net_backend;
> >      void *opaque;
> >      bool force;
> > +    unsigned long long mandatory_features;
> >  } VhostNetOptions;
> >
> >  struct vhost_net *vhost_net_init(VhostNetOptions *options);
> > @@ -22,6 +23,7 @@ void vhost_net_cleanup(VHostNetState *net);
> >
> >  unsigned vhost_net_get_features(VHostNetState *net, unsigned features);
> >  void vhost_net_ack_features(VHostNetState *net, unsigned features);
> > +unsigned long long vhost_net_features(VHostNetState *net);
> >
> >  bool vhost_net_virtqueue_pending(VHostNetState *net, int n);
> >  void vhost_net_virtqueue_mask(VHostNetState *net, VirtIODevice *dev,
> > --
> > 1.8.3.2
>



-- 
Antonios Motakis
Virtual Open Systems


* Re: [Qemu-devel] [PATCH v9 10/20] Gracefully handle ioctl failure in vhost_virtqueue_stop
  2014-03-05 13:38     ` Antonios Motakis
@ 2014-03-05 13:47       ` Michael S. Tsirkin
  0 siblings, 0 replies; 33+ messages in thread
From: Michael S. Tsirkin @ 2014-03-05 13:47 UTC (permalink / raw)
  To: Antonios Motakis
  Cc: Luke Gorrie, snabb-devel@googlegroups.com, Nikolay Nikolaev,
	qemu-devel qemu-devel, VirtualOpenSystems Technical Team

On Wed, Mar 05, 2014 at 02:38:34PM +0100, Antonios Motakis wrote:
> Hello,
> 
> 
> On Tue, Mar 4, 2014 at 7:45 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> 
>     On Tue, Mar 04, 2014 at 07:22:53PM +0100, Antonios Motakis wrote:
>     > On stopping the vhost, a call to VHOST_GET_VRING_BASE is issued. The
>     > received value is stored as last_avail_idx, so the virtqueue can continue
>     > operating if the connection is resumed. Handle the failure of this call
>     > and use the current avail_idx. Some packets from the avail ring may be
>     > omitted but still we keep a sane value and can continue on reconnect.
> 
>     omitted how?
>     some guests crash if we never complete handling buffers,
>     or networking breaks, etc ...
> 
>     This would be a big problem for reconnect, some robust way to
>     communicate avail ring state would need to be found.
>     Is reconnect really a mandatory feature for you?
>     I'd suggest you drop it from v1, focus on basic functionality.
> 
> 
> 
> Reconnect would be a really useful feature for us, so we tried to keep it in a
> reasonable way.
> 
> However we didn't take into account that some guests might crash under those
> assumptions. Looks like we have no option but to remove reconnect altogether
> for now; maybe a future extension to the virtio-net spec will allow us to do it
> cleanly, but I don't see an obvious workaround to keep this in now.
> 
> Thanks for pointing this out.
> 
> Btw, since it looks like we are closing a final version of the patches, what
> kind of timeframe should we aim for inclusion? Should we already rebase on top
> of Paolo's NUMA patch series?
> 

I think it should be possible to merge after Paolo's
patches go in, yes. I haven't followed them closely,
so I don't know when that will be.

I wish someone else would ack the char dev changes - anyone?
But it's not a blocking requirement.


>     >
>     > Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
>     > Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
> 
>     Problem is, a bunch of stuff breaks if vhost keeps
>     going when we ask it to stop.
>     In particular it will keep looking at the ring
>     state when guest asked it to stop doing this,
>     this will corrupt guest memory.
> 
> 
>     > ---
>     >  hw/virtio/vhost.c | 3 ++-
>     >  1 file changed, 2 insertions(+), 1 deletion(-)
>     >
>     > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>     > index 9e336ad..322e2c0 100644
>     > --- a/hw/virtio/vhost.c
>     > +++ b/hw/virtio/vhost.c
>     > @@ -758,12 +758,13 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev,
>     >      assert(idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs);
>     >      r = ioctl(dev->control, VHOST_GET_VRING_BASE, &state);
>     >      if (r < 0) {
>     > +        state.num = virtio_queue_get_avail_idx(vdev, idx);
>     >          fprintf(stderr, "vhost VQ %d ring restore failed: %d\n", idx, r);
>     >          fflush(stderr);
>     >      }
>     >      virtio_queue_set_last_avail_idx(vdev, idx, state.num);
>     >      virtio_queue_invalidate_signalled_used(vdev, idx);
>     > -    assert (r >= 0);
>     > +
>     >      cpu_physical_memory_unmap(vq->ring, virtio_queue_get_ring_size(vdev, idx),
>     >                                0, virtio_queue_get_ring_size(vdev, idx));
>     >      cpu_physical_memory_unmap(vq->used, virtio_queue_get_used_size(vdev, idx),
>     > --
>     > 1.8.3.2
> 
> 
> 
> 
> --
> Antonios Motakis
> Virtual Open Systems

* Re: [Qemu-devel] [PATCH v9 00/20] Vhost and vhost-net support for userspace based backends
  2014-03-04 18:29 ` [Qemu-devel] [PATCH v9 00/20] Vhost and vhost-net support for userspace based backends Paolo Bonzini
  2014-03-04 18:33   ` Antonios Motakis
@ 2014-05-20 11:22   ` Nikolay Nikolaev
  2014-05-20 12:51     ` Paolo Bonzini
  1 sibling, 1 reply; 33+ messages in thread
From: Nikolay Nikolaev @ 2014-05-20 11:22 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: snabb-devel, mst, qemu-devel, Luke Gorrie, Antonios Motakis,
	VirtualOpenSystems Technical Team

Hello Paolo,


On Tue, Mar 4, 2014 at 8:29 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:

> Il 04/03/2014 19:22, Antonios Motakis ha scritto:
>
>> In this patch series we would like to introduce our approach for putting a
>> virtio-net backend in an external userspace process. Our eventual target
>> is to run the network backend in the Snabbswitch ethernet switch, while
>> receiving traffic from a guest inside QEMU/KVM which runs an unmodified
>> virtio-net implementation.
>>
>> For this, we are working on extending vhost to allow equivalent
>> functionality for userspace. Vhost already passes control of the data
>> plane of virtio-net to the host kernel; we want to realize a similar
>> model, but for userspace.
>>
>> In this patch series the concept of a vhost-backend is introduced.
>>
>> We define two vhost backend types - vhost-kernel and vhost-user. The
>> former is the interface to the current kernel module implementation. Its
>> control plane is ioctl based. The data plane is realized by the kernel
>> directly accessing the QEMU allocated guest memory.
>>
>> In the new vhost-user backend, the control plane is based on communication
>> between QEMU and another userspace process using a unix domain socket.
>> This allows implementing a virtio backend for a guest running in QEMU
>> inside the other userspace process. For this communication we use a
>> chardev with a Unix domain socket backend. Vhost-user is client/server
>> agnostic regarding the chardev; however, it does not support the 'nowait'
>> and 'telnet' options.
>>
>> A reconnection in 'server' mode is supported, but the backend's exposed
>> virtio features need to be compatible with the first connected slave.
>>
>> We change -mem-path to QemuOpts and add prealloc and share as properties
>> to it. HugeTLBFS is required for this option to work.
>>
>> The data path is realized by directly accessing the vrings and the buffer
>> data off the guest's memory.
>>
>> The current user of vhost-user is only vhost-net. We add a new netdev
>> backend that is intended to initialize vhost-net with the vhost-user
>> backend.
>>
>> Example usage:
>>
>> qemu -m 1024 -mem-path /hugetlbfs,share=on \
>>      -chardev socket,id=chr0,path=/path/to/socket \
>>      -netdev type=vhost-user,id=net0,chardev=chr0 \
>>      -device virtio-net-pci,netdev=net0
>>
>> On non-MSIX guests the vhost feature can be forced using a special option:
>>
>> ...
>>      -netdev type=vhost-user,id=net0,chardev=chr0,vhostforce
>> ...
>>
>> In order to use ioeventfds, kvm should be enabled.
>>
>> This code can be pulled from git@github.com:virtualopensystems/qemu.git
>> vhost-user-v9
>> A simple functional test is available in tests/vhost-user-test.c
>>
>> A reference vhost-user slave for testing is also available from
>> git@github.com:virtualopensystems/vapp.git
>>
>
> Hi,
>
> did you see the series I posted today?  It would be great if you tested
> vhost-user on top of it.


I am preparing a new version of this patch series and would like to test
(or even rebase?) it on top of your patch series. What is its current
status? Is there a tree that I can track and use for rebases?


> The replacement for the above -mem-path incantation should be the
> following:
>
>    -object memory-file,id=mem,path=/hugetlbfs,share=on \
>    -numa node,memdev=mem
>
>
Is this command line update still valid with the latest development?


> Do the ideas behind vhost-user work if you have multiple hugetlbfs files,
> one per node?
>

> Paolo
>
>
>> Changes from v8:
>>  - Removed prealloc property from the -mem-path refactoring
>>  - Added and use new function - kvm_eventfds_enabled
>>  - Add virtio_queue_get_avail_idx used in vhost_virtqueue_stop to
>>    get a sane value in case of VHOST_GET_VRING_BASE failure
>>  - vhost user uses kvm_eventfds_enabled to check whether the ioeventfd
>>    capability of KVM is available
>>  - Added flag VHOST_USER_VRING_NOFD_MASK to be set when KICK, CALL or ERR
>>    file descriptor is invalid or ioeventfd is not available
>>
>> Changes from v7:
>>  - Slave reconnection when using chardev in server mode
>>  - qtest vhost-user-test added
>>  - New qemu_chr_fe_get_msgfds for reading multiple fds from the chardev
>>  - Mandatory features in vhost_dev, used on reconnect to check for
>>    conflicts
>>  - Add vhostforce parameter to -netdev vhost-user (for non-MSIX guests)
>>  - Extend libqemustub.a to support qemu-char.c
>>
>> Changes from v6:
>>  - Remove the 'unlink' property of '-mem-path'
>>  - Extend qemu-char: blocking read, send fds, monitor for connection close
>>  - Vhost-user uses chardev as a backend
>>  - Poll and reconnect removed (no VHOST_USER_ECHO).
>>  - Disconnect is detected by the chardev (G_IO_HUP event)
>>  - vhost-backend.c split to vhost-user.c
>>
>> Changes from v5:
>>  - Split -mem-path unlink option to a separate patch
>>  - Fds are passed only in the ancillary data
>>  - Stricter message size checks on receive/send
>>  - Netdev vhost-user now includes path and poll_time options
>>  - The connection probing interval is configurable
>>
>> Changes from v4:
>>  - Use error_report for errors
>>  - VhostUserMsg has new field `size` indicating the following payload
>>    length. Field `flags` now has version and reply bits. The structure
>>    is packed.
>>  - Send data is of variable length (`size` field in message)
>>  - Receive in 2 steps, header and payload
>>  - Add new message type VHOST_USER_ECHO, to check connection status
>>
>> Changes from v3:
>>  - Convert -mem-path to QemuOpts with prealloc, share and unlink
>>    properties
>>  - Set 1 sec timeout when read/write to the unix domain socket
>>  - Fix file descriptor leak
>>
>> Changes from v2:
>>  - Reconnect when the backend disappears
>>
>> Changes from v1:
>>  - Implementation of vhost-user netdev backend
>>  - Code improvements
>>
>> Antonios Motakis (20):
>>   Convert -mem-path to QemuOpts and add share property
>>   Add kvm_eventfds_enabled function
>>   Add chardev API qemu_chr_fe_read_all
>>   Add chardev API qemu_chr_fe_set_msgfds
>>   Add chardev API qemu_chr_fe_get_msgfds
>>   Add G_IO_HUP handler for socket chardev
>>   vhost_net should call the poll callback only when it is set
>>   Refactor virtio-net to use generic get_vhost_net
>>   Add new virtio API virtio_queue_get_avail_idx
>>   Gracefully handle ioctl failure in vhost_virtqueue_stop
>>   vhost_net_init will use VhostNetOptions to get all its arguments
>>   Add vhost_ops to vhost_dev struct and replace all relevant ioctls
>>   Add mandatory_features to vhost_dev
>>   Add vhost-backend and VhostBackendType
>>   Add vhost-user as a vhost backend.
>>   Add new vhost-user netdev backend
>>   Add the vhost-user netdev backend to the command line
>>   Add vhost-user protocol documentation
>>   libqemustub: add stubs to be able to use qemu-char.c
>>   Add qtest for vhost-user
>>
>>  docs/specs/vhost-user.txt         | 261 ++++++++++++++++++++++++++++
>>  exec.c                            |  21 ++-
>>  hmp-commands.hx                   |   4 +-
>>  hw/net/vhost_net.c                | 141 ++++++++++-----
>>  hw/net/virtio-net.c               |  29 +---
>>  hw/scsi/vhost-scsi.c              |  20 ++-
>>  hw/virtio/Makefile.objs           |   2 +-
>>  hw/virtio/vhost-backend.c         |  71 ++++++++
>>  hw/virtio/vhost-user.c            | 356 ++++++++++++++++++++++++++++++++++++++
>>  hw/virtio/vhost.c                 |  58 +++----
>>  hw/virtio/virtio.c                |   5 +
>>  include/exec/cpu-all.h            |   1 -
>>  include/hw/virtio/vhost-backend.h |  38 ++++
>>  include/hw/virtio/vhost.h         |   9 +-
>>  include/hw/virtio/virtio.h        |   1 +
>>  include/net/vhost-user.h          |  17 ++
>>  include/net/vhost_net.h           |  13 +-
>>  include/sysemu/char.h             |  43 ++++-
>>  include/sysemu/kvm.h              |  11 ++
>>  kvm-all.c                         |   4 +
>>  kvm-stub.c                        |   1 +
>>  net/Makefile.objs                 |   2 +-
>>  net/clients.h                     |   3 +
>>  net/hub.c                         |   1 +
>>  net/net.c                         |  25 +--
>>  net/tap.c                         |  18 +-
>>  net/vhost-user.c                  | 273 +++++++++++++++++++++++++++++
>>  qapi-schema.json                  |  19 +-
>>  qemu-char.c                       | 272 +++++++++++++++++++++++++----
>>  qemu-options.hx                   |  25 ++-
>>  stubs/Makefile.objs               |   8 +
>>  stubs/bdrv-commit-all.c           |   7 +
>>  stubs/chr-msmouse.c               |   7 +
>>  stubs/get-next-serial.c           |   3 +
>>  stubs/is-daemonized.c             |   7 +
>>  stubs/machine-init-done.c         |   6 +
>>  stubs/monitor-init.c              |   6 +
>>  stubs/notify-event.c              |   6 +
>>  stubs/vc-init.c                   |   7 +
>>  tests/Makefile                    |   4 +
>>  tests/vhost-user-test.c           | 309 +++++++++++++++++++++++++++++++++
>>  vl.c                              |  23 ++-
>>  42 files changed, 1979 insertions(+), 158 deletions(-)
>>  create mode 100644 docs/specs/vhost-user.txt
>>  create mode 100644 hw/virtio/vhost-backend.c
>>  create mode 100644 hw/virtio/vhost-user.c
>>  create mode 100644 include/hw/virtio/vhost-backend.h
>>  create mode 100644 include/net/vhost-user.h
>>  create mode 100644 net/vhost-user.c
>>  create mode 100644 stubs/bdrv-commit-all.c
>>  create mode 100644 stubs/chr-msmouse.c
>>  create mode 100644 stubs/get-next-serial.c
>>  create mode 100644 stubs/is-daemonized.c
>>  create mode 100644 stubs/machine-init-done.c
>>  create mode 100644 stubs/monitor-init.c
>>  create mode 100644 stubs/notify-event.c
>>  create mode 100644 stubs/vc-init.c
>>  create mode 100644 tests/vhost-user-test.c
>>
>>
>
regards,
Nikolay Nikolaev
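
The v4 and v5 changelog entries quoted above describe the wire format only
briefly: a packed message with a `size` field giving the payload length, a
receive done in two steps (header, then payload), stricter size checks, and
file descriptors carried only in the ancillary data. A minimal Python sketch of
that framing over a Unix domain socket follows. The three-u32 header layout,
the flag bit positions, the MAX_PAYLOAD and MAX_FDS caps, and the helper names
send_msg, recv_exact and recv_msg are all assumptions made for illustration;
none of them is taken from the patches.

```python
import socket
import struct

# Assumed header: request, flags, size as three little-endian u32 fields.
# The "<" prefix disables padding, mirroring a packed C struct.
HEADER = struct.Struct("<III")
VERSION_MASK = 0x3      # assumption: low two bits of flags hold the version
REPLY_BIT = 0x1 << 2    # assumption: bit 2 of flags marks a reply
MAX_PAYLOAD = 4096      # hypothetical cap used for the size check
MAX_FDS = 8             # hypothetical cap on fds accepted per message
FD_SIZE = struct.calcsize("i")


def send_msg(sock, request, payload=b"", fds=(), flags=0x1):
    """Send header and payload; fds travel only as SCM_RIGHTS ancillary data."""
    header = HEADER.pack(request, flags, len(payload))
    ancillary = []
    if fds:
        packed = struct.pack(f"{len(fds)}i", *fds)
        ancillary.append((socket.SOL_SOCKET, socket.SCM_RIGHTS, packed))
    sock.sendmsg([header, payload], ancillary)


def recv_exact(sock, count):
    """Read exactly count bytes or raise if the peer closes first."""
    buf = b""
    while len(buf) < count:
        chunk = sock.recv(count - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the socket")
        buf += chunk
    return buf


def recv_msg(sock):
    """Receive in two steps: header (with any fds), then the payload."""
    data, ancdata, _msg_flags, _addr = sock.recvmsg(
        HEADER.size, socket.CMSG_SPACE(MAX_FDS * FD_SIZE))
    if not data:
        raise ConnectionError("peer closed the socket")
    if len(data) < HEADER.size:
        data += recv_exact(sock, HEADER.size - len(data))

    fds = []
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            usable = len(cdata) - (len(cdata) % FD_SIZE)
            fds.extend(struct.unpack(f"{usable // FD_SIZE}i", cdata[:usable]))

    request, flags, size = HEADER.unpack(data)
    if size > MAX_PAYLOAD:
        raise ValueError(f"payload size {size} exceeds {MAX_PAYLOAD}")
    payload = recv_exact(sock, size)
    return request, flags, payload, fds
```

To try it, create a pair with socket.socketpair(socket.AF_UNIX,
socket.SOCK_STREAM), pass the write end of an os.pipe() through send_msg on one
socket, and call recv_msg on the other; the returned descriptor is a new number
in the receiver that refers to the same pipe.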

* Re: [Qemu-devel] [PATCH v9 00/20] Vhost and vhost-net support for userspace based backends
  2014-05-20 11:22   ` Nikolay Nikolaev
@ 2014-05-20 12:51     ` Paolo Bonzini
  0 siblings, 0 replies; 33+ messages in thread
From: Paolo Bonzini @ 2014-05-20 12:51 UTC (permalink / raw)
  To: Nikolay Nikolaev
  Cc: snabb-devel, mst, Hu Tao, qemu-devel, Luke Gorrie,
	Antonios Motakis, VirtualOpenSystems Technical Team

Il 20/05/2014 13:22, Nikolay Nikolaev ha scritto:
> I am preparing a new version of this patch series and would like to test
> (or even rebase?) it on top of your patch series. What is its current
> status? Is there a tree that I can track and use for rebases?

Hu Tao has been posting updates lately.  I don't know if he has a git tree.

>     The replacement for the above -mem-path incantation should be the
>     following:
>
>        -object memory-file,id=mem,path=/hugetlbfs,share=on \
>        -numa node,memdev=mem
>
> Is this command line update still valid with the latest development?

Yes, it should.

Paolo

Thread overview: 33+ messages
2014-03-04 18:22 [Qemu-devel] [PATCH v9 00/20] Vhost and vhost-net support for userspace based backends Antonios Motakis
2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 01/20] Convert -mem-path to QemuOpts and add share property Antonios Motakis
2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 02/20] Add kvm_eventfds_enabled function Antonios Motakis
2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 03/20] Add chardev API qemu_chr_fe_read_all Antonios Motakis
2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 04/20] Add chardev API qemu_chr_fe_set_msgfds Antonios Motakis
2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 05/20] Add chardev API qemu_chr_fe_get_msgfds Antonios Motakis
2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 06/20] Add G_IO_HUP handler for socket chardev Antonios Motakis
2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 07/20] vhost_net should call the poll callback only when it is set Antonios Motakis
2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 08/20] Refactor virtio-net to use generic get_vhost_net Antonios Motakis
2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 09/20] Add new virtio API virtio_queue_get_avail_idx Antonios Motakis
2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 10/20] Gracefully handle ioctl failure in vhost_virtqueue_stop Antonios Motakis
2014-03-04 18:45   ` Michael S. Tsirkin
2014-03-05 13:38     ` Antonios Motakis
2014-03-05 13:47       ` Michael S. Tsirkin
2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 11/20] vhost_net_init will use VhostNetOptions to get all its arguments Antonios Motakis
2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 12/20] Add vhost_ops to vhost_dev struct and replace all relevant ioctls Antonios Motakis
2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 13/20] Add mandatory_features to vhost_dev Antonios Motakis
2014-03-04 18:38   ` Michael S. Tsirkin
2014-03-05 13:40     ` Antonios Motakis
2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 14/20] Add vhost-backend and VhostBackendType Antonios Motakis
2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 15/20] Add vhost-user as a vhost backend Antonios Motakis
2014-03-04 18:22 ` [Qemu-devel] [PATCH v9 16/20] Add new vhost-user netdev backend Antonios Motakis
2014-03-04 18:23 ` [Qemu-devel] [PATCH v9 17/20] Add the vhost-user netdev backend to the command line Antonios Motakis
2014-03-04 18:23 ` [Qemu-devel] [PATCH v9 18/20] Add vhost-user protocol documentation Antonios Motakis
2014-03-04 18:23 ` [Qemu-devel] [PATCH v9 19/20] libqemustub: add stubs to be able to use qemu-char.c Antonios Motakis
2014-03-04 18:23 ` [Qemu-devel] [PATCH v9 20/20] Add qtest for vhost-user Antonios Motakis
2014-03-04 18:39   ` Andreas Färber
2014-03-05 13:39     ` Antonios Motakis
2014-03-04 18:29 ` [Qemu-devel] [PATCH v9 00/20] Vhost and vhost-net support for userspace based backends Paolo Bonzini
2014-03-04 18:33   ` Antonios Motakis
2014-03-04 18:38     ` Paolo Bonzini
2014-05-20 11:22   ` Nikolay Nikolaev
2014-05-20 12:51     ` Paolo Bonzini
