[Qemu-devel] [PATCH 0/2] ivshmem: update documentation, add client/server tools

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCH 0/2] ivshmem: update documentation, add client/server tools
@ 2014-06-20 12:15 David Marchand
  2014-06-20 12:15 ` [Qemu-devel] [PATCH 1/2] docs: update ivshmem device spec David Marchand
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: David Marchand @ 2014-06-20 12:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, claudio.fontana, kvm

Hello, 

(as suggested by Paolo, ccing Claudio and kvm mailing list)

Here is a patchset containing an update on ivshmem specs documentation and
importing ivshmem server and client tools.
These tools have been written from scratch and are not related to what is
available in nahanni repository.
I put them in contrib/ directory as the qemu-doc.texi was already telling the
server was supposed to be there.

-- 
David Marchand

David Marchand (2):
  docs: update ivshmem device spec
  contrib: add ivshmem client and server

 contrib/ivshmem-client/Makefile         |   26 ++
 contrib/ivshmem-client/ivshmem-client.c |  418 ++++++++++++++++++++++++++++++
 contrib/ivshmem-client/ivshmem-client.h |  238 ++++++++++++++++++
 contrib/ivshmem-client/main.c           |  246 ++++++++++++++++++
 contrib/ivshmem-server/Makefile         |   26 ++
 contrib/ivshmem-server/ivshmem-server.c |  420 +++++++++++++++++++++++++++++++
 contrib/ivshmem-server/ivshmem-server.h |  185 ++++++++++++++
 contrib/ivshmem-server/main.c           |  296 ++++++++++++++++++++++
 docs/specs/ivshmem_device_spec.txt      |   41 ++-
 qemu-doc.texi                           |   10 +-
 10 files changed, 1897 insertions(+), 9 deletions(-)
 create mode 100644 contrib/ivshmem-client/Makefile
 create mode 100644 contrib/ivshmem-client/ivshmem-client.c
 create mode 100644 contrib/ivshmem-client/ivshmem-client.h
 create mode 100644 contrib/ivshmem-client/main.c
 create mode 100644 contrib/ivshmem-server/Makefile
 create mode 100644 contrib/ivshmem-server/ivshmem-server.c
 create mode 100644 contrib/ivshmem-server/ivshmem-server.h
 create mode 100644 contrib/ivshmem-server/main.c

-- 
1.7.10.4

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Qemu-devel] [PATCH 1/2] docs: update ivshmem device spec
  2014-06-20 12:15 [Qemu-devel] [PATCH 0/2] ivshmem: update documentation, add client/server tools David Marchand
@ 2014-06-20 12:15 ` David Marchand
  2014-06-23 14:18   ` Claudio Fontana
  2014-06-24 16:09   ` Eric Blake
  2014-06-20 12:15 ` [Qemu-devel] [PATCH 2/2] contrib: add ivshmem client and server David Marchand
  2014-06-23  8:02 ` [Qemu-devel] [PATCH 0/2] ivshmem: update documentation, add client/server tools Claudio Fontana
  2 siblings, 2 replies; 10+ messages in thread
From: David Marchand @ 2014-06-20 12:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, claudio.fontana, kvm

Add some notes on the parts needed to use ivshmem devices: more specifically,
explain the purpose of an ivshmem server and the basic concept to use the
ivshmem devices in guests.

Signed-off-by: David Marchand <david.marchand@6wind.com>
---
 docs/specs/ivshmem_device_spec.txt |   41 ++++++++++++++++++++++++++++++------
 1 file changed, 35 insertions(+), 6 deletions(-)

diff --git a/docs/specs/ivshmem_device_spec.txt b/docs/specs/ivshmem_device_spec.txt
index 667a862..7d2b73f 100644
--- a/docs/specs/ivshmem_device_spec.txt
+++ b/docs/specs/ivshmem_device_spec.txt
@@ -85,12 +85,41 @@ events have occurred.  The semantics of interrupt vectors are left to the
 user's discretion.
 
 
+IVSHMEM host services
+---------------------
+
+This part is optional (see *Usage in the Guest* section below).
+
+To handle notifications between users of a ivshmem device, a ivshmem server has
+been added. This server is responsible for creating the shared memory and
+creating a set of eventfds for each users of the shared memory. It behaves as a
+proxy between the different ivshmem clients (QEMU): giving the shared memory fd
+to each client, allocating eventfds to new clients and broadcasting to all
+clients when a client disappears.
+
+Apart from the current ivshmem implementation in QEMU, a ivshmem client can be
+written for debug, for development purposes, or to implement notifications
+between host and guests.
+
+
 Usage in the Guest
 ------------------
 
-The shared memory device is intended to be used with the provided UIO driver.
-Very little configuration is needed.  The guest should map BAR0 to access the
-registers (an array of 32-bit ints allows simple writing) and map BAR2 to
-access the shared memory region itself.  The size of the shared memory region
-is specified when the guest (or shared memory server) is started.  A guest may
-map the whole shared memory region or only part of it.
+The guest should map BAR0 to access the registers (an array of 32-bit ints
+allows simple writing) and map BAR2 to access the shared memory region itself.
+The size of the shared memory region is specified when the guest (or shared
+memory server) is started.  A guest may map the whole shared memory region or
+only part of it.
+
+ivshmem provides an optional notification mechanism through eventfds handled by
+QEMU that will trigger interrupts in guests. This mechanism is enabled when
+using a ivshmem-server which must be started prior to VMs and which serves as a
+proxy for exchanging eventfds.
+
+It is your choice how to use the ivshmem device.
+- the simple way, you don't need anything else than what is already in QEMU.
+  You can map the shared memory in guest, then use it in userland as you see fit
+  (memnic for example works this way http://dpdk.org/browse/memnic),
+- the more advanced way, basically, if you want an event mechanism between the
+  VMs using your ivshmem device. In this case, then you will most likely want to
+  write a kernel driver that will handle interrupts.
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Qemu-devel] [PATCH 2/2] contrib: add ivshmem client and server
  2014-06-20 12:15 [Qemu-devel] [PATCH 0/2] ivshmem: update documentation, add client/server tools David Marchand
  2014-06-20 12:15 ` [Qemu-devel] [PATCH 1/2] docs: update ivshmem device spec David Marchand
@ 2014-06-20 12:15 ` David Marchand
  2014-06-23  8:02 ` [Qemu-devel] [PATCH 0/2] ivshmem: update documentation, add client/server tools Claudio Fontana
  2 siblings, 0 replies; 10+ messages in thread
From: David Marchand @ 2014-06-20 12:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, claudio.fontana, kvm

When using ivshmem devices, notifications between guests can be sent as
interrupts using a ivshmem-server (typical use described in documentation).
The client is provided as a debug tool.

Signed-off-by: David Marchand <david.marchand@6wind.com>
---
 contrib/ivshmem-client/Makefile         |   26 ++
 contrib/ivshmem-client/ivshmem-client.c |  418 ++++++++++++++++++++++++++++++
 contrib/ivshmem-client/ivshmem-client.h |  238 ++++++++++++++++++
 contrib/ivshmem-client/main.c           |  246 ++++++++++++++++++
 contrib/ivshmem-server/Makefile         |   26 ++
 contrib/ivshmem-server/ivshmem-server.c |  420 +++++++++++++++++++++++++++++++
 contrib/ivshmem-server/ivshmem-server.h |  185 ++++++++++++++
 contrib/ivshmem-server/main.c           |  296 ++++++++++++++++++++++
 qemu-doc.texi                           |   10 +-
 9 files changed, 1862 insertions(+), 3 deletions(-)
 create mode 100644 contrib/ivshmem-client/Makefile
 create mode 100644 contrib/ivshmem-client/ivshmem-client.c
 create mode 100644 contrib/ivshmem-client/ivshmem-client.h
 create mode 100644 contrib/ivshmem-client/main.c
 create mode 100644 contrib/ivshmem-server/Makefile
 create mode 100644 contrib/ivshmem-server/ivshmem-server.c
 create mode 100644 contrib/ivshmem-server/ivshmem-server.h
 create mode 100644 contrib/ivshmem-server/main.c

diff --git a/contrib/ivshmem-client/Makefile b/contrib/ivshmem-client/Makefile
new file mode 100644
index 0000000..9e32409
--- /dev/null
+++ b/contrib/ivshmem-client/Makefile
@@ -0,0 +1,26 @@
+# Copyright 2014 6WIND S.A.
+# All rights reserved
+
+S ?= $(CURDIR)
+O ?= $(CURDIR)
+
+CFLAGS += -Wall -Wextra -Werror -g
+LDFLAGS +=
+LDLIBS += -lrt
+
+VPATH = $(S)
+PROG = ivshmem-client
+OBJS := $(O)/ivshmem-client.o
+OBJS += $(O)/main.o
+
+$(O)/%.o: %.c
+	$(CC) $(CFLAGS) -o $@ -c $<
+
+$(O)/$(PROG): $(OBJS)
+	$(CC) $(LDFLAGS) -o $@ $^ $(LDLIBS)
+
+.PHONY: all
+all: $(O)/$(PROG)
+
+clean:
+	rm -f $(OBJS) $(O)/$(PROG)
diff --git a/contrib/ivshmem-client/ivshmem-client.c b/contrib/ivshmem-client/ivshmem-client.c
new file mode 100644
index 0000000..32ef3ef
--- /dev/null
+++ b/contrib/ivshmem-client/ivshmem-client.c
@@ -0,0 +1,418 @@
+/*
+ * Copyright(c) 2014 6WIND S.A.
+ * All rights reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <string.h>
+#include <signal.h>
+#include <unistd.h>
+#include <inttypes.h>
+#include <sys/queue.h>
+
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+
+#include "ivshmem-client.h"
+
+/* log a message on stdout if verbose=1 */
+#define debug_log(client, fmt, ...) do { \
+        if ((client)->verbose) {         \
+            printf(fmt, ## __VA_ARGS__); \
+        }                                \
+    } while (0)
+
+/* read message from the unix socket */
+static int
+read_one_msg(struct ivshmem_client *client, long *index, int *fd)
+{
+    int ret;
+    struct msghdr msg;
+    struct iovec iov[1];
+    union {
+        struct cmsghdr cmsg;
+        char control[CMSG_SPACE(sizeof(int))];
+    } msg_control;
+    struct cmsghdr *cmsg;
+
+    iov[0].iov_base = index;
+    iov[0].iov_len = sizeof(*index);
+
+    memset(&msg, 0, sizeof(msg));
+    msg.msg_iov = iov;
+    msg.msg_iovlen = 1;
+    msg.msg_control = &msg_control;
+    msg.msg_controllen = sizeof(msg_control);
+
+    ret = recvmsg(client->sock_fd, &msg, 0);
+    if (ret < 0) {
+        debug_log(client, "cannot read message: %s\n", strerror(errno));
+        return -1;
+    }
+    if (ret == 0) {
+        debug_log(client, "lost connection to server\n");
+        return -1;
+    }
+
+    *fd = -1;
+
+    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
+
+        if (cmsg->cmsg_len != CMSG_LEN(sizeof(int)) ||
+            cmsg->cmsg_level != SOL_SOCKET ||
+            cmsg->cmsg_type != SCM_RIGHTS) {
+            continue;
+        }
+
+        memcpy(fd, CMSG_DATA(cmsg), sizeof(*fd));
+    }
+
+    return 0;
+}
+
+/* free a peer when the server advertise a disconnection or when the
+ * client is freed */
+static void
+free_peer(struct ivshmem_client *client, struct ivshmem_client_peer *peer)
+{
+    unsigned vector;
+
+    TAILQ_REMOVE(&client->peer_list, peer, next);
+    for (vector = 0; vector < peer->vectors_count; vector++) {
+        close(peer->vectors[vector]);
+    }
+
+    free(peer);
+}
+
+/* handle message coming from server (new peer, new vectors) */
+static int
+handle_server_msg(struct ivshmem_client *client)
+{
+    struct ivshmem_client_peer *peer;
+    long peer_id;
+    int ret, fd;
+
+    ret = read_one_msg(client, &peer_id, &fd);
+    if (ret < 0) {
+        return -1;
+    }
+
+    /* can return a peer or the local client */
+    peer = ivshmem_client_search_peer(client, peer_id);
+
+    /* delete peer */
+    if (fd == -1) {
+
+        if (peer == NULL || peer == &client->local) {
+            debug_log(client, "receive delete for invalid peer %ld", peer_id);
+            return -1;
+        }
+
+        debug_log(client, "delete peer id = %ld\n", peer_id);
+        free_peer(client, peer);
+        return 0;
+    }
+
+    /* new peer */
+    if (peer == NULL) {
+        peer = malloc(sizeof(*peer));
+        if (peer == NULL) {
+            debug_log(client, "cannot allocate new peer\n");
+            return -1;
+        }
+        memset(peer, 0, sizeof(*peer));
+        peer->id = peer_id;
+        peer->vectors_count = 0;
+        TAILQ_INSERT_TAIL(&client->peer_list, peer, next);
+        debug_log(client, "new peer id = %ld\n", peer_id);
+    }
+
+    /* new vector */
+    debug_log(client, "  new vector %d (fd=%d) for peer id %ld\n",
+              peer->vectors_count, fd, peer->id);
+    peer->vectors[peer->vectors_count] = fd;
+    peer->vectors_count++;
+
+    return 0;
+}
+
+/* init a new ivshmem client */
+int
+ivshmem_client_init(struct ivshmem_client *client, const char *unix_sock_path,
+                    ivshmem_client_notif_cb_t notif_cb, void *notif_arg,
+                    int verbose)
+{
+    unsigned i;
+
+    memset(client, 0, sizeof(*client));
+
+    snprintf(client->unix_sock_path, sizeof(client->unix_sock_path),
+             "%s", unix_sock_path);
+
+    for (i = 0; i < IVSHMEM_CLIENT_MAX_VECTORS; i++) {
+        client->local.vectors[i] = -1;
+    }
+
+    TAILQ_INIT(&client->peer_list);
+    client->local.id = -1;
+
+    client->notif_cb = notif_cb;
+    client->notif_arg = notif_arg;
+    client->verbose = verbose;
+
+    return 0;
+}
+
+/* create and connect to the unix socket */
+int
+ivshmem_client_connect(struct ivshmem_client *client)
+{
+    struct sockaddr_un sun;
+    int fd;
+    long tmp;
+
+    debug_log(client, "connect to client %s\n", client->unix_sock_path);
+
+    client->sock_fd = socket(AF_UNIX, SOCK_STREAM, 0);
+    if (client->sock_fd < 0) {
+        debug_log(client, "cannot create socket: %s\n", strerror(errno));
+        return -1;
+    }
+
+    sun.sun_family = AF_UNIX;
+    snprintf(sun.sun_path, sizeof(sun.sun_path), "%s", client->unix_sock_path);
+    if (connect(client->sock_fd, (struct sockaddr *)&sun, sizeof(sun)) < 0) {
+        debug_log(client, "cannot connect to %s: %s\n", sun.sun_path,
+                  strerror(errno));
+        close(client->sock_fd);
+        client->sock_fd = -1;
+        return -1;
+    }
+
+    /* first, we expect our index + a fd == -1 */
+    if (read_one_msg(client, &client->local.id, &fd) < 0 ||
+        client->local.id < 0 || fd != -1) {
+        debug_log(client, "cannot read from server\n");
+        close(client->sock_fd);
+        client->sock_fd = -1;
+        return -1;
+    }
+    debug_log(client, "our_id=%ld\n", client->local.id);
+
+    /* now, we expect shared mem fd + a -1 index, note that shm fd
+     * is not used */
+    if (read_one_msg(client, &tmp, &fd) < 0 ||
+        tmp != -1 || fd < 0) {
+        debug_log(client, "cannot read from server (2)\n");
+        close(client->sock_fd);
+        client->sock_fd = -1;
+        return -1;
+    }
+    debug_log(client, "shm_fd=%d\n", fd);
+
+    return 0;
+}
+
+/* close connection to the server, and free all peer structures */
+void
+ivshmem_client_close(struct ivshmem_client *client)
+{
+    struct ivshmem_client_peer *peer;
+    unsigned i;
+
+    debug_log(client, "close client\n");
+
+    while ((peer = TAILQ_FIRST(&client->peer_list)) != NULL) {
+        free_peer(client, peer);
+    }
+
+    close(client->sock_fd);
+    client->sock_fd = -1;
+    client->local.id = -1;
+    for (i = 0; i < IVSHMEM_CLIENT_MAX_VECTORS; i++) {
+        client->local.vectors[i] = -1;
+    }
+}
+
+/* get the fd_set according to the unix socket and peer list */
+void
+ivshmem_client_get_fds(const struct ivshmem_client *client, fd_set *fds,
+                       int *maxfd)
+{
+    int fd;
+    unsigned vector;
+
+    FD_SET(client->sock_fd, fds);
+    if (client->sock_fd >= *maxfd) {
+        *maxfd = client->sock_fd + 1;
+    }
+
+    for (vector = 0; vector < client->local.vectors_count; vector++) {
+        fd = client->local.vectors[vector];
+        FD_SET(fd, fds);
+        if (fd >= *maxfd) {
+            *maxfd = fd + 1;
+        }
+    }
+}
+
+/* handle events from eventfd: just print a message on notification */
+static int
+handle_event(struct ivshmem_client *client, const fd_set *cur, int maxfd)
+{
+    struct ivshmem_client_peer *peer;
+    uint64_t kick;
+    unsigned i;
+    int ret;
+
+    peer = &client->local;
+
+    for (i = 0; i < peer->vectors_count; i++) {
+        if (peer->vectors[i] >= maxfd || !FD_ISSET(peer->vectors[i], cur)) {
+            continue;
+        }
+
+        ret = read(peer->vectors[i], &kick, sizeof(kick));
+        if (ret < 0) {
+            return ret;
+        }
+        if (ret != sizeof(kick)) {
+            debug_log(client, "invalid read size = %d\n", ret);
+            errno = EINVAL;
+            return -1;
+        }
+        debug_log(client, "received event on fd %d vector %d: %ld\n",
+                  peer->vectors[i], i, kick);
+        if (client->notif_cb != NULL) {
+            client->notif_cb(client, peer, i, client->notif_arg);
+        }
+    }
+
+    return 0;
+}
+
+/* read and handle new messages on the given fd_set */
+int
+ivshmem_client_handle_fds(struct ivshmem_client *client, fd_set *fds, int maxfd)
+{
+    if (client->sock_fd < maxfd && FD_ISSET(client->sock_fd, fds) &&
+        handle_server_msg(client) < 0 && errno != EINTR) {
+        debug_log(client, "handle_server_msg() failed\n");
+        return -1;
+    } else if (handle_event(client, fds, maxfd) < 0 && errno != EINTR) {
+        debug_log(client, "handle_event() failed\n");
+        return -1;
+    }
+
+    return 0;
+}
+
+/* send a notification on a vector of a peer */
+int
+ivshmem_client_notify(const struct ivshmem_client *client,
+                      const struct ivshmem_client_peer *peer, unsigned vector)
+{
+    uint64_t kick;
+    int fd;
+
+    if (vector > peer->vectors_count) {
+        debug_log(client, "invalid vector %u on peer %ld\n", vector, peer->id);
+        return -1;
+    }
+    fd = peer->vectors[vector];
+    debug_log(client, "notify peer %ld on vector %d, fd %d\n", peer->id, vector,
+              fd);
+
+    kick = 1;
+    if (write(fd, &kick, sizeof(kick)) != sizeof(kick)) {
+        fprintf(stderr, "could not write to %d: %s\n", peer->vectors[vector],
+                strerror(errno));
+        return -1;
+    }
+    return 0;
+}
+
+/* send a notification to all vectors of a peer */
+int
+ivshmem_client_notify_all_vects(const struct ivshmem_client *client,
+                                const struct ivshmem_client_peer *peer)
+{
+    unsigned vector;
+    int ret = 0;
+
+    for (vector = 0; vector < peer->vectors_count; vector++) {
+        if (ivshmem_client_notify(client, peer, vector) < 0) {
+            ret = -1;
+        }
+    }
+
+    return ret;
+}
+
+/* send a notification to all peers */
+int
+ivshmem_client_notify_broadcast(const struct ivshmem_client *client)
+{
+    struct ivshmem_client_peer *peer;
+    int ret = 0;
+
+    TAILQ_FOREACH(peer, &client->peer_list, next) {
+        if (ivshmem_client_notify_all_vects(client, peer) < 0) {
+            ret = -1;
+        }
+    }
+
+    return ret;
+}
+
+/* lookup peer from its id */
+struct ivshmem_client_peer *
+ivshmem_client_search_peer(struct ivshmem_client *client, long peer_id)
+{
+    struct ivshmem_client_peer *peer;
+
+    if (peer_id == client->local.id) {
+        return &client->local;
+    }
+
+    TAILQ_FOREACH(peer, &client->peer_list, next) {
+        if (peer->id == peer_id) {
+            return peer;
+        }
+    }
+    return NULL;
+}
+
+/* dump our info, the list of peers their vectors on stdout */
+void
+ivshmem_client_dump(const struct ivshmem_client *client)
+{
+    const struct ivshmem_client_peer *peer;
+    unsigned vector;
+
+    /* dump local infos */
+    peer = &client->local;
+    printf("our_id = %ld\n", peer->id);
+    for (vector = 0; vector < peer->vectors_count; vector++) {
+        printf("  vector %d is enabled (fd=%d)\n", vector,
+               peer->vectors[vector]);
+    }
+
+    /* dump peers */
+    TAILQ_FOREACH(peer, &client->peer_list, next) {
+        printf("peer_id = %ld\n", peer->id);
+
+        for (vector = 0; vector < peer->vectors_count; vector++) {
+            printf("  vector %d is enabled (fd=%d)\n", vector,
+                   peer->vectors[vector]);
+        }
+    }
+}
diff --git a/contrib/ivshmem-client/ivshmem-client.h b/contrib/ivshmem-client/ivshmem-client.h
new file mode 100644
index 0000000..c47fb73
--- /dev/null
+++ b/contrib/ivshmem-client/ivshmem-client.h
@@ -0,0 +1,238 @@
+/*
+ * Copyright(c) 2014 6WIND S.A.
+ * All rights reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#ifndef _IVSHMEM_CLIENT_
+#define _IVSHMEM_CLIENT_
+
+/**
+ * This file provides helper to implement an ivshmem client. It is used
+ * on the host to ask QEMU to send an interrupt to an ivshmem PCI device in a
+ * guest. QEMU also implements an ivshmem client similar to this one, they both
+ * connect to an ivshmem server.
+ *
+ * A standalone ivshmem client based on this file is provided for debug/test
+ * purposes.
+ */
+
+#include <limits.h>
+#include <sys/select.h>
+#include <sys/queue.h>
+
+/**
+ * Maximum number of notification vectors supported by the client
+ */
+#define IVSHMEM_CLIENT_MAX_VECTORS 64
+
+/**
+ * Structure storing a peer
+ *
+ * Each time a client connects to an ivshmem server, it is advertised to
+ * all connected clients through the unix socket. When our ivshmem
+ * client receives a notification, it creates a ivshmem_client_peer
+ * structure to store the infos of this peer.
+ *
+ * This structure is also used to store the information of our own
+ * client in (struct ivshmem_client)->local.
+ */
+struct ivshmem_client_peer {
+    TAILQ_ENTRY(ivshmem_client_peer) next;    /**< next in list*/
+    long id;                                    /**< the id of the peer */
+    int vectors[IVSHMEM_CLIENT_MAX_VECTORS];  /**< one fd per vector */
+    unsigned vectors_count;                     /**< number of vectors */
+};
+TAILQ_HEAD(ivshmem_client_peer_list, ivshmem_client_peer);
+
+struct ivshmem_client;
+
+/**
+ * Typedef of callback function used when our ivshmem_client receives a
+ * notification from a peer.
+ */
+typedef void (*ivshmem_client_notif_cb_t)(
+    const struct ivshmem_client *client,
+    const struct ivshmem_client_peer *peer,
+    unsigned vect, void *arg);
+
+/**
+ * Structure describing an ivshmem client
+ *
+ * This structure stores all information related to our client: the name
+ * of the server unix socket, the list of peers advertised by the
+ * server, our own client information, and a pointer the notification
+ * callback function used when we receive a notification from a peer.
+ */
+struct ivshmem_client {
+    char unix_sock_path[PATH_MAX];        /**< path to unix sock */
+    int sock_fd;                          /**< unix sock filedesc */
+
+    struct ivshmem_client_peer_list peer_list;  /**< list of peers */
+    struct ivshmem_client_peer local;   /**< our own infos */
+
+    ivshmem_client_notif_cb_t notif_cb; /**< notification callback */
+    void *notif_arg;                      /**< notification argument */
+
+    int verbose;                          /**< true to enable debug */
+};
+
+/**
+ * Initialize an ivshmem client
+ *
+ * @param client
+ *   A pointer to an uninitialized ivshmem_client structure
+ * @param unix_sock_path
+ *   The pointer to the unix socket file name
+ * @param notif_cb
+ *   If not NULL, the pointer to the function to be called when we our
+ *   ivshmem_client receives a notification from a peer
+ * @param notif_arg
+ *   Opaque pointer given as-is to the notification callback function
+ * @param verbose
+ *   True to enable debug
+ *
+ * @return
+ *   0 on success, or a negative value on error
+ */
+int ivshmem_client_init(struct ivshmem_client *client,
+    const char *unix_sock_path, ivshmem_client_notif_cb_t notif_cb,
+    void *notif_arg, int verbose);
+
+/**
+ * Connect to the server
+ *
+ * Connect to the server unix socket, and read the first initial
+ * messages sent by the server, giving the ID of the client and the file
+ * descriptor of the shared memory.
+ *
+ * @param client
+ *   The ivshmem client
+ *
+ * @return
+ *   0 on success, or a negative value on error
+ */
+int ivshmem_client_connect(struct ivshmem_client *client);
+
+/**
+ * Close connection to the server and free all peer structures
+ *
+ * @param client
+ *   The ivshmem client
+ */
+void ivshmem_client_close(struct ivshmem_client *client);
+
+/**
+ * Fill a fd_set with file descriptors to be monitored
+ *
+ * This function will fill a fd_set with all file descriptors
+ * that must be polled (unix server socket and peers eventfd). The
+ * function will not initialize the fd_set, it is up to the caller
+ * to do this.
+ *
+ * @param client
+ *   The ivshmem client
+ * @param fds
+ *   The fd_set to be updated
+ * @param maxfd
+ *   Must be set to the max file descriptor + 1 in fd_set. This value is
+ *   updated if this function adds a greated fd in fd_set.
+ */
+void ivshmem_client_get_fds(const struct ivshmem_client *client, fd_set *fds,
+                            int *maxfd);
+
+/**
+ * Read and handle new messages
+ *
+ * Given a fd_set filled by select(), handle incoming messages from
+ * server or peers.
+ *
+ * @param client
+ *   The ivshmem client
+ * @param fds
+ *   The fd_set containing the file descriptors to be checked. Note
+ *   that file descriptors that are not related to our client are
+ *   ignored.
+ * @param maxfd
+ *   The maximum fd in fd_set, plus one.
+  *
+ * @return
+ *   0 on success, negative value on failure.
+ */
+int ivshmem_client_handle_fds(struct ivshmem_client *client, fd_set *fds,
+    int maxfd);
+
+/**
+ * Send a notification to a vector of a peer
+ *
+ * @param client
+ *   The ivshmem client
+ * @param peer
+ *   The peer to be notified
+ * @param vector
+ *   The number of the vector
+ *
+ * @return
+ *   0 on success, and a negative error on failure.
+ */
+int ivshmem_client_notify(const struct ivshmem_client *client,
+    const struct ivshmem_client_peer *peer, unsigned vector);
+
+/**
+ * Send a notification to all vectors of a peer
+ *
+ * @param client
+ *   The ivshmem client
+ * @param peer
+ *   The peer to be notified
+ *
+ * @return
+ *   0 on success, and a negative error on failure (at least one
+ *   notification failed).
+ */
+int ivshmem_client_notify_all_vects(const struct ivshmem_client *client,
+    const struct ivshmem_client_peer *peer);
+
+/**
+ * Broadcat a notification to all vectors of all peers
+ *
+ * @param client
+ *   The ivshmem client
+ *
+ * @return
+ *   0 on success, and a negative error on failure (at least one
+ *   notification failed).
+ */
+int ivshmem_client_notify_broadcast(const struct ivshmem_client *client);
+
+/**
+ * Search a peer from its identifier
+ *
+ * Return the peer structure from its peer_id. If the given peer_id is
+ * the local id, the function returns the local peer structure.
+ *
+ * @param client
+ *   The ivshmem client
+ * @param peer_id
+ *   The identifier of the peer structure
+ *
+ * @return
+ *   The peer structure, or NULL if not found
+ */
+struct ivshmem_client_peer *
+ivshmem_client_search_peer(struct ivshmem_client *client, long peer_id);
+
+/**
+ * Dump information of this ivshmem client on stdout
+ *
+ * Dump the id and the vectors of the given ivshmem client and the list
+ * of its peers and their vectors on stdout.
+ *
+ * @param client
+ *   The ivshmem client
+ */
+void ivshmem_client_dump(const struct ivshmem_client *client);
+
+#endif /* _IVSHMEM_CLIENT_ */
diff --git a/contrib/ivshmem-client/main.c b/contrib/ivshmem-client/main.c
new file mode 100644
index 0000000..04ad158
--- /dev/null
+++ b/contrib/ivshmem-client/main.c
@@ -0,0 +1,246 @@
+/*
+ * Copyright(c) 2014 6WIND S.A.
+ * All rights reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <string.h>
+#include <signal.h>
+#include <unistd.h>
+#include <inttypes.h>
+#include <getopt.h>
+
+#include "ivshmem-client.h"
+
+#define DEFAULT_VERBOSE        0
+#define DEFAULT_UNIX_SOCK_PATH "/tmp/ivshmem_socket"
+
+struct ivshmem_client_args {
+    int verbose;
+    char *unix_sock_path;
+};
+
+/* show usage and exit with given error code */
+static void
+usage(const char *name, int code)
+{
+    fprintf(stderr, "%s [opts]\n", name);
+    fprintf(stderr, "  -h: show this help\n");
+    fprintf(stderr, "  -v: verbose mode\n");
+    fprintf(stderr, "  -S <unix_sock_path>: path to the unix socket\n"
+                    "     to listen to.\n"
+                    "     default=%s\n", DEFAULT_UNIX_SOCK_PATH);
+    exit(code);
+}
+
+/* parse the program arguments, exit on error */
+static void
+parse_args(struct ivshmem_client_args *args, int argc, char *argv[])
+{
+    char c;
+
+    while ((c = getopt(argc, argv,
+                       "h"  /* help */
+                       "v"  /* verbose */
+                       "S:" /* unix_sock_path */
+                      )) != -1) {
+
+        switch (c) {
+        case 'h': /* help */
+            usage(argv[0], 0);
+            break;
+
+        case 'v': /* verbose */
+            args->verbose = 1;
+            break;
+
+        case 'S': /* unix_sock_path */
+            args->unix_sock_path = strdup(optarg);
+            break;
+
+        default:
+            usage(argv[0], 1);
+            break;
+        }
+    }
+}
+
+/* show command line help */
+static void
+cmdline_help(void)
+{
+    printf("dump: dump peers (including us)\n"
+           "int <peer> <vector>: notify one vector on a peer\n"
+           "int <peer> all: notify all vectors of a peer\n"
+           "int all: notify all vectors of all peers (excepting us)\n");
+}
+
+/* read stdin and handle commands */
+static int
+handle_stdin_command(struct ivshmem_client *client)
+{
+    struct ivshmem_client_peer *peer;
+    char buf[128];
+    char *s, *token;
+    int ret;
+    int peer_id, vector;
+
+    memset(buf, 0, sizeof(buf));
+    ret = read(0, buf, sizeof(buf) - 1);
+    if (ret < 0) {
+        return -1;
+    }
+
+    s = buf;
+    while ((token = strsep(&s, "\n\r;")) != NULL) {
+        if (!strcmp(token, "")) {
+            continue;
+        }
+        if (!strcmp(token, "?")) {
+            cmdline_help();
+        }
+        if (!strcmp(token, "help")) {
+            cmdline_help();
+        } else if (!strcmp(token, "dump")) {
+            ivshmem_client_dump(client);
+        } else if (!strcmp(token, "int all")) {
+            ivshmem_client_notify_broadcast(client);
+        } else if (sscanf(token, "int %d %d", &peer_id, &vector) == 2) {
+            peer = ivshmem_client_search_peer(client, peer_id);
+            if (peer == NULL) {
+                printf("cannot find peer_id = %d\n", peer_id);
+                continue;
+            }
+            ivshmem_client_notify(client, peer, vector);
+        } else if (sscanf(token, "int %d all", &peer_id) == 1) {
+            peer = ivshmem_client_search_peer(client, peer_id);
+            if (peer == NULL) {
+                printf("cannot find peer_id = %d\n", peer_id);
+                continue;
+            }
+            ivshmem_client_notify_all_vects(client, peer);
+        } else {
+            printf("invalid command, type help\n");
+        }
+    }
+
+    printf("cmd> ");
+    fflush(stdout);
+    return 0;
+}
+
+/* listen on stdin (command line), on unix socket (notifications of new
+ * and dead peers), and on eventfd (IRQ request) */
+int
+poll_events(struct ivshmem_client *client)
+{
+    fd_set fds;
+    int ret, maxfd;
+
+    while (1) {
+
+        FD_ZERO(&fds);
+        FD_SET(0, &fds); /* add stdin in fd_set */
+        maxfd = 1;
+
+        ivshmem_client_get_fds(client, &fds, &maxfd);
+
+        ret = select(maxfd, &fds, NULL, NULL, NULL);
+        if (ret < 0) {
+            if (errno == EINTR) {
+                continue;
+            }
+
+            fprintf(stderr, "select error: %s\n", strerror(errno));
+            break;
+        }
+        if (ret == 0) {
+            continue;
+        }
+
+        if (FD_ISSET(0, &fds) &&
+            handle_stdin_command(client) < 0 && errno != EINTR) {
+            fprintf(stderr, "handle_stdin_command() failed\n");
+            break;
+        }
+
+        if (ivshmem_client_handle_fds(client, &fds, maxfd) < 0) {
+            fprintf(stderr, "ivshmem_client_handle_fds() failed\n");
+            break;
+        }
+    }
+
+    return ret;
+}
+
+/* callback when we receive a notification (just display it) */
+void
+notification_cb(const struct ivshmem_client *client,
+                const struct ivshmem_client_peer *peer, unsigned vect,
+                void *arg)
+{
+    (void)client;
+    (void)arg;
+    printf("receive notification from peer_id=%ld vector=%d\n", peer->id, vect);
+}
+
+int
+main(int argc, char *argv[])
+{
+    struct sigaction sa;
+    struct ivshmem_client client;
+    struct ivshmem_client_args args = {
+        .verbose = DEFAULT_VERBOSE,
+        .unix_sock_path = DEFAULT_UNIX_SOCK_PATH,
+    };
+
+    /* parse arguments, will exit on error */
+    parse_args(&args, argc, argv);
+
+    /* Ignore SIGPIPE, see this link for more info:
+     * http://www.mail-archive.com/libevent-users@monkey.org/msg01606.html */
+    sa.sa_handler = SIG_IGN;
+    sa.sa_flags = 0;
+    if (sigemptyset(&sa.sa_mask) == -1 ||
+        sigaction(SIGPIPE, &sa, 0) == -1) {
+        perror("failed to ignore SIGPIPE; sigaction");
+        return 1;
+    }
+
+    cmdline_help();
+    printf("cmd> ");
+    fflush(stdout);
+
+    if (ivshmem_client_init(&client, args.unix_sock_path, notification_cb,
+                            NULL, args.verbose) < 0) {
+        fprintf(stderr, "cannot init client\n");
+        return 1;
+    }
+
+    while (1) {
+        if (ivshmem_client_connect(&client) < 0) {
+            fprintf(stderr, "cannot connect to server, retry in 1 second\n");
+            sleep(1);
+            continue;
+        }
+
+        fprintf(stdout, "listen on server socket %d\n", client.sock_fd);
+
+        if (poll_events(&client) == 0) {
+            continue;
+        }
+
+        /* disconnected from server, reset all peers */
+        fprintf(stdout, "disconnected from server\n");
+
+        ivshmem_client_close(&client);
+    }
+
+    return 0;
+}
diff --git a/contrib/ivshmem-server/Makefile b/contrib/ivshmem-server/Makefile
new file mode 100644
index 0000000..954eba8
--- /dev/null
+++ b/contrib/ivshmem-server/Makefile
@@ -0,0 +1,26 @@
+# Copyright 2014 6WIND S.A.
+# All rights reserved
+
+S ?= $(CURDIR)
+O ?= $(CURDIR)
+
+CFLAGS += -Wall -Wextra -Werror -g
+LDFLAGS +=
+LDLIBS += -lrt
+
+VPATH = $(S)
+PROG = ivshmem-server
+OBJS := $(O)/ivshmem-server.o
+OBJS += $(O)/main.o
+
+$(O)/%.o: %.c
+	$(CC) $(CFLAGS) -o $@ -c $<
+
+$(O)/$(PROG): $(OBJS)
+	$(CC) $(LDFLAGS) -o $@ $^ $(LDLIBS)
+
+.PHONY: all
+all: $(O)/$(PROG)
+
+clean:
+	rm -f $(OBJS) $(O)/$(PROG)
diff --git a/contrib/ivshmem-server/ivshmem-server.c b/contrib/ivshmem-server/ivshmem-server.c
new file mode 100644
index 0000000..b10b08a
--- /dev/null
+++ b/contrib/ivshmem-server/ivshmem-server.c
@@ -0,0 +1,420 @@
+/*
+ * Copyright(c) 2014 6WIND S.A.
+ * All rights reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <string.h>
+#include <signal.h>
+#include <unistd.h>
+#include <inttypes.h>
+#include <fcntl.h>
+
+#include <sys/queue.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <sys/eventfd.h>
+
+#include "ivshmem-server.h"
+
+/* log a message on stdout if verbose=1 */
+#define debug_log(server, fmt, ...) do { \
+        if ((server)->verbose) {         \
+            printf(fmt, ## __VA_ARGS__); \
+        }                                \
+    } while (0)
+
+/* browse the queue, allowing to remove/free the current element */
+#define    TAILQ_FOREACH_SAFE(var, var2, head, field)            \
+    for ((var) = TAILQ_FIRST((head)),                            \
+             (var2) = ((var) ? TAILQ_NEXT((var), field) : NULL); \
+         (var);                                                  \
+         (var) = (var2),                                         \
+             (var2) = ((var2) ? TAILQ_NEXT((var2), field) : NULL))
+
+/** maximum size of a huge page, used by ivshmem_ftruncate() */
+#define MAX_HUGEPAGE_SIZE (1024 * 1024 * 1024)
+
+/** default listen backlog (number of sockets not accepted) */
+#define IVSHMEM_SERVER_LISTEN_BACKLOG 10
+
+/* send message to a client unix socket */
+static int
+send_one_msg(int sock_fd, long peer_id, int fd)
+{
+    int ret;
+    struct msghdr msg;
+    struct iovec iov[1];
+    union {
+        struct cmsghdr cmsg;
+        char control[CMSG_SPACE(sizeof(int))];
+    } msg_control;
+    struct cmsghdr *cmsg;
+
+    iov[0].iov_base = &peer_id;
+    iov[0].iov_len = sizeof(peer_id);
+
+    memset(&msg, 0, sizeof(msg));
+    msg.msg_iov = iov;
+    msg.msg_iovlen = 1;
+
+    /* if fd is specified, add it in a cmsg */
+    if (fd >= 0) {
+        msg.msg_control = &msg_control;
+        msg.msg_controllen = sizeof(msg_control);
+        cmsg = CMSG_FIRSTHDR(&msg);
+        cmsg->cmsg_level = SOL_SOCKET;
+        cmsg->cmsg_type = SCM_RIGHTS;
+        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
+        memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));
+    }
+
+    ret = sendmsg(sock_fd, &msg, 0);
+    if (ret <= 0) {
+        return -1;
+    }
+
+    return 0;
+}
+
+/* free a peer when the server advertise a disconnection or when the
+ * server is freed */
+static void
+free_peer(struct ivshmem_server *server, struct ivshmem_server_peer *peer)
+{
+    unsigned vector;
+    struct ivshmem_server_peer *other_peer;
+
+    debug_log(server, "free peer %ld\n", peer->id);
+    close(peer->sock_fd);
+    TAILQ_REMOVE(&server->peer_list, peer, next);
+
+    /* advertise the deletion to other peers */
+    TAILQ_FOREACH(other_peer, &server->peer_list, next) {
+        send_one_msg(other_peer->sock_fd, peer->id, -1);
+    }
+
+    for (vector = 0; vector < peer->vectors_count; vector++) {
+        close(peer->vectors[vector]);
+    }
+
+    free(peer);
+}
+
+/* send the peer id and the shm_fd just after a new client connection */
+static int
+send_initial_info(struct ivshmem_server *server,
+                  struct ivshmem_server_peer *peer)
+{
+    int ret;
+
+    /* send the peer id to the client */
+    ret = send_one_msg(peer->sock_fd, peer->id, -1);
+    if (ret < 0) {
+        debug_log(server, "cannot send peer id: %s\n", strerror(errno));
+        return -1;
+    }
+
+    /* send the shm_fd */
+    ret = send_one_msg(peer->sock_fd, -1, server->shm_fd);
+    if (ret < 0) {
+        debug_log(server, "cannot send shm fd: %s\n", strerror(errno));
+        return -1;
+    }
+
+    return 0;
+}
+
+/* handle message on listening unix socket (new client connection) */
+static int
+handle_new_conn(struct ivshmem_server *server)
+{
+    struct ivshmem_server_peer *peer, *other_peer;
+    struct sockaddr_un unaddr;
+    socklen_t unaddr_len;
+    int newfd;
+    unsigned i;
+
+    /* accept the incoming connection */
+    unaddr_len = sizeof(unaddr);
+    newfd = accept(server->sock_fd, (struct sockaddr *)&unaddr, &unaddr_len);
+    if (newfd < 0) {
+        debug_log(server, "cannot accept() %s\n", strerror(errno));
+        return -1;
+    }
+
+    debug_log(server, "accept()=%d\n", newfd);
+
+    /* allocate new structure for this peer */
+    peer = malloc(sizeof(*peer));
+    if (peer == NULL) {
+        debug_log(server, "cannot allocate new peer\n");
+        close(newfd);
+        return -1;
+    }
+
+    /* initialize the peer struct, one eventfd per vector */
+    memset(peer, 0, sizeof(*peer));
+    peer->sock_fd = newfd;
+
+    /* get an unused peer id */
+    while (ivshmem_server_search_peer(server, server->cur_id) != NULL) {
+        server->cur_id++;
+    }
+    peer->id = server->cur_id++;
+
+    /* create eventfd, one per vector */
+    peer->vectors_count = server->n_vectors;
+    for (i = 0; i < peer->vectors_count; i++) {
+        peer->vectors[i] = eventfd(0, 0);
+        if (peer->vectors[i] < 0) {
+            debug_log(server, "cannot create eventfd\n");
+            goto fail;
+        }
+    }
+
+    /* send peer id and shm fd */
+    if (send_initial_info(server, peer) < 0) {
+        debug_log(server, "cannot send initial info\n");
+        goto fail;
+    }
+
+    /* advertise the new peer to others */
+    TAILQ_FOREACH(other_peer, &server->peer_list, next) {
+        for (i = 0; i < peer->vectors_count; i++) {
+            send_one_msg(other_peer->sock_fd, peer->id, peer->vectors[i]);
+        }
+    }
+
+    /* advertise the other peers to the new one */
+    TAILQ_FOREACH(other_peer, &server->peer_list, next) {
+        for (i = 0; i < peer->vectors_count; i++) {
+            send_one_msg(peer->sock_fd, other_peer->id, other_peer->vectors[i]);
+        }
+    }
+
+    /* advertise the new peer to itself */
+    for (i = 0; i < peer->vectors_count; i++) {
+        send_one_msg(peer->sock_fd, peer->id, peer->vectors[i]);
+    }
+
+    TAILQ_INSERT_TAIL(&server->peer_list, peer, next);
+    debug_log(server, "new peer id = %ld\n", peer->id);
+    return 0;
+
+fail:
+    while (i--) {
+        close(peer->vectors[i]);
+    }
+    peer->sock_fd = -1;
+    close(newfd);
+    return -1;
+}
+
+/* Try to ftruncate a file to next power of 2 of shmsize.
+ * If it fails; all power of 2 above shmsize are tested until
+ * we reach the maximum huge page size. This is useful
+ * if the shm file is in a hugetlbfs that cannot be truncated to the
+ * shm_size value. */
+static int
+ivshmem_ftruncate(int fd, unsigned shmsize)
+{
+    int ret;
+
+    /* align shmsize to next power of 2 */
+    shmsize--;
+    shmsize |= shmsize >> 1;
+    shmsize |= shmsize >> 2;
+    shmsize |= shmsize >> 4;
+    shmsize |= shmsize >> 8;
+    shmsize |= shmsize >> 16;
+    shmsize++;
+
+    while (shmsize <= MAX_HUGEPAGE_SIZE) {
+        ret = ftruncate(fd, shmsize);
+        if (ret == 0) {
+            return ret;
+        }
+        shmsize *= 2;
+    }
+
+    return -1;
+}
+
+/* Init a new ivshmem server */
+int
+ivshmem_server_init(struct ivshmem_server *server, const char *unix_sock_path,
+                    const char *shm_path, size_t shm_size, unsigned n_vectors,
+                    int verbose)
+{
+    memset(server, 0, sizeof(*server));
+
+    snprintf(server->unix_sock_path, sizeof(server->unix_sock_path),
+             "%s", unix_sock_path);
+    snprintf(server->shm_path, sizeof(server->shm_path),
+             "%s", shm_path);
+
+    server->shm_size = shm_size;
+    server->n_vectors = n_vectors;
+    server->verbose = verbose;
+
+    TAILQ_INIT(&server->peer_list);
+
+    return 0;
+}
+
+/* open shm, create and bind to the unix socket */
+int
+ivshmem_server_start(struct ivshmem_server *server)
+{
+    struct sockaddr_un sun;
+    int shm_fd, sock_fd;
+
+    /* open shm file */
+    shm_fd = shm_open(server->shm_path, O_CREAT|O_RDWR, S_IRWXU);
+    if (shm_fd < 0) {
+        fprintf(stderr, "cannot open shm file %s: %s\n", server->shm_path,
+                strerror(errno));
+        return -1;
+    }
+    if (ivshmem_ftruncate(shm_fd, server->shm_size) < 0) {
+        fprintf(stderr, "ftruncate(%s) failed: %s\n", server->shm_path,
+                strerror(errno));
+        return -1;
+    }
+
+    debug_log(server, "create & bind socket %s\n", server->unix_sock_path);
+
+    /* create the unix listening socket */
+    sock_fd = socket(AF_UNIX, SOCK_STREAM, 0);
+    if (sock_fd < 0) {
+        debug_log(server, "cannot create socket: %s\n", strerror(errno));
+        close(shm_fd);
+        return -1;
+    }
+
+    sun.sun_family = AF_UNIX;
+    snprintf(sun.sun_path, sizeof(sun.sun_path), "%s", server->unix_sock_path);
+    unlink(sun.sun_path);
+    if (bind(sock_fd, (struct sockaddr *)&sun, sizeof(sun)) < 0) {
+        debug_log(server, "cannot connect to %s: %s\n", sun.sun_path,
+                  strerror(errno));
+        close(sock_fd);
+        close(shm_fd);
+        return -1;
+    }
+
+    if (listen(sock_fd, IVSHMEM_SERVER_LISTEN_BACKLOG) < 0) {
+        debug_log(server, "listen() failed: %s\n", strerror(errno));
+        close(sock_fd);
+        close(shm_fd);
+        return -1;
+    }
+
+    server->sock_fd = sock_fd;
+    server->shm_fd = shm_fd;
+
+    return 0;
+}
+
+/* close connections to clients, the unix socket and the shm fd */
+void
+ivshmem_server_close(struct ivshmem_server *server)
+{
+    struct ivshmem_server_peer *peer;
+
+    debug_log(server, "close server\n");
+
+    TAILQ_FOREACH(peer, &server->peer_list, next) {
+        free_peer(server, peer);
+    }
+
+    close(server->sock_fd);
+    close(server->shm_fd);
+    server->sock_fd = -1;
+    server->shm_fd = -1;
+}
+
+/* get the fd_set according to the unix socket and the peer list */
+void
+ivshmem_server_get_fds(const struct ivshmem_server *server, fd_set *fds,
+                       int *maxfd)
+{
+    struct ivshmem_server_peer *peer;
+
+    FD_SET(server->sock_fd, fds);
+    if (server->sock_fd >= *maxfd) {
+        *maxfd = server->sock_fd + 1;
+    }
+
+    TAILQ_FOREACH(peer, &server->peer_list, next) {
+        FD_SET(peer->sock_fd, fds);
+        if (peer->sock_fd >= *maxfd) {
+            *maxfd = peer->sock_fd + 1;
+        }
+    }
+}
+
+/* process incoming messages on the sockets in fd_set */
+int
+ivshmem_server_handle_fds(struct ivshmem_server *server, fd_set *fds, int maxfd)
+{
+    struct ivshmem_server_peer *peer, *peer_next;
+
+    if (server->sock_fd < maxfd && FD_ISSET(server->sock_fd, fds) &&
+        handle_new_conn(server) < 0 && errno != EINTR) {
+        debug_log(server, "handle_new_conn() failed\n");
+        return -1;
+    }
+
+    TAILQ_FOREACH_SAFE(peer, peer_next, &server->peer_list, next) {
+        /* any message from a peer socket result in a close() */
+        debug_log(server, "peer->sock_fd=%d\n", peer->sock_fd);
+        if (peer->sock_fd < maxfd && FD_ISSET(peer->sock_fd, fds)) {
+            free_peer(server, peer);
+        }
+    }
+
+    return 0;
+}
+
+/* lookup peer from its id */
+struct ivshmem_server_peer *
+ivshmem_server_search_peer(struct ivshmem_server *server, long peer_id)
+{
+    struct ivshmem_server_peer *peer;
+
+    TAILQ_FOREACH(peer, &server->peer_list, next) {
+        if (peer->id == peer_id) {
+            return peer;
+        }
+    }
+    return NULL;
+}
+
+/* dump our info, the list of peers their vectors on stdout */
+void
+ivshmem_server_dump(const struct ivshmem_server *server)
+{
+    const struct ivshmem_server_peer *peer;
+    unsigned vector;
+
+    /* dump peers */
+    TAILQ_FOREACH(peer, &server->peer_list, next) {
+        printf("peer_id = %ld\n", peer->id);
+
+        for (vector = 0; vector < peer->vectors_count; vector++) {
+            printf("  vector %d is enabled (fd=%d)\n", vector,
+                   peer->vectors[vector]);
+        }
+    }
+}
diff --git a/contrib/ivshmem-server/ivshmem-server.h b/contrib/ivshmem-server/ivshmem-server.h
new file mode 100644
index 0000000..bb56cea
--- /dev/null
+++ b/contrib/ivshmem-server/ivshmem-server.h
@@ -0,0 +1,185 @@
+/*
+ * Copyright(c) 2014 6WIND S.A.
+ * All rights reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#ifndef _IVSHMEM_SERVER_
+#define _IVSHMEM_SERVER_
+
+/**
+ * The ivshmem server is a daemon that creates a unix socket in listen
+ * mode. The ivshmem clients (qemu or ivshmem-client) connect to this
+ * unix socket. For each client, the server will create some eventfd
+ * (see EVENTFD(2)), one per vector. These fd are transmitted to all
+ * clients using the SCM_RIGHTS cmsg message. Therefore, each client is
+ * able to send a notification to another client without beeing
+ * "profixied" by the server.
+ *
+ * We use this mechanism to send interruptions between guests.
+ * qemu is able to transform an event on a eventfd into a PCI MSI-x
+ * interruption in the guest.
+ *
+ * The ivshmem server is also able to share the file descriptor
+ * associated to the ivshmem shared memory.
+ */
+
+#include <limits.h>
+#include <sys/select.h>
+#include <sys/queue.h>
+
+/**
+ * Maximum number of notification vectors supported by the server
+ */
+#define IVSHMEM_SERVER_MAX_VECTORS 64
+
+/**
+ * Structure storing a peer
+ *
+ * Each time a client connects to an ivshmem server, a new
+ * ivshmem_server_peer structure is created. This peer and all its
+ * vectors are advertised to all connected clients through the connected
+ * unix sockets.
+ */
+struct ivshmem_server_peer {
+    TAILQ_ENTRY(ivshmem_server_peer) next;    /**< next in list*/
+    int sock_fd;                                /**< connected unix sock */
+    long id;                                    /**< the id of the peer */
+    int vectors[IVSHMEM_SERVER_MAX_VECTORS];  /**< one fd per vector */
+    unsigned vectors_count;                     /**< number of vectors */
+};
+TAILQ_HEAD(ivshmem_server_peer_list, ivshmem_server_peer);
+
+/**
+ * Structure describing an ivshmem server
+ *
+ * This structure stores all information related to our server: the name
+ * of the server unix socket and the list of connected peers.
+ */
+struct ivshmem_server {
+    char unix_sock_path[PATH_MAX];  /**< path to unix socket */
+    int sock_fd;                    /**< unix sock file descriptor */
+    char shm_path[PATH_MAX];        /**< path to shm */
+    size_t shm_size;                /**< size of shm */
+    int shm_fd;                     /**< shm file descriptor */
+    unsigned n_vectors;             /**< number of vectors */
+    long cur_id;                    /**< id to be given to next client */
+    int verbose;                    /**< true in verbose mode */
+    struct ivshmem_server_peer_list peer_list;  /**< list of peers */
+};
+
+/**
+ * Initialize an ivshmem server
+ *
+ * @param server
+ *   A pointer to an uninitialized ivshmem_server structure
+ * @param unix_sock_path
+ *   The pointer to the unix socket file name
+ * @param shm_path
+ *   Path to the shared memory. The path corresponds to a POSIX shm name.
+ *   To use a real file, for instance in a hugetlbfs, it is possible to
+ *   use /../../abspath/to/file.
+ * @param shm_size
+ *   Size of shared memory
+ * @param n_vectors
+ *   Number of interrupt vectors per client
+ * @param verbose
+ *   True to enable verbose mode
+ *
+ * @return
+ *   0 on success, negative value on error
+ */
+int
+ivshmem_server_init(struct ivshmem_server *server,
+    const char *unix_sock_path, const char *shm_path, size_t shm_size,
+    unsigned n_vectors, int verbose);
+
+/**
+ * Open the shm, then create and bind to the unix socket
+ *
+ * @param server
+ *   The pointer to the initialized ivshmem server structure
+ *
+ * @return
+ *   0 on success, or a negative value on error
+ */
+int ivshmem_server_start(struct ivshmem_server *server);
+
+/**
+ * Close the server
+ *
+ * Close connections to all clients, close the unix socket and the
+ * shared memory file descriptor. The structure remains initialized, so
+ * it is possible to call ivshmem_server_start() again after a call to
+ * ivshmem_server_close().
+ *
+ * @param server
+ *   The ivshmem server
+ */
+void ivshmem_server_close(struct ivshmem_server *server);
+
+/**
+ * Fill a fd_set with file descriptors to be monitored
+ *
+ * This function will fill a fd_set with all file descriptors that must
+ * be polled (unix server socket and peers unix socket). The function
+ * will not initialize the fd_set, it is up to the caller to do it.
+ *
+ * @param server
+ *   The ivshmem server
+ * @param fds
+ *   The fd_set to be updated
+ * @param maxfd
+ *   Must be set to the max file descriptor + 1 in fd_set. This value is
+ *   updated if this function adds a greated fd in fd_set.
+ */
+void
+ivshmem_server_get_fds(const struct ivshmem_server *server,
+    fd_set *fds, int *maxfd);
+
+/**
+ * Read and handle new messages
+ *
+ * Given a fd_set (for instance filled by a call to select()), handle
+ * incoming messages from peers.
+ *
+ * @param server
+ *   The ivshmem server
+ * @param fds
+ *   The fd_set containing the file descriptors to be checked. Note
+ *   that file descriptors that are not related to our server are
+ *   ignored.
+ * @param maxfd
+ *   The maximum fd in fd_set, plus one.
+ *
+ * @return
+ *   0 on success, negative value on failure.
+ */
+int ivshmem_server_handle_fds(struct ivshmem_server *server, fd_set *fds,
+    int maxfd);
+
+/**
+ * Search a peer from its identifier
+ *
+ * @param server
+ *   The ivshmem server
+ * @param peer_id
+ *   The identifier of the peer structure
+ *
+ * @return
+ *   The peer structure, or NULL if not found
+ */
+struct ivshmem_server_peer *
+ivshmem_server_search_peer(struct ivshmem_server *server, long peer_id);
+
+/**
+ * Dump information of this ivshmem server and its peers on stdout
+ *
+ * @param server
+ *   The ivshmem server
+ */
+void ivshmem_server_dump(const struct ivshmem_server *server);
+
+#endif /* _IVSHMEM_SERVER_ */
diff --git a/contrib/ivshmem-server/main.c b/contrib/ivshmem-server/main.c
new file mode 100644
index 0000000..392000a
--- /dev/null
+++ b/contrib/ivshmem-server/main.c
@@ -0,0 +1,296 @@
+/*
+ * Copyright(c) 2014 6WIND S.A.
+ * All rights reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <string.h>
+#include <signal.h>
+#include <unistd.h>
+#include <inttypes.h>
+#include <sys/types.h>
+#include <limits.h>
+#include <getopt.h>
+
+#include "ivshmem-server.h"
+
+#define DEFAULT_VERBOSE        0
+#define DEFAULT_FOREGROUND     0
+#define DEFAULT_PID_FILE       "/var/run/ivshmem-server.pid"
+#define DEFAULT_UNIX_SOCK_PATH "/tmp/ivshmem_socket"
+#define DEFAULT_SHM_PATH       "ivshmem"
+#define DEFAULT_SHM_SIZE       (1024*1024)
+#define DEFAULT_N_VECTORS      16
+
+/* arguments given by the user */
+struct ivshmem_server_args {
+    int verbose;
+    int foreground;
+    char *pid_file;
+    char *unix_socket_path;
+    char *shm_path;
+    size_t shm_size;
+    unsigned n_vectors;
+};
+
+/* show usage and exit with given error code */
+static void
+usage(const char *name, int code)
+{
+    fprintf(stderr, "%s [opts]\n", name);
+    fprintf(stderr, "  -h: show this help\n");
+    fprintf(stderr, "  -v: verbose mode\n");
+    fprintf(stderr, "  -F: foreground mode (default is to daemonize)\n");
+    fprintf(stderr, "  -p <pid_file>: path to the PID file (used in daemon\n"
+                    "     mode only).\n"
+                    "     Default=%s\n", DEFAULT_SHM_PATH);
+    fprintf(stderr, "  -S <unix_socket_path>: path to the unix socket\n"
+                    "     to listen to.\n"
+                    "     Default=%s\n", DEFAULT_UNIX_SOCK_PATH);
+    fprintf(stderr, "  -m <shm_path>: path to the shared memory.\n"
+                    "     The path corresponds to a POSIX shm name. To use a\n"
+                    "     real file, for instance in a hugetlbfs, use\n"
+                    "     /../../abspath/to/file.\n"
+                    "     default=%s\n", DEFAULT_SHM_PATH);
+    fprintf(stderr, "  -l <size>: size of shared memory in bytes. The suffix\n"
+                    "     K, M and G can be used (ex: 1K means 1024).\n"
+                    "     default=%u\n", DEFAULT_SHM_SIZE);
+    fprintf(stderr, "  -n <n_vects>: number of vectors.\n"
+                    "     default=%u\n", DEFAULT_N_VECTORS);
+
+    exit(code);
+}
+
+/* parse the size of shm */
+static int
+parse_size(const char *val_str, size_t *val)
+{
+    char *endptr;
+    unsigned long long tmp;
+
+    errno = 0;
+    tmp = strtoull(val_str, &endptr, 0);
+    if ((errno == ERANGE && tmp == ULLONG_MAX) || (errno != 0 && tmp == 0)) {
+        return -1;
+    }
+    if (endptr == val_str) {
+        return -1;
+    }
+    if (endptr[0] == 'K' && endptr[1] == '\0') {
+        tmp *= 1024;
+    } else if (endptr[0] == 'M' && endptr[1] == '\0') {
+        tmp *= 1024 * 1024;
+    } else if (endptr[0] == 'G' && endptr[1] == '\0') {
+        tmp *= 1024 * 1024 * 1024;
+    } else if (endptr[0] != '\0') {
+        return -1;
+    }
+
+    *val = tmp;
+    return 0;
+}
+
+/* parse an unsigned int */
+static int
+parse_uint(const char *val_str, unsigned *val)
+{
+    char *endptr;
+    unsigned long tmp;
+
+    errno = 0;
+    tmp = strtoul(val_str, &endptr, 0);
+    if ((errno == ERANGE && tmp == ULONG_MAX) || (errno != 0 && tmp == 0)) {
+        return -1;
+    }
+    if (endptr == val_str || endptr[0] != '\0') {
+        return -1;
+    }
+    *val = tmp;
+    return 0;
+}
+
+/* parse the program arguments, exit on error */
+static void
+parse_args(struct ivshmem_server_args *args, int argc, char *argv[])
+{
+    char c;
+
+    while ((c = getopt(argc, argv,
+                       "h"  /* help */
+                       "v"  /* verbose */
+                       "F"  /* foreground */
+                       "p:" /* pid_file */
+                       "S:" /* unix_socket_path */
+                       "m:" /* shm_path */
+                       "l:" /* shm_size */
+                       "n:" /* n_vectors */
+                      )) != -1) {
+
+        switch (c) {
+        case 'h': /* help */
+            usage(argv[0], 0);
+            break;
+
+        case 'v': /* verbose */
+            args->verbose = 1;
+            break;
+
+        case 'F': /* foreground */
+            args->foreground = 1;
+            break;
+
+        case 'p': /* pid_file */
+            args->pid_file = strdup(optarg);
+            break;
+
+        case 'S': /* unix_socket_path */
+            args->unix_socket_path = strdup(optarg);
+            break;
+
+        case 'm': /* shm_path */
+            args->shm_path = strdup(optarg);
+            break;
+
+        case 'l': /* shm_size */
+            if (parse_size(optarg, &args->shm_size) < 0) {
+                fprintf(stderr, "cannot parse shm size\n");
+                usage(argv[0], 1);
+            }
+            break;
+
+        case 'n': /* n_vectors */
+            if (parse_uint(optarg, &args->n_vectors) < 0) {
+                fprintf(stderr, "cannot parse n_vectors\n");
+                usage(argv[0], 1);
+            }
+            break;
+
+        default:
+            usage(argv[0], 1);
+            break;
+        }
+    }
+
+    if (args->n_vectors > IVSHMEM_SERVER_MAX_VECTORS) {
+        fprintf(stderr, "too many requested vectors (max is %d)\n",
+                IVSHMEM_SERVER_MAX_VECTORS);
+        usage(argv[0], 1);
+    }
+
+    if (args->verbose == 1 && args->foreground == 0) {
+        fprintf(stderr, "cannot use verbose in daemon mode\n");
+        usage(argv[0], 1);
+    }
+}
+
+/* wait for events on listening server unix socket and connected client
+ * sockets */
+int
+poll_events(struct ivshmem_server *server)
+{
+    fd_set fds;
+    int ret, maxfd;
+
+    while (1) {
+
+        FD_ZERO(&fds);
+        maxfd = 0;
+        ivshmem_server_get_fds(server, &fds, &maxfd);
+
+        ret = select(maxfd, &fds, NULL, NULL, NULL);
+
+        if (ret < 0) {
+            if (errno == EINTR) {
+                continue;
+            }
+
+            fprintf(stderr, "select error: %s\n", strerror(errno));
+            break;
+        }
+        if (ret == 0) {
+            continue;
+        }
+
+        if (ivshmem_server_handle_fds(server, &fds, maxfd) < 0) {
+            fprintf(stderr, "ivshmem_server_handle_fds() failed\n");
+            break;
+        }
+    }
+
+    return ret;
+}
+
+int
+main(int argc, char *argv[])
+{
+    struct ivshmem_server server;
+    struct sigaction sa;
+    struct ivshmem_server_args args = {
+        .verbose = DEFAULT_VERBOSE,
+        .foreground = DEFAULT_FOREGROUND,
+        .pid_file = DEFAULT_PID_FILE,
+        .unix_socket_path = DEFAULT_UNIX_SOCK_PATH,
+        .shm_path = DEFAULT_SHM_PATH,
+        .shm_size = DEFAULT_SHM_SIZE,
+        .n_vectors = DEFAULT_N_VECTORS,
+    };
+
+    /* parse arguments, will exit on error */
+    parse_args(&args, argc, argv);
+
+    /* Ignore SIGPIPE, see this link for more info:
+     * http://www.mail-archive.com/libevent-users@monkey.org/msg01606.html */
+    sa.sa_handler = SIG_IGN;
+    sa.sa_flags = 0;
+    if (sigemptyset(&sa.sa_mask) == -1 ||
+        sigaction(SIGPIPE, &sa, 0) == -1) {
+        perror("failed to ignore SIGPIPE; sigaction");
+        return 1;
+    }
+
+    /* init the ivshms structure */
+    if (ivshmem_server_init(&server, args.unix_socket_path, args.shm_path,
+                            args.shm_size, args.n_vectors, args.verbose) < 0) {
+        fprintf(stderr, "cannot init server\n");
+        return 1;
+    }
+
+    /* start the ivshmem server (open shm & unix socket) */
+    if (ivshmem_server_start(&server) < 0) {
+        fprintf(stderr, "cannot bind\n");
+        return 1;
+    }
+
+    /* daemonize if asked to */
+    if (!args.foreground) {
+        FILE *fp;
+
+        if (daemon(1, 1) < 0) {
+            fprintf(stderr, "cannot daemonize: %s\n", strerror(errno));
+            return 1;
+        }
+
+        /* write pid file */
+        fp = fopen(args.pid_file, "w");
+        if (fp == NULL) {
+            fprintf(stderr, "cannot write pid file: %s\n", strerror(errno));
+            return 1;
+        }
+
+        fprintf(fp, "%d\n", (int) getpid());
+        fclose(fp);
+    }
+
+    poll_events(&server);
+
+    fprintf(stdout, "server disconnected\n");
+    ivshmem_server_close(&server);
+
+    return 0;
+}
diff --git a/qemu-doc.texi b/qemu-doc.texi
index 88ec9bb..c8e912f 100644
--- a/qemu-doc.texi
+++ b/qemu-doc.texi
@@ -1227,9 +1227,13 @@ is qemu.git/contrib/ivshmem-server.  An example syntax when using the shared
 memory server is:
 
 @example
-qemu-system-i386 -device ivshmem,size=<size in format accepted by -m>[,chardev=<id>]
-                 [,msi=on][,ioeventfd=on][,vectors=n][,role=peer|master]
-qemu-system-i386 -chardev socket,path=<path>,id=<id>
+# First start the ivshmem server once and for all
+ivshmem-server -p <pidfile> -S <path> -m <shm name> -l <shm size> -n <vectors n>
+
+# Then start your qemu instances with matching arguments
+qemu-system-i386 -device ivshmem,size=<shm size>,vectors=<vectors n>,chardev=<id>
+                 [,msi=on][,ioeventfd=on][,role=peer|master]
+                 -chardev socket,path=<path>,id=<id>
 @end example
 
 When using the server, the guest will be assigned a VM ID (>=0) that allows guests
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] ivshmem: update documentation, add client/server tools
  2014-06-20 12:15 [Qemu-devel] [PATCH 0/2] ivshmem: update documentation, add client/server tools David Marchand
  2014-06-20 12:15 ` [Qemu-devel] [PATCH 1/2] docs: update ivshmem device spec David Marchand
  2014-06-20 12:15 ` [Qemu-devel] [PATCH 2/2] contrib: add ivshmem client and server David Marchand
@ 2014-06-23  8:02 ` Claudio Fontana
  2 siblings, 0 replies; 10+ messages in thread
From: Claudio Fontana @ 2014-06-23  8:02 UTC (permalink / raw)
  To: David Marchand, qemu-devel; +Cc: pbonzini, Jani Kokkonen, kvm

Hello David,

On 20.06.2014 14:15, David Marchand wrote:
> Hello, 
> 
> (as suggested by Paolo, ccing Claudio and kvm mailing list)
> 
> Here is a patchset containing an update on ivshmem specs documentation and
> importing ivshmem server and client tools.
> These tools have been written from scratch and are not related to what is
> available in nahanni repository.
> I put them in contrib/ directory as the qemu-doc.texi was already telling the
> server was supposed to be there.

thank you for the initial patches, I am looking carefully at these, also with the help of my colleague Jani Kokkonen over here.
I'll get back to you as soon as we are up to speed.

Ciao,

Claudio

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] docs: update ivshmem device spec
  2014-06-20 12:15 ` [Qemu-devel] [PATCH 1/2] docs: update ivshmem device spec David Marchand
@ 2014-06-23 14:18   ` Claudio Fontana
  2014-06-25 15:39     ` David Marchand
  2014-06-26 14:12     ` Cam Macdonell
  2014-06-24 16:09   ` Eric Blake
  1 sibling, 2 replies; 10+ messages in thread
From: Claudio Fontana @ 2014-06-23 14:18 UTC (permalink / raw)
  To: David Marchand, qemu-devel; +Cc: pbonzini, Jani Kokkonen, kvm

Hi,

we were reading through this quickly today, and these are some of the questions that
we think can came up when reading this. Answers to some of these questions we think
we have figured out, but I think it's important to put this information into the
documentation.

I will quote the file in its entirety, and insert some questions inline.

> Device Specification for Inter-VM shared memory device
> ------------------------------------------------------
> 
> The Inter-VM shared memory device is designed to share a region of memory to
> userspace in multiple virtual guests.

What does "to userspace" mean in this context? The userspace of the host, or the userspace in the guest?

What about "The Inter-VM shared memory device is designed to share a memory region (created on the host via the POSIX shared memory API) between multiple QEMU processes running different guests. In order for all guests to be able to pick up the shared memory area, it is modeled by QEMU as a PCI device exposing said memory to the guest as a PCI BAR."

Whether in those guests the memory region is used in kernel space or userspace, or there is even any meaning for those terms is guest-dependent I would think (I think of an OSv here, where the application and kernel execute at the same privilege level and in the same address space).

> The memory region does not belong to any
> guest, but is a POSIX memory object on the host.

Ok that's clear.
One thing I would ask is, but I don't know if it makes sense to mention here, is who creates this memory object on the host?
I understand in some cases it's the contributed server (what you provide in contrib/), in some cases it's the "user" of this device who has to write some server code for that, but is it true that also the qemu process itself can create this memory object on its own, without any external process needed? Is this the use case for host<->guest only?

> Optionally, the device may
> support sending interrupts to other guests sharing the same memory region.

This opens a lot of questions here which are partly answered later (If I understand correctly, not only interrupts are involved, but a complete communication protocol involving registers in BAR0), but what about staying a bit general here, like
"Optionally, the device may also provide a communication mechanism between guests sharing the same memory region. More details about that in the section 'OPTIONAL ivshmem guest to guest communication protocol'.

Thinking out loud, I wonder if this communication mechanism should be part of this device in QEMU, or it should be provided at another layer..

> 
> 
> The Inter-VM PCI device
> -----------------------
> 
> *BARs*
> 
> The device supports three BARs.  BAR0 is a 1 Kbyte MMIO region to support
> registers.  BAR1 is used for MSI-X when it is enabled in the device.  BAR2 is
> used to map the shared memory object from the host.  The size of BAR2 is
> specified when the guest is started and must be a power of 2 in size.

Are BAR0 and BAR1 optional? That's what I would think by reading the whole, but I'm still not sure.
Am I forced to map BAR0 and BAR1 anyway? I don't think so, but..

If so, can we separate the explanation into the base shared memory feature, and a separate section which explains the OPTIONAL communication mechanism, and the OPTIONAL MSI-X BAR?

For example, say that I am a potential ivshmem user (which I am), and I am interested in the shared memory but I want to use my own communication mechanism and protocol between guests, can we make it so that I don't have to wonder whether some of the info I read applies or not?
The solution to that I think is to put all the OPTIONAL parts into separate sections.

> 
> *Registers*

Ok, so this should I think go into one such OPTIONAL sections.

> 
> The device currently supports 4 registers of 32-bits each.  Registers
> are used for synchronization between guests sharing the same memory object when
> interrupts are supported (this requires using the shared memory server).

So use of BAR0 goes together with interrupts, and goes together with the shared memory server (is it the one contributed in contrib/?)

> 
> The server assigns each VM an ID number and sends this ID number to the QEMU
> process when the guest starts.
> 
> enum ivshmem_registers {
>     IntrMask = 0,
>     IntrStatus = 4,
>     IVPosition = 8,
>     Doorbell = 12
> };
> 
> The first two registers are the interrupt mask and status registers.  Mask and
> status are only used with pin-based interrupts.  They are unused with MSI
> interrupts.
> 
> Status Register: The status register is set to 1 when an interrupt occurs.
> 
> Mask Register: The mask register is bitwise ANDed with the interrupt status
> and the result will raise an interrupt if it is non-zero.  However, since 1 is
> the only value the status will be set to, it is only the first bit of the mask
> that has any effect.  Therefore interrupts can be masked by setting the first
> bit to 0 and unmasked by setting the first bit to 1.
> 
> IVPosition Register: The IVPosition register is read-only and reports the
> guest's ID number.  The guest IDs are non-negative integers.  When using the
> server, since the server is a separate process, the VM ID will only be set when
> the device is ready (shared memory is received from the server and accessible via
> the device).  If the device is not ready, the IVPosition will return -1.
> Applications should ensure that they have a valid VM ID before accessing the
> shared memory.

So the guest ID number is 32bits, but actually the doorbell is 16-bit, can we be
more explicit about this? So does it follow that the maximum number of guests
is 65536?

> 
> Doorbell Register:  To interrupt another guest, a guest must write to the
> Doorbell register.  The doorbell register is 32-bits, logically divided into
> two 16-bit fields.  The high 16-bits are the guest ID to interrupt and the low
> 16-bits are the interrupt vector to trigger.  The semantics of the value
> written to the doorbell depends on whether the device is using MSI or a regular
> pin-based interrupt.  In short, MSI uses vectors while regular interrupts set the
> status register.
> 
> Regular Interrupts
> 
> If regular interrupts are used (due to either a guest not supporting MSI or the
> user specifying not to use them on startup) then the value written to the lower
> 16-bits of the Doorbell register results is arbitrary and will trigger an
> interrupt in the destination guest.
> 
> Message Signalled Interrupts
> 
> A ivshmem device may support multiple MSI vectors.  If so, the lower 16-bits
> written to the Doorbell register must be between 0 and the maximum number of
> vectors the guest supports.  The lower 16 bits written to the doorbell is the
> MSI vector that will be raised in the destination guest.  The number of MSI
> vectors is configurable but it is set when the VM is started.
> 
> The important thing to remember with MSI is that it is only a signal, no status
> is set (since MSI interrupts are not shared).  All information other than the
> interrupt itself should be communicated via the shared memory region.  Devices
> supporting multiple MSI vectors can use different vectors to indicate different
> events have occurred.  The semantics of interrupt vectors are left to the
> user's discretion.
> 
> 

Maybe an example of a full exchange would be useful to explain the use of these registers, making the protocol used for communication clear; or does this only provide mechanisms that can be used by someone else to implement a protocol?

> IVSHMEM host services
> ---------------------
> 
> This part is optional (see *Usage in the Guest* section below)

Ok this section is optional, but its role is not that clear to me.

So are there exactly 3 ways this can be used:

1) shared memory only, PCI BAR2
2) full device including registers in BAR0 but no MSI
3) full device including registers in BAR0 and MSI support in BAR1
?

> 
> To handle notifications between users of a ivshmem device, a ivshmem server has
> been added. This server is responsible for creating the shared memory and
> creating a set of eventfds for each users of the shared memory.

Ok this is the first time eventfds are mentioned, after we spoke about interrupts in the other section before..

> It behaves as a
> proxy between the different ivshmem clients (QEMU): giving the shared memory fd
> to each client,

telling each client which /dev/name to shm_open?

> allocating eventfds to new clients and broadcasting to all
> clients when a client disappears.

What about VM Ids, are they also decided and shared by the server?

> 
> Apart from the current ivshmem implementation in QEMU, a ivshmem client can be
> written for debug, for development purposes, or to implement notifications
> between host and guests.
> 
> 
> Usage in the Guest
> ------------------
> 
> The guest should map BAR0 to access the registers (an array of 32-bit ints
> allows simple writing) and map BAR2 to access the shared memory region itself.

Ok, but can I avoid mapping BAR0 if I don't use the registers?

> The size of the shared memory region is specified when the guest (or shared
> memory server) is started. A guest may map the whole shared memory region or
> only part of it.

So what does it mean here, I can choose to start the optional server contributed in contrib/
with a shared memory size parameter determining the size of the actual shared memory region,
and then the guest has the option to map only part of that?

Or can also the guest (or better, the QEMU process running the guest) create the shared memory region by itself?
Which parameters control these behaviours?

Btw I would expect there to be a separate section with all the QEMU command line configuration parameters and their effect on behavior of this device. Also for the contributed code in contrib/, especially for the server, we need documentation about the command line parameters, env variables, whatever can be configured and which effect they have on this.

> 
> ivshmem provides an optional notification mechanism through eventfds handled by
> QEMU that will trigger interrupts in guests. This mechanism is enabled when
> using a ivshmem-server which must be started prior to VMs and which serves as a
> proxy for exchanging eventfds.

Here also, a simple description of such a sequence of exchanges would be welcome, I would not mind some ASCII art as well.

> 
> It is your choice how to use the ivshmem device.

Good :)

> - the simple way, you don't need anything else than what is already in QEMU.

If the server becomes part of the QEMU package, then this sentence is a bit unclear right? This was probably written at the time the server was not contributed to QEMU, right?

>   You can map the shared memory in guest, then use it in userland as you see fit

In userland.. ? Can I create the shared memory by just running a qemu process with some parameters? Does this mean I now share memory between guest and host? If I run multiple guest providing the same device name, can I make them use the same shared memory without the need of any server?

>   (memnic for example works this way http://dpdk.org/browse/memnic),

I'll check that out..

> - the more advanced way, basically, if you want an event mechanism between the
>   VMs using your ivshmem device. In this case, then you will most likely want to
>   write a kernel driver that will handle interrupts.

Ok.

Let me ask you this, what about virtio?
Can I take this shared memory implementation, and then run virtio on top of that, which already has primitives for communication?

I understand this would restricts me to 1 vs 1 communication, while with the optional server in contrib/ I would have any to any communication available.

But what about the 1 to 1 guest-to-guest communication, is in this case in theory possible to put virtio on top of ivshmem and use that to make the two guests communicate?

This is just a list of questions that we came up with, but anybody please weigh in with your additional questions, comments, feedback. Especially I would like to know if the idea to have a virtio guest to guest communication is possible and realistic, maybe with minimal extension of virtio, or if I am being insane.

Thank you,

Claudio

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] docs: update ivshmem device spec
  2014-06-20 12:15 ` [Qemu-devel] [PATCH 1/2] docs: update ivshmem device spec David Marchand
  2014-06-23 14:18   ` Claudio Fontana
@ 2014-06-24 16:09   ` Eric Blake
  1 sibling, 0 replies; 10+ messages in thread
From: Eric Blake @ 2014-06-24 16:09 UTC (permalink / raw)
  To: David Marchand, qemu-devel; +Cc: pbonzini, claudio.fontana, kvm

[-- Attachment #1: Type: text/plain, Size: 1721 bytes --]

On 06/20/2014 06:15 AM, David Marchand wrote:
> Add some notes on the parts needed to use ivshmem devices: more specifically,
> explain the purpose of an ivshmem server and the basic concept to use the
> ivshmem devices in guests.
> 
> Signed-off-by: David Marchand <david.marchand@6wind.com>
> ---
>  docs/specs/ivshmem_device_spec.txt |   41 ++++++++++++++++++++++++++++++------
>  1 file changed, 35 insertions(+), 6 deletions(-)
> 
> diff --git a/docs/specs/ivshmem_device_spec.txt b/docs/specs/ivshmem_device_spec.txt
> index 667a862..7d2b73f 100644
> --- a/docs/specs/ivshmem_device_spec.txt
> +++ b/docs/specs/ivshmem_device_spec.txt
> @@ -85,12 +85,41 @@ events have occurred.  The semantics of interrupt vectors are left to the
>  user's discretion.
>  
>  
> +IVSHMEM host services
> +---------------------
> +
> +This part is optional (see *Usage in the Guest* section below).
> +
> +To handle notifications between users of a ivshmem device, a ivshmem server has

s/a ivshmem/an ivshmem/ (twice)

> +been added. This server is responsible for creating the shared memory and
> +creating a set of eventfds for each users of the shared memory. It behaves as a
> +proxy between the different ivshmem clients (QEMU): giving the shared memory fd
> +to each client, allocating eventfds to new clients and broadcasting to all
> +clients when a client disappears.
> +
> +Apart from the current ivshmem implementation in QEMU, a ivshmem client can be

and again

> +written for debug, for development purposes, or to implement notifications
> +between host and guests.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] docs: update ivshmem device spec
  2014-06-23 14:18   ` Claudio Fontana
@ 2014-06-25 15:39     ` David Marchand
  2014-06-26 14:12     ` Cam Macdonell
  1 sibling, 0 replies; 10+ messages in thread
From: David Marchand @ 2014-06-25 15:39 UTC (permalink / raw)
  To: Claudio Fontana, qemu-devel; +Cc: pbonzini, Jani Kokkonen, kvm

Hello Claudio,

Sorry for the delay.
I am a bit short on time and will be offline for a week starting tonight.

I agree there are points that must be more clearly described (and I 
agree that ivshmem code will most likely have to be cleaned up after this).
Restructuring the documentation with a "optional" section is a good idea 
too.

I will work on this at my return.

Anyway, thanks for the review.


-- 
David Marchand


On 06/23/2014 04:18 PM, Claudio Fontana wrote:
> Hi,
>
> we were reading through this quickly today, and these are some of the questions that
> we think can came up when reading this. Answers to some of these questions we think
> we have figured out, but I think it's important to put this information into the
> documentation.
>
> I will quote the file in its entirety, and insert some questions inline.
>
>> Device Specification for Inter-VM shared memory device
>> ------------------------------------------------------
>>
>> The Inter-VM shared memory device is designed to share a region of memory to
>> userspace in multiple virtual guests.
>
> What does "to userspace" mean in this context? The userspace of the host, or the userspace in the guest?
>
> What about "The Inter-VM shared memory device is designed to share a memory region (created on the host via the POSIX shared memory API) between multiple QEMU processes running different guests. In order for all guests to be able to pick up the shared memory area, it is modeled by QEMU as a PCI device exposing said memory to the guest as a PCI BAR."
>
> Whether in those guests the memory region is used in kernel space or userspace, or there is even any meaning for those terms is guest-dependent I would think (I think of an OSv here, where the application and kernel execute at the same privilege level and in the same address space).
>
>> The memory region does not belong to any
>> guest, but is a POSIX memory object on the host.
>
> Ok that's clear.
> One thing I would ask is, but I don't know if it makes sense to mention here, is who creates this memory object on the host?
> I understand in some cases it's the contributed server (what you provide in contrib/), in some cases it's the "user" of this device who has to write some server code for that, but is it true that also the qemu process itself can create this memory object on its own, without any external process needed? Is this the use case for host<->guest only?
>
>> Optionally, the device may
>> support sending interrupts to other guests sharing the same memory region.
>
> This opens a lot of questions here which are partly answered later (If I understand correctly, not only interrupts are involved, but a complete communication protocol involving registers in BAR0), but what about staying a bit general here, like
> "Optionally, the device may also provide a communication mechanism between guests sharing the same memory region. More details about that in the section 'OPTIONAL ivshmem guest to guest communication protocol'.
>
> Thinking out loud, I wonder if this communication mechanism should be part of this device in QEMU, or it should be provided at another layer..
>
>
>>
>>
>> The Inter-VM PCI device
>> -----------------------
>>
>> *BARs*
>>
>> The device supports three BARs.  BAR0 is a 1 Kbyte MMIO region to support
>> registers.  BAR1 is used for MSI-X when it is enabled in the device.  BAR2 is
>> used to map the shared memory object from the host.  The size of BAR2 is
>> specified when the guest is started and must be a power of 2 in size.
>
> Are BAR0 and BAR1 optional? That's what I would think by reading the whole, but I'm still not sure.
> Am I forced to map BAR0 and BAR1 anyway? I don't think so, but..
>
> If so, can we separate the explanation into the base shared memory feature, and a separate section which explains the OPTIONAL communication mechanism, and the OPTIONAL MSI-X BAR?
>
> For example, say that I am a potential ivshmem user (which I am), and I am interested in the shared memory but I want to use my own communication mechanism and protocol between guests, can we make it so that I don't have to wonder whether some of the info I read applies or not?
> The solution to that I think is to put all the OPTIONAL parts into separate sections.
>
>>
>> *Registers*
>
> Ok, so this should I think go into one such OPTIONAL sections.
>
>>
>> The device currently supports 4 registers of 32-bits each.  Registers
>> are used for synchronization between guests sharing the same memory object when
>> interrupts are supported (this requires using the shared memory server).
>
> So use of BAR0 goes together with interrupts, and goes together with the shared memory server (is it the one contributed in contrib/?)
>
>>
>> The server assigns each VM an ID number and sends this ID number to the QEMU
>> process when the guest starts.
>>
>> enum ivshmem_registers {
>>      IntrMask = 0,
>>      IntrStatus = 4,
>>      IVPosition = 8,
>>      Doorbell = 12
>> };
>>
>> The first two registers are the interrupt mask and status registers.  Mask and
>> status are only used with pin-based interrupts.  They are unused with MSI
>> interrupts.
>>
>> Status Register: The status register is set to 1 when an interrupt occurs.
>>
>> Mask Register: The mask register is bitwise ANDed with the interrupt status
>> and the result will raise an interrupt if it is non-zero.  However, since 1 is
>> the only value the status will be set to, it is only the first bit of the mask
>> that has any effect.  Therefore interrupts can be masked by setting the first
>> bit to 0 and unmasked by setting the first bit to 1.
>>
>> IVPosition Register: The IVPosition register is read-only and reports the
>> guest's ID number.  The guest IDs are non-negative integers.  When using the
>> server, since the server is a separate process, the VM ID will only be set when
>> the device is ready (shared memory is received from the server and accessible via
>> the device).  If the device is not ready, the IVPosition will return -1.
>> Applications should ensure that they have a valid VM ID before accessing the
>> shared memory.
>
> So the guest ID number is 32bits, but actually the doorbell is 16-bit, can we be
> more explicit about this? So does it follow that the maximum number of guests
> is 65536?
>
>>
>> Doorbell Register:  To interrupt another guest, a guest must write to the
>> Doorbell register.  The doorbell register is 32-bits, logically divided into
>> two 16-bit fields.  The high 16-bits are the guest ID to interrupt and the low
>> 16-bits are the interrupt vector to trigger.  The semantics of the value
>> written to the doorbell depends on whether the device is using MSI or a regular
>> pin-based interrupt.  In short, MSI uses vectors while regular interrupts set the
>> status register.
>>
>> Regular Interrupts
>>
>> If regular interrupts are used (due to either a guest not supporting MSI or the
>> user specifying not to use them on startup) then the value written to the lower
>> 16-bits of the Doorbell register results is arbitrary and will trigger an
>> interrupt in the destination guest.
>>
>> Message Signalled Interrupts
>>
>> A ivshmem device may support multiple MSI vectors.  If so, the lower 16-bits
>> written to the Doorbell register must be between 0 and the maximum number of
>> vectors the guest supports.  The lower 16 bits written to the doorbell is the
>> MSI vector that will be raised in the destination guest.  The number of MSI
>> vectors is configurable but it is set when the VM is started.
>>
>> The important thing to remember with MSI is that it is only a signal, no status
>> is set (since MSI interrupts are not shared).  All information other than the
>> interrupt itself should be communicated via the shared memory region.  Devices
>> supporting multiple MSI vectors can use different vectors to indicate different
>> events have occurred.  The semantics of interrupt vectors are left to the
>> user's discretion.
>>
>>
>
> Maybe an example of a full exchange would be useful to explain the use of these registers, making the protocol used for communication clear; or does this only provide mechanisms that can be used by someone else to implement a protocol?
>
>
>
>> IVSHMEM host services
>> ---------------------
>>
>> This part is optional (see *Usage in the Guest* section below)
>
> Ok this section is optional, but its role is not that clear to me.
>
> So are there exactly 3 ways this can be used:
>
> 1) shared memory only, PCI BAR2
> 2) full device including registers in BAR0 but no MSI
> 3) full device including registers in BAR0 and MSI support in BAR1
> ?
>
>
>
>>
>> To handle notifications between users of a ivshmem device, a ivshmem server has
>> been added. This server is responsible for creating the shared memory and
>> creating a set of eventfds for each users of the shared memory.
>
> Ok this is the first time eventfds are mentioned, after we spoke about interrupts in the other section before..
>
>> It behaves as a
>> proxy between the different ivshmem clients (QEMU): giving the shared memory fd
>> to each client,
>
> telling each client which /dev/name to shm_open?
>
>> allocating eventfds to new clients and broadcasting to all
>> clients when a client disappears.
>
> What about VM Ids, are they also decided and shared by the server?
>
>>
>> Apart from the current ivshmem implementation in QEMU, a ivshmem client can be
>> written for debug, for development purposes, or to implement notifications
>> between host and guests.
>>
>>
>> Usage in the Guest
>> ------------------
>>
>> The guest should map BAR0 to access the registers (an array of 32-bit ints
>> allows simple writing) and map BAR2 to access the shared memory region itself.
>
> Ok, but can I avoid mapping BAR0 if I don't use the registers?
>
>> The size of the shared memory region is specified when the guest (or shared
>> memory server) is started. A guest may map the whole shared memory region or
>> only part of it.
>
> So what does it mean here, I can choose to start the optional server contributed in contrib/
> with a shared memory size parameter determining the size of the actual shared memory region,
> and then the guest has the option to map only part of that?
>
> Or can also the guest (or better, the QEMU process running the guest) create the shared memory region by itself?
> Which parameters control these behaviours?
>
> Btw I would expect there to be a separate section with all the QEMU command line configuration parameters and their effect on behavior of this device. Also for the contributed code in contrib/, especially for the server, we need documentation about the command line parameters, env variables, whatever can be configured and which effect they have on this.
>
>>
>> ivshmem provides an optional notification mechanism through eventfds handled by
>> QEMU that will trigger interrupts in guests. This mechanism is enabled when
>> using a ivshmem-server which must be started prior to VMs and which serves as a
>> proxy for exchanging eventfds.
>
> Here also, a simple description of such a sequence of exchanges would be welcome, I would not mind some ASCII art as well.
>
>>
>> It is your choice how to use the ivshmem device.
>
> Good :)
>
>> - the simple way, you don't need anything else than what is already in QEMU.
>
> If the server becomes part of the QEMU package, then this sentence is a bit unclear right? This was probably written at the time the server was not contributed to QEMU, right?
>
>>    You can map the shared memory in guest, then use it in userland as you see fit
>
> In userland.. ? Can I create the shared memory by just running a qemu process with some parameters? Does this mean I now share memory between guest and host? If I run multiple guest providing the same device name, can I make them use the same shared memory without the need of any server?
>
>>    (memnic for example works this way http://dpdk.org/browse/memnic),
>
> I'll check that out..
>
>> - the more advanced way, basically, if you want an event mechanism between the
>>    VMs using your ivshmem device. In this case, then you will most likely want to
>>    write a kernel driver that will handle interrupts.
>
> Ok.
>
> Let me ask you this, what about virtio?
> Can I take this shared memory implementation, and then run virtio on top of that, which already has primitives for communication?
>
> I understand this would restricts me to 1 vs 1 communication, while with the optional server in contrib/ I would have any to any communication available.
>
> But what about the 1 to 1 guest-to-guest communication, is in this case in theory possible to put virtio on top of ivshmem and use that to make the two guests communicate?
>
> This is just a list of questions that we came up with, but anybody please weigh in with your additional questions, comments, feedback. Especially I would like to know if the idea to have a virtio guest to guest communication is possible and realistic, maybe with minimal extension of virtio, or if I am being insane.
>
> Thank you,
>
> Claudio
>
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] docs: update ivshmem device spec
  2014-06-23 14:18   ` Claudio Fontana
  2014-06-25 15:39     ` David Marchand
@ 2014-06-26 14:12     ` Cam Macdonell
  2014-06-26 15:37       ` Vincent JARDIN
  1 sibling, 1 reply; 10+ messages in thread
From: Cam Macdonell @ 2014-06-26 14:12 UTC (permalink / raw)
  To: Claudio Fontana
  Cc: Paolo Bonzini, Jani Kokkonen, David Marchand, KVM General,
	qemu-devel@nongnu.org Developers

[-- Attachment #1: Type: text/plain, Size: 15030 bytes --]

Hi,

Thank you for everyone's interest and work on this.  Sorry I haven't
been...better.  I will offer my knowledge where it helps.  And the server
is GPL in case that was seen as an issue.

On Mon, Jun 23, 2014 at 8:18 AM, Claudio Fontana <claudio.fontana@huawei.com
> wrote:

> Hi,
>
> we were reading through this quickly today, and these are some of the
> questions that
> we think can came up when reading this. Answers to some of these questions
> we think
> we have figured out, but I think it's important to put this information
> into the
> documentation.
>
> I will quote the file in its entirety, and insert some questions inline.
>
> > Device Specification for Inter-VM shared memory device
> > ------------------------------------------------------
> >
> > The Inter-VM shared memory device is designed to share a region of
> memory to
> > userspace in multiple virtual guests.
>
> What does "to userspace" mean in this context? The userspace of the host,
> or the userspace in the guest?
>

The memory is intended to be shared between userspaces in the guests.
 However, since the memory is POSIX shm region, it is visible on the host
too.


>
> What about "The Inter-VM shared memory device is designed to share a
> memory region (created on the host via the POSIX shared memory API) between
> multiple QEMU processes running different guests. In order for all guests
> to be able to pick up the shared memory area, it is modeled by QEMU as a
> PCI device exposing said memory to the guest as a PCI BAR."
>
> Whether in those guests the memory region is used in kernel space or
> userspace, or there is even any meaning for those terms is guest-dependent
> I would think (I think of an OSv here, where the application and kernel
> execute at the same privilege level and in the same address space).


I'm not exactly clear what you're asking here.  The region is visible to
both the guest kernel and userspace (once mounted).


>
> > The memory region does not belong to any
> > guest, but is a POSIX memory object on the host.
>
> Ok that's clear.
> One thing I would ask is, but I don't know if it makes sense to mention
> here, is who creates this memory object on the host?

I understand in some cases it's the contributed server (what you provide in
> contrib/), in some cases it's the "user" of this device who has to write
> some server code for that, but is it true that also the qemu process itself
> can create this memory object on its own, without any external process
> needed? Is this the use case for host<->guest only?
>
>
(Answering based on my original server code) When using the server, the
server creates it.  Without the server, each qemu process will check if it
exists and if it does, it will use it.  If it does not exist, the qemu
process will create it.


> > Optionally, the device may
> > support sending interrupts to other guests sharing the same memory
> region.
>
> This opens a lot of questions here which are partly answered later (If I
> understand correctly, not only interrupts are involved, but a complete
> communication protocol involving registers in BAR0), but what about staying
> a bit general here, like
> "Optionally, the device may also provide a communication mechanism between
> guests sharing the same memory region. More details about that in the
> section 'OPTIONAL ivshmem guest to guest communication protocol'.
>
> Thinking out loud, I wonder if this communication mechanism should be part
> of this device in QEMU, or it should be provided at another layer..
>
>
> >
> >
> > The Inter-VM PCI device
> > -----------------------
> >
> > *BARs*
> >
> > The device supports three BARs.  BAR0 is a 1 Kbyte MMIO region to support
> > registers.  BAR1 is used for MSI-X when it is enabled in the device.
>  BAR2 is
> > used to map the shared memory object from the host.  The size of BAR2 is
> > specified when the guest is started and must be a power of 2 in size.
>
> Are BAR0 and BAR1 optional? That's what I would think by reading the
> whole, but I'm still not sure.
> Am I forced to map BAR0 and BAR1 anyway? I don't think so, but..
>
>
They do not need to be mapped.  You do not need to map them if you don't
want to use them.


> If so, can we separate the explanation into the base shared memory
> feature, and a separate section which explains the OPTIONAL communication
> mechanism, and the OPTIONAL MSI-X BAR?
>
> For example, say that I am a potential ivshmem user (which I am), and I am
> interested in the shared memory but I want to use my own communication
> mechanism and protocol between guests, can we make it so that I don't have
> to wonder whether some of the info I read applies or not?
> The solution to that I think is to put all the OPTIONAL parts into
> separate sections.
>
> >
> > *Registers*
>
> Ok, so this should I think go into one such OPTIONAL sections.
>
> >
> > The device currently supports 4 registers of 32-bits each.  Registers
> > are used for synchronization between guests sharing the same memory
> object when
> > interrupts are supported (this requires using the shared memory server).
>
> So use of BAR0 goes together with interrupts, and goes together with the
> shared memory server (is it the one contributed in contrib/?)
>
> >
> > The server assigns each VM an ID number and sends this ID number to the
> QEMU
> > process when the guest starts.
> >
> > enum ivshmem_registers {
> >     IntrMask = 0,
> >     IntrStatus = 4,
> >     IVPosition = 8,
> >     Doorbell = 12
> > };
> >
> > The first two registers are the interrupt mask and status registers.
>  Mask and
> > status are only used with pin-based interrupts.  They are unused with MSI
> > interrupts.
> >
> > Status Register: The status register is set to 1 when an interrupt
> occurs.
> >
> > Mask Register: The mask register is bitwise ANDed with the interrupt
> status
> > and the result will raise an interrupt if it is non-zero.  However,
> since 1 is
> > the only value the status will be set to, it is only the first bit of
> the mask
> > that has any effect.  Therefore interrupts can be masked by setting the
> first
> > bit to 0 and unmasked by setting the first bit to 1.
> >
> > IVPosition Register: The IVPosition register is read-only and reports the
> > guest's ID number.  The guest IDs are non-negative integers.  When using
> the
> > server, since the server is a separate process, the VM ID will only be
> set when
> > the device is ready (shared memory is received from the server and
> accessible via
> > the device).  If the device is not ready, the IVPosition will return -1.
> > Applications should ensure that they have a valid VM ID before accessing
> the
> > shared memory.
>
> So the guest ID number is 32bits, but actually the doorbell is 16-bit, can
> we be
> more explicit about this? So does it follow that the maximum number of
> guests
> is 65536?
>

Yes, for each server and its corresponding memory region.


>
> >
> > Doorbell Register:  To interrupt another guest, a guest must write to the
> > Doorbell register.  The doorbell register is 32-bits, logically divided
> into
> > two 16-bit fields.  The high 16-bits are the guest ID to interrupt and
> the low
> > 16-bits are the interrupt vector to trigger.  The semantics of the value
> > written to the doorbell depends on whether the device is using MSI or a
> regular
> > pin-based interrupt.  In short, MSI uses vectors while regular
> interrupts set the
> > status register.
> >
> > Regular Interrupts
> >
> > If regular interrupts are used (due to either a guest not supporting MSI
> or the
> > user specifying not to use them on startup) then the value written to
> the lower
> > 16-bits of the Doorbell register results is arbitrary and will trigger an
> > interrupt in the destination guest.
> >
> > Message Signalled Interrupts
> >
> > A ivshmem device may support multiple MSI vectors.  If so, the lower
> 16-bits
> > written to the Doorbell register must be between 0 and the maximum
> number of
> > vectors the guest supports.  The lower 16 bits written to the doorbell
> is the
> > MSI vector that will be raised in the destination guest.  The number of
> MSI
> > vectors is configurable but it is set when the VM is started.
> >
> > The important thing to remember with MSI is that it is only a signal, no
> status
> > is set (since MSI interrupts are not shared).  All information other
> than the
> > interrupt itself should be communicated via the shared memory region.
>  Devices
> > supporting multiple MSI vectors can use different vectors to indicate
> different
> > events have occurred.  The semantics of interrupt vectors are left to the
> > user's discretion.
> >
> >
>
> Maybe an example of a full exchange would be useful to explain the use of
> these registers, making the protocol used for communication clear; or does
> this only provide mechanisms that can be used by someone else to implement
> a protocol?
>
>
>
> > IVSHMEM host services
> > ---------------------
> >
> > This part is optional (see *Usage in the Guest* section below)
>
> Ok this section is optional, but its role is not that clear to me.
>
> So are there exactly 3 ways this can be used:
>
> 1) shared memory only, PCI BAR2
> 2) full device including registers in BAR0 but no MSI
> 3) full device including registers in BAR0 and MSI support in BAR1
> ?
>
>
>
> >
> > To handle notifications between users of a ivshmem device, a ivshmem
> server has
> > been added. This server is responsible for creating the shared memory and
> > creating a set of eventfds for each users of the shared memory.
>
> Ok this is the first time eventfds are mentioned, after we spoke about
> interrupts in the other section before..
>

The interrupts are transported between QEMU processes using eventfds.  The
interrupts are delivered into the guest using regular interrupts or MSI-X.
 The interrupts can be delivered to user-level using eventfds in UIO.


>
> > It behaves as a
> > proxy between the different ivshmem clients (QEMU): giving the shared
> memory fd
> > to each client,
>
> telling each client which /dev/name to shm_open?


No, it passes a file descriptor to the region using SCM_RIGHTS.  When using
the server, the qemu clients do not know the name of the shm region.


>
> > allocating eventfds to new clients and broadcasting to all
> > clients when a client disappears.
>
> What about VM Ids, are they also decided and shared by the server?
>

Yes, the server hands out increasing VM Ids.


>
> >
> > Apart from the current ivshmem implementation in QEMU, a ivshmem client
> can be
> > written for debug, for development purposes, or to implement
> notifications
> > between host and guests.
> >
> >
> > Usage in the Guest
> > ------------------
> >
> > The guest should map BAR0 to access the registers (an array of 32-bit
> ints
> > allows simple writing) and map BAR2 to access the shared memory region
> itself.
>
> Ok, but can I avoid mapping BAR0 if I don't use the registers?
>

Yes


>
> > The size of the shared memory region is specified when the guest (or
> shared
> > memory server) is started. A guest may map the whole shared memory
> region or
> > only part of it.
>
> So what does it mean here, I can choose to start the optional server
> contributed in contrib/
> with a shared memory size parameter determining the size of the actual
> shared memory region,
> and then the guest has the option to map only part of that?
>

You do not need to map the whole region.


>
> Or can also the guest (or better, the QEMU process running the guest)
> create the shared memory region by itself?
> Which parameters control these behaviours?
>

When giving a shared memory region name "foo"

    -device ivshmem,shm=foo,size=2048,use64=1


1) if the 'foo' memory object doesn't exist, the qemu process will create it
2) if 'foo' already exists it will use it
3) if the object exists but does not match the size specified, ivshmem will
exit.


> Btw I would expect there to be a separate section with all the QEMU
> command line configuration parameters and their effect on behavior of this
> device. Also for the contributed code in contrib/, especially for the
> server, we need documentation about the command line parameters, env
> variables, whatever can be configured and which effect they have on this.
>
> >
> > ivshmem provides an optional notification mechanism through eventfds
> handled by
> > QEMU that will trigger interrupts in guests. This mechanism is enabled
> when
> > using a ivshmem-server which must be started prior to VMs and which
> serves as a
> > proxy for exchanging eventfds.
>
> Here also, a simple description of such a sequence of exchanges would be
> welcome, I would not mind some ASCII art as well.
>
> >
> > It is your choice how to use the ivshmem device.
>
> Good :)
>
> > - the simple way, you don't need anything else than what is already in
> QEMU.
>
> If the server becomes part of the QEMU package, then this sentence is a
> bit unclear right? This was probably written at the time the server was not
> contributed to QEMU, right?
>
> >   You can map the shared memory in guest, then use it in userland as you
> see fit
>
> In userland.. ? Can I create the shared memory by just running a qemu
> process with some parameters? Does this mean I now share memory between
> guest and host? If I run multiple guest providing the same device name, can
> I make them use the same shared memory without the need of any server?
>

Yes, the server is only necessary for the interrupt behaviour.


>
> >   (memnic for example works this way http://dpdk.org/browse/memnic),
>
> I'll check that out..
>
> > - the more advanced way, basically, if you want an event mechanism
> between the
> >   VMs using your ivshmem device. In this case, then you will most likely
> want to
> >   write a kernel driver that will handle interrupts.
>
> Ok.
>
> Let me ask you this, what about virtio?
> Can I take this shared memory implementation, and then run virtio on top
> of that, which already has primitives for communication?
>
> I understand this would restricts me to 1 vs 1 communication, while with
> the optional server in contrib/ I would have any to any communication
> available.
>
> But what about the 1 to 1 guest-to-guest communication, is in this case in
> theory possible to put virtio on top of ivshmem and use that to make the
> two guests communicate?
>
> This is just a list of questions that we came up with, but anybody please
> weigh in with your additional questions, comments, feedback. Especially I
> would like to know if the idea to have a virtio guest to guest
> communication is possible and realistic, maybe with minimal extension of
> virtio, or if I am being insane.
>
>
There was originally a virtio-based version of ivshmem.  You could see the
discussion around that sometime in 2009.  I think you could use virtio over
ivshmem but the 1-to-1 case is quite limiting.  Virtio is well optimized
for what it does and so it was decided to keep the two separate.

HTH,

Cam


Thank you,
>
> Claudio
>
>
>
>

[-- Attachment #2: Type: text/html, Size: 20268 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] docs: update ivshmem device spec
  2014-06-26 14:12     ` Cam Macdonell
@ 2014-06-26 15:37       ` Vincent JARDIN
  2014-06-26 16:13         ` Cam Macdonell
  0 siblings, 1 reply; 10+ messages in thread
From: Vincent JARDIN @ 2014-06-26 15:37 UTC (permalink / raw)
  To: Cam Macdonell
  Cc: KVM General, Claudio Fontana, David Marchand,
	qemu-devel@nongnu.org Developers, Paolo Bonzini, Jani Kokkonen

Hi Cam,

FYI, David did implement a new server.
   http://lists.nongnu.org/archive/html/qemu-devel/2014-06/msg04978.html

which is easier to maintain.

Please, could you review his patch? He'll be back from holiday within 1 
week.

Best regards,
   Vincent

PS: thanks for your comments

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] docs: update ivshmem device spec
  2014-06-26 15:37       ` Vincent JARDIN
@ 2014-06-26 16:13         ` Cam Macdonell
  0 siblings, 0 replies; 10+ messages in thread
From: Cam Macdonell @ 2014-06-26 16:13 UTC (permalink / raw)
  To: Vincent JARDIN
  Cc: KVM General, Claudio Fontana, David Marchand,
	qemu-devel@nongnu.org Developers, Paolo Bonzini, Jani Kokkonen

[-- Attachment #1: Type: text/plain, Size: 499 bytes --]

Hi Vince,

Yes, I did see the patches for the new server.  I will review within the
week.

Cheers,
Cam


On Thu, Jun 26, 2014 at 9:37 AM, Vincent JARDIN <vincent.jardin@6wind.com>
wrote:

> Hi Cam,
>
> FYI, David did implement a new server.
>   http://lists.nongnu.org/archive/html/qemu-devel/2014-06/msg04978.html
>
> which is easier to maintain.
>
> Please, could you review his patch? He'll be back from holiday within 1
> week.
>
> Best regards,
>   Vincent
>
> PS: thanks for your comments
>
>

[-- Attachment #2: Type: text/html, Size: 1028 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2014-06-26 16:13 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-06-20 12:15 [Qemu-devel] [PATCH 0/2] ivshmem: update documentation, add client/server tools David Marchand
2014-06-20 12:15 ` [Qemu-devel] [PATCH 1/2] docs: update ivshmem device spec David Marchand
2014-06-23 14:18   ` Claudio Fontana
2014-06-25 15:39     ` David Marchand
2014-06-26 14:12     ` Cam Macdonell
2014-06-26 15:37       ` Vincent JARDIN
2014-06-26 16:13         ` Cam Macdonell
2014-06-24 16:09   ` Eric Blake
2014-06-20 12:15 ` [Qemu-devel] [PATCH 2/2] contrib: add ivshmem client and server David Marchand
2014-06-23  8:02 ` [Qemu-devel] [PATCH 0/2] ivshmem: update documentation, add client/server tools Claudio Fontana

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).