* [Qemu-devel] [RFC PATCH 0/8] Towards an Heterogeneous QEMU
@ 2015-09-29 13:57 Christian Pinto
2015-09-29 13:57 ` [Qemu-devel] [RFC PATCH 1/8] backend: multi-socket Christian Pinto
` (8 more replies)
0 siblings, 9 replies; 19+ messages in thread
From: Christian Pinto @ 2015-09-29 13:57 UTC (permalink / raw)
To: qemu-devel; +Cc: Jani.Kokkonen, tech, Claudio.Fontana, Christian Pinto
Hi all,
This RFC patch series introduces the set of changes needed to model the
architecture presented in a previous RFC
letter: "[Qemu-devel][RFC] Towards an Heterogeneous QEMU".
To recap the goal of that RFC:
The idea is to enhance the current architecture of QEMU to enable the modeling
of a state-of-the-art SoC with an AMP processing style, where different
processing units share the same system memory and communicate through shared
memory and inter-processor interrupts. An example is a multi-core ARM CPU
working alongside two Cortex-M micro-controllers.
From the user's point of view there is usually an operating system (e.g.
Linux) booting on the Master processor at platform startup, while the other
processors are used to offload some computation from the Master or to deal
with real-time interfaces. It is the Master OS that triggers the boot of the
Slave processors and also provides them with the binary code to execute (e.g.
an RTOS, or binary firmware) by placing it into a pre-defined memory area that
is accessible to the Slaves. Usually the memory for the Slaves is carved out
of the Master OS during boot. Once a Slave is booted, the two processors can
communicate through queues in shared memory and inter-processor interrupts
(IPIs). In Linux, it is the remoteproc/rpmsg framework that enables the
control (boot/shutdown) of Slave processors and establishes a
communication channel based on virtio queues.
Currently, QEMU cannot model such an architecture, mainly because only
a single processor can be emulated at a time, and the OS binary image needs
to be placed in memory at model startup.
This patch series adds a set of modules and introduces minimal changes to the
current QEMU code base to implement what is described above, with master and
slave implemented as two different instances of QEMU. The aim of this work is
to enable application and runtime programmers to test their AMP applications,
or their new inter-SoC communication protocols.
The main changes are depicted in the following diagram and involve:
- A new multi-client socket implementation that allows multiple instances of
QEMU to attach to the same socket, with only one acting as a master.
- A new memory backend, the shared memory backend, based on
the file memory backend. On the master side, this backend allows
the whole memory to be allocated as shareable (e.g. /dev/shm, or hugetlbfs).
On the slave side it enables QEMU to start without any main memory
allocated. The slave then goes into a waiting state, the same one used
for an incoming migration, and a callback is registered on a
multi-client socket shared with the master.
The waiting state ends when the master sends the slave the file
descriptor and offset to mmap and use as memory.
- A new inter-processor interrupt hardware distribution module (IDM), also
used to trigger the boot of slave processors. The module uses a pair of
eventfds for each master-slave couple to trigger interrupts between the
instances. No slave-to-slave interrupts are envisioned by the current
implementation. The multi-client socket is used by the master to trigger
the boot of a slave, and by each master-slave couple to exchange the
eventfd file descriptors. The IDM device can be instantiated either as a
PCI or a sysbus device.
Memory
(e.g. hugetlbfs)
+------------------+ +--------------+ +------------------+
| | | | | |
| QEMU MASTER | | Master | | QEMU SLAVE |
| | | Memory | | |
| +------+ +------+-+ | | +-+------+ +------+ |
| | | |SHMEM | | | |SHMEM | | | |
| | VCPU | |Backend +-----> | +----->Backend | | VCPU | |
| | | | | | | | +---> | | | |
| +--^---+ +------+-+ | | | | +-+------+ +--^---+ |
| | | | | | | | | |
| +--+ | | | | | | +---+ |
| | IRQ | | +----------+ | | | | IRQ | |
| | | | | | | | | | | |
| +----+----+ | | | Slave <------+ | | +----+---+ |
+--+ IDM +-----+ | | Memory | | | +---+ IDM +-----+
+-^----^--+ | | | | | +-^---^--+
| | | +----------+ | | | |
| | +--------------+ | | |
| | | | |
| +--------------------------------------+-----------+ |
| UNIX Domain Socket(send mem fd + offset, trigger boot) |
| |
+-----------------------------------------------------------+
eventfd
The whole code can be checked out from:
https://git.virtualopensystems.com/dev/qemu-het.git
branch:
qemu-het-rfc-v1
The patches apply to the current QEMU master branch.
=========
Demo
=========
This patch series comes with a demo to better illustrate how the
changes introduced can be exploited.
At the current status the demo can be executed using an ARM target for both
master and slave.
The demo shows how a master QEMU instance carves out the memory for a slave,
copies the Linux kernel image and device tree blob into it, and finally
triggers the boot.
How to reproduce the demo:
In order to reproduce the demo, a few extra elements need to be
downloaded and compiled.
Binary loader
Loads the slave firmware (kernel) binary into memory and triggers the boot
https://git.virtualopensystems.com/dev/qemu-het-tools.git
branch:
load-bin-boot
To compile: just type "make"
Slave kernel
Compile a linux kernel image (zImage) for the virt machine model.
IDM test kernel module
Needed to trigger the boot of a slave
https://git.virtualopensystems.com/dev/qemu-het-tools.git
branch:
IDM-kernel-module
To compile: KDIR=kernel_path ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- make
Slave DTB
https://git.virtualopensystems.com/dev/qemu-het-tools.git
branch:
slave-dtb
Copy binary loader, IDM kernel module, zImage and dtb inside the disk
image or ramdisk of the master instance.
Run the demo:
run the master instance
./arm-softmmu/qemu-system-arm \
-kernel zImage \
-M virt -cpu cortex-a15 \
-drive if=none,file=disk.img,cache=writeback,id=foo1 \
-device virtio-blk-device,drive=foo1 \
-object multi-socket-backend,id=foo,listen,path=ms_socket \
-object memory-backend-shared,id=mem,size=1G,mem-path=/mnt/hugetlbfs,chardev=foo,master=on,prealloc=on \
-device idm_ipi,master=true,memdev=mem,socket=foo \
-numa node,memdev=mem -m 1G \
-append "root=/dev/vda rw console=ttyAMA0 mem=512M memmap=512M$0x60000000" \
-nographic
run the slave instance
./arm-softmmu/qemu-system-arm \
-M virt -cpu cortex-a15 -machine slave=on \
-drive if=none,file=disk.img,cache=writeback,id=foo1 \
-device virtio-blk-device,drive=foo1 \
-object multi-socket-backend,id=foo,path=ms_socket \
-object memory-backend-shared,id=mem,size=512M,mem-path=/mnt/hugetlbfs,chardev=foo,master=off \
-device idm_ipi,master=false,memdev=mem,socket=foo \
-incoming "shared:mem" -numa node,memdev=mem -m 512M \
-nographic
For simplicity, use a disk image for the slave instead of a ramdisk.
As visible from the kernel boot arguments, the master is booted with mem=512M
so that one half of the allocated memory is not used by the master and is
reserved for the slave. On the virt platform this memory starts at
address 0x60000000.
Once the master is booted the image of the kernel and DTB can be copied in the
memory carved out for the slave.
In the master console,
load the IDM kernel module:
$ insmod idm_test_mod.ko
run the application that copies the binaries into memory and triggers the boot:
$ ./load_bin_app 1 ./zImage ./slave.dtb
On the slave console the linux kernel boot should be visible.
This demo is intended only to show the patch set at work. In the near
future, boot triggering, memory carve-out and binary copy might be
implemented in a remoteproc driver, coupled with an RPMSG driver for
communication between the master and slave instances.
This work has been sponsored by Huawei Technologies Duesseldorf GmbH.
Baptiste Reynal (3):
backend: multi-socket
backend: shared memory backend
migration: add shared migration type
Christian Pinto (5):
hw/misc: IDM Device
hw/arm: sysbus-fdt
qemu: slave machine flag
hw/arm: boot
qemu: numa
backends/Makefile.objs | 4 +-
backends/hostmem-shared.c | 203 ++++++++++++++++++
backends/multi-socket.c | 353 +++++++++++++++++++++++++++++++
default-configs/arm-softmmu.mak | 1 +
default-configs/i386-softmmu.mak | 1 +
default-configs/x86_64-softmmu.mak | 1 +
hw/arm/boot.c | 13 ++
hw/arm/sysbus-fdt.c | 60 ++++++
hw/core/machine.c | 27 +++
hw/misc/Makefile.objs | 2 +
hw/misc/idm.c | 416 +++++++++++++++++++++++++++++++++++++
include/hw/boards.h | 2 +
include/hw/misc/idm.h | 119 +++++++++++
include/migration/migration.h | 2 +
include/qemu/multi-socket.h | 124 +++++++++++
include/sysemu/hostmem-shared.h | 61 ++++++
migration/Makefile.objs | 2 +-
migration/migration.c | 2 +
migration/shared.c | 32 +++
numa.c | 17 +-
qemu-options.hx | 5 +-
util/qemu-config.c | 5 +
22 files changed, 1448 insertions(+), 4 deletions(-)
create mode 100644 backends/hostmem-shared.c
create mode 100644 backends/multi-socket.c
create mode 100644 hw/misc/idm.c
create mode 100644 include/hw/misc/idm.h
create mode 100644 include/qemu/multi-socket.h
create mode 100644 include/sysemu/hostmem-shared.h
create mode 100644 migration/shared.c
--
1.9.1
* [Qemu-devel] [RFC PATCH 1/8] backend: multi-socket
2015-09-29 13:57 [Qemu-devel] [RFC PATCH 0/8] Towards an Heterogeneous QEMU Christian Pinto
@ 2015-09-29 13:57 ` Christian Pinto
2015-09-29 13:57 ` [Qemu-devel] [RFC PATCH 2/8] backend: shared memory backend Christian Pinto
` (7 subsequent siblings)
8 siblings, 0 replies; 19+ messages in thread
From: Christian Pinto @ 2015-09-29 13:57 UTC (permalink / raw)
To: qemu-devel
Cc: Baptiste Reynal, Jani.Kokkonen, tech, Claudio.Fontana,
Christian Pinto
From: Baptiste Reynal <b.reynal@virtualopensystems.com>
This patch introduces a new socket for QEMU, called multi-socket. This
socket allows multiple QEMU instances to communicate by exchanging
messages and file descriptors.
A socket can be instantiated with the following parameters:
-object multi-socket-backend,id=<id>,path=<socket_path>,listen=<on/off>
If listen is set, the socket will act as a listener and register new
clients.
Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com>
---
backends/Makefile.objs | 2 +
backends/multi-socket.c | 353 ++++++++++++++++++++++++++++++++++++++++++++
include/qemu/multi-socket.h | 124 ++++++++++++++++
3 files changed, 479 insertions(+)
create mode 100644 backends/multi-socket.c
create mode 100644 include/qemu/multi-socket.h
diff --git a/backends/Makefile.objs b/backends/Makefile.objs
index 31a3a89..689eac3 100644
--- a/backends/Makefile.objs
+++ b/backends/Makefile.objs
@@ -9,3 +9,5 @@ common-obj-$(CONFIG_TPM) += tpm.o
common-obj-y += hostmem.o hostmem-ram.o
common-obj-$(CONFIG_LINUX) += hostmem-file.o
+
+common-obj-y += multi-socket.o
diff --git a/backends/multi-socket.c b/backends/multi-socket.c
new file mode 100644
index 0000000..36bee3a
--- /dev/null
+++ b/backends/multi-socket.c
@@ -0,0 +1,353 @@
+/*
+ * QEMU Multi Client socket
+ *
+ * Copyright (C) 2015 - Virtual Open Systems
+ *
+ * Author: Baptiste Reynal <b.reynal@virtualopensystems.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "qemu/multi-socket.h"
+
+typedef struct MSHandler {
+ char *name;
+ void (*read)(MSClient *client, const char *message, void *opaque);
+ void *opaque;
+
+ QLIST_ENTRY (MSHandler) next;
+} MSHandler;
+
+typedef struct MSRegHandler {
+ void (*reg)(MSClient *client, void *opaque);
+ void *opaque;
+
+ QLIST_ENTRY (MSRegHandler) next;
+} MSRegHandler;
+
+static void multi_socket_get_fds(MSClient *client, struct msghdr msg)
+{
+ struct cmsghdr *cmsg;
+
+ /* process fds */
+ for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
+ int fd_size;
+
+ if (cmsg->cmsg_len < CMSG_LEN(sizeof(int)) ||
+ cmsg->cmsg_level != SOL_SOCKET ||
+ cmsg->cmsg_type != SCM_RIGHTS) {
+ continue;
+ }
+
+ fd_size = cmsg->cmsg_len - CMSG_LEN(0);
+
+ if (!fd_size) {
+ continue;
+ }
+
+ if (client->rcvfds) {
+ g_free(client->rcvfds);
+ }
+
+ client->rcvfds_num = fd_size / sizeof(int);
+ client->rcvfds = g_malloc(fd_size);
+ memcpy(client->rcvfds, CMSG_DATA(cmsg), fd_size);
+ }
+}
+
+static gboolean
+multi_socket_read_handler(GIOChannel *channel, GIOCondition cond, void *opaque)
+{
+ MSClient *client = (MSClient *) opaque;
+ MSBackend *backend = client->backend;
+
+ char message[BUFFER_SIZE];
+ struct MSHandler *h;
+
+ struct msghdr msg = { NULL, };
+ struct iovec iov[1];
+ union {
+ struct cmsghdr cmsg;
+ char control[CMSG_SPACE(sizeof(int) * MAX_FDS)];
+ } msg_control;
+ int flags = 0;
+ ssize_t ret;
+
+ iov[0].iov_base = message;
+ iov[0].iov_len = BUFFER_SIZE;
+
+ msg.msg_iov = iov;
+ msg.msg_iovlen = 1;
+ msg.msg_control = &msg_control;
+ msg.msg_controllen = sizeof(msg_control);
+
+ ret = recvmsg(client->fd, &msg, flags);
+
+ if (ret > 0) {
+ multi_socket_get_fds(client, msg);
+
+ /* handler callback */
+ QLIST_FOREACH (h, &backend->handlers, next) {
+ if (!strncmp(h->name, message, strlen(h->name))) {
+ h->read(client, message + strlen(h->name) + 1, h->opaque);
+ return TRUE;
+ }
+ }
+ printf("Unrecognized message: %s", message);
+ }
+
+ return FALSE;
+}
+
+void multi_socket_add_reg_handler(MSBackend *backend,
+ void (*reg)(MSClient *client, void *opaque), void *opaque)
+{
+ struct MSRegHandler *h;
+
+ h = g_malloc(sizeof(struct MSRegHandler));
+
+ h->reg = reg;
+ h->opaque = opaque;
+
+ QLIST_INSERT_HEAD(&backend->reg_handlers, h, next);
+}
+
+void multi_socket_add_handler(MSBackend *backend,
+ const char *name,
+ void (*read)(MSClient *c, const char *message, void *opaque),
+ void *opaque)
+{
+ struct MSHandler *h;
+
+ /* check that the handler name is not taken */
+ QLIST_FOREACH(h, &backend->handlers, next) {
+ if (!strcmp(h->name, name)) {
+ printf("Handler %s already exists.", name);
+ return;
+ }
+ }
+
+ h = g_malloc(sizeof(struct MSHandler));
+
+ h->name = g_strdup(name);
+ h->read = read;
+ h->opaque = opaque;
+
+ QLIST_INSERT_HEAD (&backend->handlers, h, next);
+}
+
+static void multi_socket_init_client(MSBackend *backend,
+ MSClient *client, int fd, GIOFunc handler)
+{
+ client->backend = backend;
+ client->fd = fd;
+ client->chan = g_io_channel_unix_new(fd);
+ client->tag = g_io_add_watch(client->chan, G_IO_IN, handler, client);
+
+ g_io_channel_set_encoding(client->chan, NULL, NULL);
+ g_io_channel_set_buffered(client->chan, FALSE);
+}
+
+int multi_socket_send_fds_to(MSClient *client, int *fds, int count,
+ const char *message, int size)
+{
+ struct msghdr msgh;
+ struct iovec iov;
+ int r;
+
+ size_t fd_size = count * sizeof(int);
+ char control[CMSG_SPACE(fd_size)];
+ struct cmsghdr *cmsg;
+
+ memset(&msgh, 0, sizeof(msgh));
+ memset(control, 0, sizeof(control));
+
+ /* set the payload */
+ iov.iov_base = (uint8_t *) message;
+ iov.iov_len = size;
+
+ msgh.msg_iov = &iov;
+ msgh.msg_iovlen = 1;
+
+ msgh.msg_control = control;
+ msgh.msg_controllen = sizeof(control);
+
+ cmsg = CMSG_FIRSTHDR(&msgh);
+
+ cmsg->cmsg_len = CMSG_LEN(fd_size);
+ cmsg->cmsg_level = SOL_SOCKET;
+ cmsg->cmsg_type = SCM_RIGHTS;
+ memcpy(CMSG_DATA(cmsg), fds, fd_size);
+
+ do {
+ r = sendmsg(client->fd, &msgh, 0);
+ } while (r < 0 && errno == EINTR);
+
+ return r;
+}
+
+int multi_socket_write_to(MSClient *client, const char *message, int size)
+{
+ return multi_socket_send_fds_to(client, 0, 0, message, size);
+}
+
+int multi_socket_get_fds_num_from(MSClient *client)
+{
+ return client->rcvfds_num;
+}
+
+int multi_socket_get_fds_from(MSClient *client, int *fds)
+{
+ memcpy(fds, client->rcvfds, client->rcvfds_num * sizeof(int));
+
+ return client->rcvfds_num;
+}
+
+static void multi_socket_add_client(MSBackend *backend, int fd)
+{
+ MSClient *c = g_malloc(sizeof(MSClient));
+ MSRegHandler *h;
+
+ multi_socket_init_client(backend, c, fd, multi_socket_read_handler);
+ QLIST_FOREACH(h, &backend->reg_handlers, next) {
+ h->reg(c, h->opaque);
+ }
+
+ QLIST_INSERT_HEAD(&backend->clients, c, next);
+}
+
+static gboolean
+multi_socket_accept(GIOChannel *channel, GIOCondition cond, void *opaque)
+{
+ MSClient *client = (MSClient *) opaque;
+ MSBackend *backend = client->backend;
+
+ struct sockaddr_un uaddr;
+ socklen_t len;
+ int fd;
+
+ len = sizeof(uaddr);
+
+ fd = qemu_accept(backend->listener.fd, (struct sockaddr *) &uaddr, &len);
+
+ if (fd > 0) {
+ multi_socket_add_client(backend, fd);
+ return true;
+ } else {
+ perror("Error creating socket.");
+ return false;
+ }
+}
+
+static void multi_socket_init_socket(MSBackend *backend)
+{
+ int fd;
+
+ backend->addr = g_new0(SocketAddress, 1);
+ backend->addr->kind = SOCKET_ADDRESS_KIND_UNIX;
+ backend->addr->q_unix = g_new0(UnixSocketAddress, 1);
+ /* TODO change name with real path */
+ backend->addr->q_unix->path = g_strdup(backend->path);
+
+ if (backend->listen) {
+ fd = socket_listen(backend->addr, NULL);
+
+ if (fd < 0) {
+ perror("Error: Impossible to open socket.");
+ exit(-1);
+ }
+
+ multi_socket_init_client(backend, &backend->listener, fd,
+ multi_socket_accept);
+ } else {
+ fd = socket_connect(backend->addr, NULL, NULL, NULL);
+
+ if (fd < 0) {
+ perror("Error: Unavailable socket server.");
+ exit(-1);
+ }
+
+ multi_socket_init_client(backend, &backend->listener,
+ fd, multi_socket_read_handler);
+ }
+}
+
+static void multi_socket_backend_complete(UserCreatable *uc, Error **errp)
+{
+ MSBackend *backend = MULTI_SOCKET_BACKEND(uc);
+
+ QLIST_INIT(&backend->clients);
+ QLIST_INIT(&backend->handlers);
+
+ multi_socket_init_socket(backend);
+}
+
+static void multi_socket_class_init(ObjectClass *oc, void *data)
+{
+ UserCreatableClass *ucc = USER_CREATABLE_CLASS(oc);
+
+ ucc->complete = multi_socket_backend_complete;
+}
+
+static bool multi_socket_backend_get_listen(Object *o, Error **errp)
+{
+ MSBackend *backend = MULTI_SOCKET_BACKEND(o);
+
+ return backend->listen;
+}
+
+static void multi_socket_backend_set_listen(Object *o, bool value, Error **errp)
+{
+ MSBackend *backend = MULTI_SOCKET_BACKEND(o);
+
+ backend->listen = value;
+}
+
+static char *multi_socket_get_path(Object *o, Error **errp)
+{
+ MSBackend *backend = MULTI_SOCKET_BACKEND(o);
+
+ return g_strdup(backend->path);
+}
+
+static void multi_socket_set_path(Object *o, const char *str, Error **errp)
+{
+ MSBackend *backend = MULTI_SOCKET_BACKEND(o);
+
+ if (str == NULL || str[0] == 0) {
+ perror("Error: Socket path is empty.");
+ exit(-1);
+ }
+
+ backend->path = g_strdup(str);
+}
+
+static void multi_socket_instance_init(Object *o)
+{
+ object_property_add_bool(o, "listen",
+ multi_socket_backend_get_listen,
+ multi_socket_backend_set_listen, NULL);
+ object_property_add_str(o, "path", multi_socket_get_path,
+ multi_socket_set_path, NULL);
+}
+
+static const TypeInfo multi_socket_backend_info = {
+ .name = TYPE_MULTI_SOCKET_BACKEND,
+ .parent = TYPE_OBJECT,
+ .class_size = sizeof(MSBackendClass),
+ .class_init = multi_socket_class_init,
+ .instance_size = sizeof(MSBackend),
+ .instance_init = multi_socket_instance_init,
+ .interfaces = (InterfaceInfo[]) {
+ { TYPE_USER_CREATABLE },
+ { }
+ }
+};
+
+static void register_types(void)
+{
+ type_register_static(&multi_socket_backend_info);
+}
+
+type_init(register_types);
diff --git a/include/qemu/multi-socket.h b/include/qemu/multi-socket.h
new file mode 100644
index 0000000..dee866a
--- /dev/null
+++ b/include/qemu/multi-socket.h
@@ -0,0 +1,124 @@
+/*
+ * QEMU Multi Client socket
+ *
+ * Copyright (C) 2015 - Virtual Open Systems
+ *
+ * Author: Baptiste Reynal <b.reynal@virtualopensystems.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ */
+#ifndef QEMU_MS_H
+#define QEMU_MS_H
+
+#include "qemu-common.h"
+#include "qemu/queue.h"
+#include "qemu/sockets.h"
+#include "qom/object.h"
+#include "qom/object_interfaces.h"
+
+#define TYPE_MULTI_SOCKET_BACKEND "multi-socket-backend"
+#define MULTI_SOCKET_BACKEND(obj) \
+ OBJECT_CHECK(MSBackend, (obj), TYPE_MULTI_SOCKET_BACKEND)
+#define MULTI_SOCKET_BACKEND_GET_CLASS(obj) \
+ OBJECT_GET_CLASS(MSBackendClass, (obj), TYPE_MULTI_SOCKET_BACKEND)
+#define MULTI_SOCKET_BACKEND_CLASS(klass) \
+ OBJECT_CLASS_CHECK(MSBackendClass, (klass), TYPE_MULTI_SOCKET_BACKEND)
+
+#define MAX_FDS 32
+#define BUFFER_SIZE 32
+
+typedef struct MSBackend MSBackend;
+typedef struct MSBackendClass MSBackendClass;
+typedef struct MSClient MSClient;
+
+struct MSClient {
+ /* private */
+ int fd;
+ MSBackend *backend;
+ GIOChannel *chan;
+ guint tag;
+
+ int *rcvfds;
+ int rcvfds_num;
+
+ QLIST_ENTRY(MSClient) next;
+};
+
+struct MSBackendClass {
+ /* private */
+ ObjectClass parent_class;
+};
+
+struct MSBackend {
+ /* private */
+ Object parent;
+
+ /* protected */
+ char *path;
+ SocketAddress *addr;
+
+ QLIST_HEAD(clients_head, MSClient) clients;
+ QLIST_HEAD(reg_handlers_head, MSRegHandler) reg_handlers;
+ QLIST_HEAD(handlers_head, MSHandler) handlers;
+
+ bool listen;
+ MSClient listener;
+};
+
+/* Callback method called each time a client is registered to the server.
+ * @backend: socket server
+ * @reg: callback function
+ * @opaque: optional data passed to the register function
+ */
+void multi_socket_add_reg_handler(MSBackend *backend,
+ void (*reg)(MSClient *client, void *opaque),
+ void *opaque);
+
+/* Attach a handler to the socket. "read" function will be called if a string
+ * beginning with "name" is received over the socket. A payload can be attached
+ * next to name and will be passed to the "read" function as "message"
+ * parameter.
+ *
+ * @backend: multi-client socket
+ * @name: name of the handler (should be unique for the socket)
+ * @read: callback function
+ * @opaque: optional data passed to the read function
+ */
+void multi_socket_add_handler(MSBackend *backend, const char *name,
+ void (*read)(MSClient *client, const char *message, void *opaque),
+ void *opaque);
+
+/* Send file descriptors over the socket.
+ *
+ * @client: client to send the message to
+ * @fds: file descriptors array to send
+ * @count: size of the array
+ * @message: attached message
+ * @size: size of the message
+ */
+int multi_socket_send_fds_to(MSClient *client, int *fds, int count,
+ const char *message, int size);
+
+/* Send message over the socket
+ *
+ * @client: client to send the message to
+ * @message: attached message
+ * @size: size of the message
+ */
+int multi_socket_write_to(MSClient *client, const char *message, int size);
+
+/* Get the number of fds received with the last message.
+ *
+ * @client: client who sent the message
+ */
+int multi_socket_get_fds_num_from(MSClient *client);
+
+/* Get the fds received with the last message.
+ *
+ * @client: client who sent the message
+ * @fds: int array to fill with the fds
+ */
+int multi_socket_get_fds_from(MSClient *client, int *fds);
+
+#endif
--
1.9.1
* [Qemu-devel] [RFC PATCH 2/8] backend: shared memory backend
2015-09-29 13:57 [Qemu-devel] [RFC PATCH 0/8] Towards an Heterogeneous QEMU Christian Pinto
2015-09-29 13:57 ` [Qemu-devel] [RFC PATCH 1/8] backend: multi-socket Christian Pinto
@ 2015-09-29 13:57 ` Christian Pinto
2015-09-29 13:57 ` [Qemu-devel] [RFC PATCH 3/8] migration: add shared migration type Christian Pinto
` (6 subsequent siblings)
8 siblings, 0 replies; 19+ messages in thread
From: Christian Pinto @ 2015-09-29 13:57 UTC (permalink / raw)
To: qemu-devel
Cc: Baptiste Reynal, Jani.Kokkonen, tech, Claudio.Fontana,
Christian Pinto
From: Baptiste Reynal <b.reynal@virtualopensystems.com>
This patch introduces a shared memory backend, allowing memory to be
shared between a master and multiple slaves.
The memory is implemented using hugetlbfs and relies on the
multi-socket backend to share information (size and offset for the
slaves).
Instantiation on the master:
-object memory-backend-shared,id=<id>,size=<memory_sizeK,M,G>,
chardev=<multi-socket_id>,master=on
Instantiation on the slave:
-object memory-backend-shared,id=<id>,size=<slave_memory_sizeK,M,G>,
chardev=<multi-socket_id>,master=off
The memory size on the slave can be smaller than on the master. The master
sends the slave the size and the offset of the memory segment it can
allocate within the total shared memory.
Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com>
---
backends/Makefile.objs | 2 +-
backends/hostmem-shared.c | 194 ++++++++++++++++++++++++++++++++++++++++
include/sysemu/hostmem-shared.h | 61 +++++++++++++
3 files changed, 256 insertions(+), 1 deletion(-)
create mode 100644 backends/hostmem-shared.c
create mode 100644 include/sysemu/hostmem-shared.h
diff --git a/backends/Makefile.objs b/backends/Makefile.objs
index 689eac3..de76906 100644
--- a/backends/Makefile.objs
+++ b/backends/Makefile.objs
@@ -8,6 +8,6 @@ baum.o-cflags := $(SDL_CFLAGS)
common-obj-$(CONFIG_TPM) += tpm.o
common-obj-y += hostmem.o hostmem-ram.o
-common-obj-$(CONFIG_LINUX) += hostmem-file.o
+common-obj-$(CONFIG_LINUX) += hostmem-file.o hostmem-shared.o
common-obj-y += multi-socket.o
diff --git a/backends/hostmem-shared.c b/backends/hostmem-shared.c
new file mode 100644
index 0000000..a96ccdf
--- /dev/null
+++ b/backends/hostmem-shared.c
@@ -0,0 +1,194 @@
+/*
+ * QEMU Host Memory Backend for hugetlbfs
+ *
+ * Copyright (C) 2015 - Virtual Open Systems
+ *
+ * Author: Baptiste Reynal <b.reynal@virtualopensystems.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "sysemu/hostmem-shared.h"
+
+static void shared_backend_init_shm(HostMemoryBackendShared *shm, int shmd,
+ size_t size, off_t offset) {
+ void *shared_ram;
+ HostMemoryBackend *backend = MEMORY_BACKEND(shm);
+
+ shared_ram = mmap(0, size, PROT_READ | PROT_WRITE,
+ MAP_SHARED, shmd, offset);
+ close(shmd);
+
+ if (shared_ram == MAP_FAILED)
+ perror("Map failed");
+
+ memory_region_init_ram_ptr(&shm->shared_region, OBJECT(backend),
+ "shared_mem", size, shared_ram);
+
+ memory_region_add_subregion(&backend->mr,
+ 0, &shm->shared_region);
+}
+
+/* Callback function if a fd is received over the socket */
+static void set_shared_memory(MSClient *c, const char *message, void *opaque)
+{
+ HostMemoryBackendShared *shm = MEMORY_BACKEND_SHARED(opaque);
+ uint32_t *infos = (uint32_t *) message;
+
+ int fd = 0;
+ multi_socket_get_fds_from(c, &fd);
+
+ if (fd <= 0) {
+ printf("Error receiving fd: %d", fd);
+ exit(-1);
+ }
+
+ shared_backend_init_shm(shm, fd, infos[0], infos[1]);
+}
+
+static void
+shared_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
+{
+ HostMemoryBackendShared *shm = MEMORY_BACKEND_SHARED(backend);
+
+ if (!backend->size) {
+ error_setg(errp, "can't create backend with size 0");
+ return;
+ }
+
+ shm->ms = (MSBackend *) object_resolve_path_type(shm->chardev,
+ TYPE_MULTI_SOCKET_BACKEND, NULL);
+ if (shm->ms == NULL) {
+ printf("Error: Cannot find socket %s\n", shm->chardev);
+ exit(-1);
+ }
+
+ if (!memory_region_size(&backend->mr)) {
+ if (shm->master) {
+ backend->force_prealloc = mem_prealloc;
+ memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
+ object_get_canonical_path(OBJECT(backend)),
+ backend->size, true,
+ shm->mem_path, errp);
+ } else {
+ backend->force_prealloc = mem_prealloc;
+
+ /*
+ * Initialize only the main fields
+ * the rest is initialized when the fd is received
+ */
+ memory_region_init(&backend->mr, OBJECT(backend),
+ object_get_canonical_path(OBJECT(backend)),
+ backend->size);
+
+ multi_socket_add_handler(shm->ms, "send_fd",
+ set_shared_memory, shm);
+ }
+ }
+}
+
+static void
+shared_memory_backend_complete(UserCreatable *uc, Error **errp)
+{
+ HostMemoryBackend *hm = MEMORY_BACKEND(uc);
+ HostMemoryBackendShared *shm = MEMORY_BACKEND_SHARED(uc);
+ HostMemoryBackendClass *bc = MEMORY_BACKEND_GET_CLASS(uc);
+ HostMemoryBackendSharedClass *bsc = MEMORY_BACKEND_SHARED_GET_CLASS(uc);
+
+ if (shm->master)
+ bsc->parent_complete(uc, errp);
+ else
+ bc->alloc(hm, errp);
+}
+
+static void shared_backend_class_init(ObjectClass *oc, void *data)
+{
+ UserCreatableClass *ucc = USER_CREATABLE_CLASS(oc);
+ HostMemoryBackendClass *bc = MEMORY_BACKEND_CLASS(oc);
+ HostMemoryBackendSharedClass *bsc = MEMORY_BACKEND_SHARED_CLASS(oc);
+
+ bc->alloc = shared_backend_memory_alloc;
+ bsc->parent_complete = ucc->complete;
+ ucc->complete = shared_memory_backend_complete;
+}
+
+static char *get_mem_path(Object *o, Error **errp)
+{
+ HostMemoryBackendShared *backend = MEMORY_BACKEND_SHARED(o);
+
+ return g_strdup(backend->mem_path);
+}
+
+static void set_mem_path(Object *o, const char *str, Error **errp)
+{
+ HostMemoryBackend *backend = MEMORY_BACKEND(o);
+ HostMemoryBackendShared *shm = MEMORY_BACKEND_SHARED(o);
+
+ if (memory_region_size(&backend->mr)) {
+ error_setg(errp, "cannot change property value");
+ return;
+ }
+ if (shm->mem_path) {
+ g_free(shm->mem_path);
+ }
+ shm->mem_path = g_strdup(str);
+}
+
+static bool get_master(Object *o, Error **errp)
+{
+ HostMemoryBackendShared *shm = MEMORY_BACKEND_SHARED(o);
+
+ return shm->master;
+}
+
+static void set_master(Object *o, bool value, Error **errp)
+{
+ HostMemoryBackendShared *shm = MEMORY_BACKEND_SHARED(o);
+
+ shm->master = value;
+}
+
+static char *get_chardev(Object *o, Error **errp)
+{
+ HostMemoryBackendShared *shm = MEMORY_BACKEND_SHARED(o);
+
+ return g_strdup(shm->chardev);
+}
+
+static void set_chardev(Object *o, const char *str, Error **errp)
+{
+ HostMemoryBackendShared *shm = MEMORY_BACKEND_SHARED(o);
+
+ if (shm->chardev) {
+ g_free(shm->chardev);
+ }
+ shm->chardev = g_strdup(str);
+}
+
+static void shared_backend_instance_init(Object *o)
+{
+ object_property_add_bool(o, "master", get_master,
+ set_master, NULL);
+ object_property_add_str(o, "mem-path", get_mem_path,
+ set_mem_path, NULL);
+ object_property_add_str(o, "chardev", get_chardev,
+ set_chardev, NULL);
+}
+
+static const TypeInfo shared_backend_info = {
+ .name = TYPE_MEMORY_BACKEND_SHARED,
+ .parent = TYPE_MEMORY_BACKEND,
+ .class_init = shared_backend_class_init,
+ .class_size = sizeof(HostMemoryBackendSharedClass),
+ .instance_init = shared_backend_instance_init,
+ .instance_size = sizeof(HostMemoryBackendShared),
+};
+
+static void register_types(void)
+{
+ type_register_static(&shared_backend_info);
+}
+
+type_init(register_types);
diff --git a/include/sysemu/hostmem-shared.h b/include/sysemu/hostmem-shared.h
new file mode 100644
index 0000000..f3f8e4e
--- /dev/null
+++ b/include/sysemu/hostmem-shared.h
@@ -0,0 +1,61 @@
+/*
+ * QEMU Host Memory Backend for hugetlbfs
+ *
+ * Copyright (C) 2015 - Virtual Open Systems
+ *
+ * Author: Baptiste Reynal <b.reynal@virtualopensystems.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_HM_H
+#define QEMU_HM_H
+
+#include "qemu-common.h"
+#include "sysemu/hostmem.h"
+#include "sysemu/sysemu.h"
+#include "qemu/multi-socket.h"
+#include "qom/object_interfaces.h"
+#include "qapi-visit.h"
+
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <sys/statfs.h>
+#include <fcntl.h>
+
+typedef struct HostMemoryBackendShared {
+ HostMemoryBackend parent_obj;
+
+ bool master;
+
+ char *mem_path;
+ char *chardev;
+
+ int event;
+ EventNotifier *levent;
+
+ MSBackend *ms;
+ MemoryRegion shared_region;
+} HostMemoryBackendShared;
+
+typedef struct HostMemoryBackendSharedClass {
+ HostMemoryBackendClass parent_class;
+
+ void (*parent_complete)(UserCreatable *uc, Error **errp);
+} HostMemoryBackendSharedClass;
+
+#define TYPE_MEMORY_BACKEND_SHARED "memory-backend-shared"
+
+#define MEMORY_BACKEND_SHARED(obj) \
+ OBJECT_CHECK(HostMemoryBackendShared, (obj), TYPE_MEMORY_BACKEND_SHARED)
+#define MEMORY_BACKEND_SHARED_GET_CLASS(obj) \
+ OBJECT_GET_CLASS(HostMemoryBackendSharedClass, (obj), \
+ TYPE_MEMORY_BACKEND_SHARED)
+#define MEMORY_BACKEND_SHARED_CLASS(klass) \
+ OBJECT_CLASS_CHECK(HostMemoryBackendSharedClass, (klass), \
+ TYPE_MEMORY_BACKEND_SHARED)
+#define IS_MEMORY_BACKEND_SHARED(obj) \
+ object_dynamic_cast (OBJECT(obj), TYPE_MEMORY_BACKEND_SHARED)
+
+#endif
--
1.9.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [Qemu-devel] [RFC PATCH 3/8] migration: add shared migration type
2015-09-29 13:57 [Qemu-devel] [RFC PATCH 0/8] Towards an Heterogeneous QEMU Christian Pinto
2015-09-29 13:57 ` [Qemu-devel] [RFC PATCH 1/8] backend: multi-socket Christian Pinto
2015-09-29 13:57 ` [Qemu-devel] [RFC PATCH 2/8] backend: shared memory backend Christian Pinto
@ 2015-09-29 13:57 ` Christian Pinto
2015-09-29 13:57 ` [Qemu-devel] [RFC PATCH 4/8] hw/misc: IDM Device Christian Pinto
` (5 subsequent siblings)
8 siblings, 0 replies; 19+ messages in thread
From: Christian Pinto @ 2015-09-29 13:57 UTC (permalink / raw)
To: qemu-devel
Cc: Baptiste Reynal, Jani.Kokkonen, tech, Claudio.Fontana,
Christian Pinto
From: Baptiste Reynal <b.reynal@virtualopensystems.com>
A slave QEMU instance can now wait for the master to instantiate the
shared memory when the shared-memory backend is used.
Use:
-incoming "shared:<shared-memory_id>"
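For illustration, a slave instance using the shared-memory backend from patch 2
might be started as follows (the IDs and backend option names here are
placeholders sketched from this series, not a finalized interface):

```shell
# Slave side (sketch): create the shared backend as a slave, then defer
# vm_start() until the master pushes the memory fd over the socket.
qemu-system-arm -M virt,slave=on \
    -object memory-backend-shared,id=shmem0,size=64M,master=off,chardev=msock0 \
    -incoming "shared:shmem0"
```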
Signed-off-by: Baptiste Reynal <b.reynal@virtualopensystems.com>
---
backends/hostmem-shared.c | 9 +++++++++
include/migration/migration.h | 2 ++
migration/Makefile.objs | 2 +-
migration/migration.c | 2 ++
migration/shared.c | 32 ++++++++++++++++++++++++++++++++
5 files changed, 46 insertions(+), 1 deletion(-)
create mode 100644 migration/shared.c
diff --git a/backends/hostmem-shared.c b/backends/hostmem-shared.c
index a96ccdf..0e79019 100644
--- a/backends/hostmem-shared.c
+++ b/backends/hostmem-shared.c
@@ -11,6 +11,7 @@
*/
#include "sysemu/hostmem-shared.h"
+#include "migration/vmstate.h"
static void shared_backend_init_shm(HostMemoryBackendShared *shm, int shmd,
size_t size, off_t offset) {
@@ -29,6 +30,8 @@ static void shared_backend_init_shm(HostMemoryBackendShared *shm, int shmd,
memory_region_add_subregion(&backend->mr,
0, &shm->shared_region);
+
+ vmstate_register_ram_global(&shm->shared_region);
}
/* Callback function if a fd is received over the socket */
@@ -46,6 +49,7 @@ static void set_shared_memory(MSClient *c, const char *message, void *opaque)
}
shared_backend_init_shm(shm, fd, infos[0], infos[1]);
+ event_notifier_set(shm->levent);
}
static void
@@ -87,6 +91,11 @@ shared_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
set_shared_memory, shm);
}
}
+
+ shm->levent = g_new(EventNotifier, 1);
+ event_notifier_init(shm->levent, 0);
+
+ shm->event = event_notifier_get_fd(shm->levent);
}
static void
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 8334621..0d4efa5 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -114,6 +114,8 @@ void migrate_fd_connect(MigrationState *s);
int migrate_fd_close(MigrationState *s);
+void shared_start_incoming_migration(const char *name, Error **errp);
+
void add_migration_state_change_notifier(Notifier *notify);
void remove_migration_state_change_notifier(Notifier *notify);
bool migration_in_setup(MigrationState *);
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index d929e96..08c96f7 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -7,4 +7,4 @@ common-obj-$(CONFIG_RDMA) += rdma.o
common-obj-$(CONFIG_POSIX) += exec.o unix.o fd.o
common-obj-y += block.o
-
+common-obj-y += shared.o
diff --git a/migration/migration.c b/migration/migration.c
index 662e77e..9f68983 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -249,6 +249,8 @@ void qemu_start_incoming_migration(const char *uri, Error **errp)
deferred_incoming_migration(errp);
} else if (strstart(uri, "tcp:", &p)) {
tcp_start_incoming_migration(p, errp);
+ } else if (strstart(uri, "shared:", &p)) {
+ shared_start_incoming_migration(p, errp);
#ifdef CONFIG_RDMA
} else if (strstart(uri, "rdma:", &p)) {
rdma_start_incoming_migration(p, errp);
diff --git a/migration/shared.c b/migration/shared.c
new file mode 100644
index 0000000..fc3ee08
--- /dev/null
+++ b/migration/shared.c
@@ -0,0 +1,32 @@
+#include "qemu-common.h"
+#include "qemu/main-loop.h"
+#include "qemu/sockets.h"
+#include "migration/migration.h"
+#include "monitor/monitor.h"
+#include "migration/qemu-file.h"
+#include "block/block.h"
+#include "sysemu/hostmem-shared.h"
+
+static void shared_accept_incoming_migration(void *opaque) {
+ QEMUFile *f = opaque;
+ printf("Start !\n");
+
+ qemu_set_fd_handler(qemu_get_fd(f), NULL, NULL, NULL);
+ vm_start();
+}
+
+void shared_start_incoming_migration(const char *id, Error **errp)
+{
+ HostMemoryBackendShared *shm = (HostMemoryBackendShared *)
+ object_resolve_path_type(id, TYPE_MEMORY_BACKEND_SHARED, NULL);
+ QEMUFile *f;
+
+ if (shm == NULL) {
+ printf("Error: Cannot find shared memory %s\n", id);
+ exit(-1);
+ }
+
+ f = qemu_fdopen(shm->event, "rb");
+
+ qemu_set_fd_handler(shm->event, shared_accept_incoming_migration, NULL, f);
+}
--
1.9.1
* [Qemu-devel] [RFC PATCH 4/8] hw/misc: IDM Device
2015-09-29 13:57 [Qemu-devel] [RFC PATCH 0/8] Towards an Heterogeneous QEMU Christian Pinto
` (2 preceding siblings ...)
2015-09-29 13:57 ` [Qemu-devel] [RFC PATCH 3/8] migration: add shared migration type Christian Pinto
@ 2015-09-29 13:57 ` Christian Pinto
2015-09-29 13:57 ` [Qemu-devel] [RFC PATCH 5/8] hw/arm: sysbus-fdt Christian Pinto
` (4 subsequent siblings)
8 siblings, 0 replies; 19+ messages in thread
From: Christian Pinto @ 2015-09-29 13:57 UTC (permalink / raw)
To: qemu-devel; +Cc: Jani.Kokkonen, tech, Claudio.Fontana, Christian Pinto
This patch introduces the Interrupt Distribution Module (IDM) device for the
ARM, x86 and x86_64 architectures, currently the only supported ones.
The IDM device is used both as an inter-processor interrupt routing device
and to trigger the boot of a slave QEMU instance.
The IDM device can be instantiated either as a sysbus or a PCI device:
-device idm_ipi
or
-device idm_ipi_pci
parameters are:
master=[true/false] - configure the IDM device as master or slave
num_slaves=[slaves_number] - if master is true, specifies the number of slaves
memdev=[memdev_id] - id of the shared memory backend
socket=[socket_id] - id of the multi-client socket
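A hypothetical master/slave pair wiring the IDM device to the backends from
patches 1 and 2 (all IDs, socket paths and multi-socket backend options below
are illustrative; patch 1 defines the actual multi-socket interface):

```shell
# Master (sketch): owns the shared memory and the multi-client socket,
# and may boot/kick one slave through the IDM registers.
qemu-system-arm -M virt \
    -object multi-socket-backend,id=msock0,listen=/tmp/ms.sock \
    -object memory-backend-shared,id=shmem0,size=64M,master=on \
    -device idm_ipi,master=true,num_slaves=1,memdev=shmem0,socket=msock0

# Slave (sketch): connects to the master's socket and waits for the memory.
qemu-system-arm -M virt,slave=on \
    -object multi-socket-backend,id=msock1,connect=/tmp/ms.sock \
    -object memory-backend-shared,id=shmem1,master=off,chardev=msock1 \
    -device idm_ipi,master=false,memdev=shmem1,socket=msock1 \
    -incoming "shared:shmem1"
```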
Signed-off-by: Christian Pinto <c.pinto@virtualopensystems.com>
---
default-configs/arm-softmmu.mak | 1 +
default-configs/i386-softmmu.mak | 1 +
default-configs/x86_64-softmmu.mak | 1 +
hw/misc/Makefile.objs | 2 +
hw/misc/idm.c | 416 +++++++++++++++++++++++++++++++++++++
include/hw/misc/idm.h | 119 +++++++++++
6 files changed, 540 insertions(+)
create mode 100644 hw/misc/idm.c
create mode 100644 include/hw/misc/idm.h
diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index d9b90a5..44109a3 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -109,3 +109,4 @@ CONFIG_IOH3420=y
CONFIG_I82801B11=y
CONFIG_ACPI=y
CONFIG_SMBIOS=y
+CONFIG_IDM=y
diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index 9393cf0..f017448 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -51,3 +51,4 @@ CONFIG_XIO3130=y
CONFIG_IOH3420=y
CONFIG_I82801B11=y
CONFIG_SMBIOS=y
+CONFIG_IDM=y
diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
index 28e2099..6977479 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -49,5 +49,6 @@ CONFIG_PVPANIC=y
CONFIG_MEM_HOTPLUG=y
CONFIG_XIO3130=y
CONFIG_IOH3420=y
+CONFIG_IDM=y
CONFIG_I82801B11=y
CONFIG_SMBIOS=y
diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index 4aa76ff..6e01c50 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -21,6 +21,8 @@ common-obj-$(CONFIG_MACIO) += macio/
obj-$(CONFIG_IVSHMEM) += ivshmem.o
+obj-$(CONFIG_IDM) += idm.o
+
obj-$(CONFIG_REALVIEW) += arm_sysctl.o
obj-$(CONFIG_NSERIES) += cbus.o
obj-$(CONFIG_ECCMEMCTL) += eccmemctl.o
diff --git a/hw/misc/idm.c b/hw/misc/idm.c
new file mode 100644
index 0000000..a8f408c
--- /dev/null
+++ b/hw/misc/idm.c
@@ -0,0 +1,416 @@
+/*
+ * IDM Device
+ *
+ * Copyright (C) 2015 - Virtual Open Systems
+ *
+ * Author: Christian Pinto <c.pinto@virtualopensystems.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "hw/sysbus.h"
+#include "qemu/error-report.h"
+#include "hw/misc/idm.h"
+
+
+static void inject_irq(IDMState *s)
+{
+ if (s->pci) {
+ PCIDevice *d = PCI_DEVICE(s);
+ pci_set_irq(d, 1);
+ } else {
+ qemu_irq_raise(s->irq);
+ }
+}
+
+static void reset_irq(IDMState *s)
+{
+ if (s->pci) {
+ PCIDevice *d = PCI_DEVICE(s);
+ pci_set_irq(d, 0);
+ } else {
+ qemu_irq_lower(s->irq);
+ }
+}
+
+static uint64_t idm_read(void *opaque, hwaddr offset, unsigned size)
+{
+ struct IDMState *s = opaque;
+ uint64_t ret = 0;
+
+ IDM_PRINTF("idm_read - offset %llu, size: %u\n",
+ (unsigned long long int)offset, size);
+
+ switch (offset) {
+ case ISR_REG:
+ /**
+ * Reading the ISR returns the whole mask, so that the driver can
+ * figure out which slaves have fired.
+ * The interrupt status register is cleared, and the interrupt lowered
+ */
+ ret = s->int_status_reg;
+ s->int_status_reg = 0;
+ reset_irq(s);
+ break;
+ default:
+ error_report("idm_read: Wrong offset");
+ break;
+ }
+
+ return ret;
+}
+
+static void send_shmem_fd(IDMState *s, MSClient *c)
+{
+ int fd, len;
+ uint32_t *message;
+ HostMemoryBackend *backend = MEMORY_BACKEND(s->hostmem);
+
+ len = strlen(SEND_MEM_FD_CMD)/4 + 3;
+ message = malloc(len * sizeof(uint32_t));
+ strcpy((char *) message, SEND_MEM_FD_CMD);
+ message[len - 2] = s->pboot_size;
+ message[len - 1] = s->pboot_offset;
+
+ fd = memory_region_get_fd(&backend->mr);
+
+ multi_socket_send_fds_to(c, &fd, 1, (char *) message, len * sizeof(uint32_t));
+
+ free(message);
+}
+
+static void idm_write(void *opaque, hwaddr offset, uint64_t value,
+ unsigned size)
+{
+ struct IDMState *s = opaque;
+
+ IDM_PRINTF("idm_write: offset %llu, size: %u, value: %llu\n",
+ (unsigned long long int)offset, size, (unsigned long long int)value);
+
+ switch (offset) {
+ case KICK_REG:
+ /**
+ * To kick another qemu instance it is sufficient to write its id into
+ * the KICK_REG. Value zero is reserved to kick the master,
+ * from 1 upwards for the slaves.
+ */
+ if (value == 0) {
+ if (!s->master) {
+ event_notifier_set(&s->master_eventfd->eventfd.out);
+ IDM_PRINTF("idm_write: triggered eventfd to master\n");
+ } else {
+ error_report("IDM module: master instance trying to kick "
+ "the master. Wrong value written to Kick_REG");
+ }
+ } else {
+ if (s->master) {
+ event_notifier_set(&s->slaves[value - 1].eventfd.out);
+ IDM_PRINTF("idm_write: triggered eventfd to slave %llu\n",
+ (unsigned long long int)value - 1);
+ } else {
+ error_report("IDM module: Slave trying to kick another slave. "
+ "Functionality not yet supported");
+ }
+ }
+ break;
+ case BOOT_REG:
+ /**
+ * When idm is configured as master the BOOT_REG is used to trigger
+ * the boot of a slave instance.
+ * The ID of the slave to boot is to be written in BOOT_REG
+ * Slave IDs range from 0 up-to num_slaves - 1
+ */
+ IDM_PRINTF("idm_write: triggering boot of slave %d\n",
+ (int)(value - 1));
+ send_shmem_fd(s, s->slaves[value - 1].socket_client);
+ break;
+ case PBOOT_REG:
+ if (value == PBOOT_REG_RESET)
+ s->pboot_status = 0;
+ else if (s->pboot_status == 0) {
+ s->pboot_size = value;
+ s->pboot_status = 1;
+ } else {
+ s->pboot_offset = value;
+ s->pboot_status = 0;
+ }
+ break;
+ default:
+ error_report("IDM module: wrong register to idm_write\n");
+ break;
+ }
+}
+
+static void idm_eventfd_handler(void *opaque)
+{
+ struct IDMSlave *sl = opaque;
+ struct IDMState *s = sl->idm_state;
+
+ IDM_PRINTF("idm_eventfd_handler: triggering IRQ to GuestOS\n");
+
+
+ /**
+ * Set the interrupt status register to notify which slave has fired
+ * the interrupt
+ */
+ s->int_status_reg |= 1 << (sl->slave_id + 1);
+
+ event_notifier_test_and_clear(&sl->eventfd.in);
+ inject_irq(s);
+}
+
+static void slave_register_ms_handler(MSClient *c, const char *message,
+ void *opaque)
+{
+ struct IDMState *s = opaque;
+ int fd[2], ret, id;
+
+ if (s->registered_slaves >= s->num_slaves) {
+ error_report("IDM module: All slaves already registered, "
+ "run again with bigger num_slaves");
+ exit(1);
+ }
+
+ id = s->registered_slaves;
+ s->registered_slaves++;
+
+ s->slaves[id].slave_id = id;
+ s->slaves[id].idm_state = s;
+ s->slaves[id].socket_client = c;
+ ret = event_notifier_init(&s->slaves[id].eventfd.in, 0);
+ ret |= event_notifier_init(&s->slaves[id].eventfd.out, 0);
+ if (ret) {
+ error_report("IDM module: Unable to initialize local eventfd");
+ exit(1);
+ }
+
+ /**
+ * send master eventfd to slave
+ * Set master eventfd into socket ancillary data and send
+ */
+
+ fd[0] = event_notifier_get_fd(&s->slaves[id].eventfd.in);
+ fd[1] = event_notifier_get_fd(&s->slaves[id].eventfd.out);
+
+
+ qemu_set_fd_handler(fd[0], (IOHandler *)idm_eventfd_handler, NULL,
+ &s->slaves[id]);
+
+ IDM_PRINTF("slave_register_ms_handler: master sending eventfds %d - %d "
+ "to slave %d\n", fd[0], fd[1], id);
+
+ multi_socket_send_fds_to(c, fd, 2, MASTER_EVENTFD_CMD,
+ strlen(MASTER_EVENTFD_CMD) + 1);
+}
+
+static void master_eventfd_ms_handler(MSClient *c, const char *message,
+ void *opaque)
+{
+ struct IDMState *s = opaque;
+ int fd[2];
+
+ s->master_eventfd->idm_state = s;
+ /**
+ * Master has the highest id
+ */
+ s->master_eventfd->slave_id = 0xffffffff;
+ /**
+ * Get slave eventfd from socket ancillary data
+ */
+ multi_socket_get_fds_from(c, fd);
+ IDM_PRINTF("master_eventfd_ms_handler: eventfd %d - %d "
+ "received from master\n", fd[0], fd[1]);
+
+ /**
+ * Initialize master event notifier
+ */
+ event_notifier_init_fd(&s->master_eventfd->eventfd.in, fd[1]);
+ event_notifier_init_fd(&s->master_eventfd->eventfd.out, fd[0]);
+
+ qemu_set_fd_handler(fd[1], (IOHandler *)idm_eventfd_handler, NULL,
+ s->master_eventfd);
+}
+
+
+static const MemoryRegionOps idm_ops = {
+ .read = idm_read,
+ .write = idm_write,
+ .endianness = DEVICE_NATIVE_ENDIAN,
+};
+
+static void idm_realize_common(DeviceState *dev, Error **errp, IDMState *s)
+{
+ if (s->master && (s->num_slaves == 0)) {
+ error_report("idm_realize_common: master requires at least one slave");
+ exit(1);
+ }
+
+ if (s->master) {
+ IDM_PRINTF("idm_realize_common: Master init, num slaves %d\n",
+ s->num_slaves);
+ multi_socket_add_handler(s->master_socket, SLAVE_REGISTER_CMD,
+ slave_register_ms_handler, s);
+ s->slaves = g_new(struct IDMSlave, s->num_slaves);
+ s->registered_slaves = 0;
+ } else {
+ IDM_PRINTF("idm_realize_common: Slave init\n");
+ s->master_eventfd = g_new(struct IDMSlave, 1);
+ multi_socket_add_handler(s->master_socket, MASTER_EVENTFD_CMD,
+ master_eventfd_ms_handler, s);
+ multi_socket_write_to(&s->master_socket->listener, SLAVE_REGISTER_CMD,
+ strlen(SLAVE_REGISTER_CMD) + 1);
+ }
+
+ IDM_PRINTF("idm_realize_common: done!!\n");
+}
+
+static void idm_realize(DeviceState *dev, Error **errp)
+{
+ struct IDMState *s = IDM(dev);
+ SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
+
+ s->pci = false;
+
+ IDM_PRINTF("idm_realize\n");
+
+ /**
+ * Initialize MMIO regions with read/write functions
+ */
+ memory_region_init_io(&s->iomem, OBJECT(s), &idm_ops, s,
+ TYPE_IDM, IDM_SIZE);
+ sysbus_init_mmio(sbd, &s->iomem);
+
+ /**
+ * Initialize IRQ
+ */
+ sysbus_init_irq(sbd, &s->irq);
+
+ idm_realize_common(dev, errp, IDM(dev));
+}
+
+static void idm_realize_pci(PCIDevice *dev, Error **errp)
+{
+ struct IDMState *s = IDM_PCI(dev);
+ uint8_t *pci_conf;
+
+ IDM_PRINTF("idm_realize_pci\n");
+
+ s->pci = true;
+
+ pci_conf = dev->config;
+ pci_conf[PCI_COMMAND] = PCI_COMMAND_MEMORY;
+
+ /**
+ * Initialize IRQ
+ */
+ pci_config_set_interrupt_pin(pci_conf, 1);
+
+ /**
+ * Initialize MMIO regions with read/write functions
+ */
+ memory_region_init_io(&s->iomem, OBJECT(s), &idm_ops, s,
+ TYPE_IDM_PCI, IDM_SIZE);
+ pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY,
+ &s->iomem);
+
+ idm_realize_common(&dev->qdev, errp, IDM_PCI(dev));
+}
+
+
+static Property idm_properties[] = {
+ DEFINE_PROP_UINT32("num-slaves", IDMState, num_slaves, 1),
+ DEFINE_PROP_BOOL("master", IDMState, master, true),
+ DEFINE_PROP_END_OF_LIST(),
+};
+
+
+static void idm_class_init(ObjectClass *klass, void *data)
+{
+ DeviceClass *dc = DEVICE_CLASS(klass);
+
+ IDM_PRINTF("idm_class_init\n");
+
+ dc->props = idm_properties;
+ dc->realize = idm_realize;
+
+ set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+}
+
+
+static void idm_pci_class_init(ObjectClass *klass, void *data)
+{
+ DeviceClass *dc = DEVICE_CLASS(klass);
+ PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+
+ IDM_PRINTF("idm_pci_class_init\n");
+
+ dc->props = idm_properties;
+ k->realize = idm_realize_pci;
+
+ k->vendor_id = PCI_VENDOR_ID_IDM;
+ k->device_id = PCI_DEVICE_ID_IDM;
+ k->class_id = PCI_CLASS_MEMORY_RAM;
+ set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+}
+
+
+static void idm_init(Object *obj)
+{
+ struct IDMState *s = IDM(obj);
+
+ object_property_add_link(obj, IDM_MEMDEV_PROP, TYPE_MEMORY_BACKEND_SHARED,
+ (Object **)&s->hostmem,
+ qdev_prop_allow_set_link_before_realize,
+ OBJ_PROP_LINK_UNREF_ON_RELEASE,
+ &error_abort);
+
+ object_property_add_link(obj, IDM_SOCKET_PROP, TYPE_MULTI_SOCKET_BACKEND,
+ (Object **)&s->master_socket,
+ qdev_prop_allow_set_link_before_realize,
+ OBJ_PROP_LINK_UNREF_ON_RELEASE,
+ &error_abort);
+}
+
+static void idm_init_pci(Object *obj)
+{
+ IDMState *s = IDM_PCI(obj);
+
+ object_property_add_link(obj, IDM_MEMDEV_PROP, TYPE_MEMORY_BACKEND_SHARED,
+ (Object **)&s->hostmem,
+ qdev_prop_allow_set_link_before_realize,
+ OBJ_PROP_LINK_UNREF_ON_RELEASE,
+ &error_abort);
+
+ object_property_add_link(obj, IDM_SOCKET_PROP, TYPE_MULTI_SOCKET_BACKEND,
+ (Object **)&s->master_socket,
+ qdev_prop_allow_set_link_before_realize,
+ OBJ_PROP_LINK_UNREF_ON_RELEASE,
+ &error_abort);
+}
+
+static const TypeInfo idm_info = {
+ .name = TYPE_IDM,
+ .parent = TYPE_SYS_BUS_DEVICE,
+ .instance_init = idm_init,
+ .instance_size = sizeof(struct IDMState),
+ .class_init = idm_class_init,
+ .class_size = sizeof(struct IDMDeviceClass),
+};
+
+static const TypeInfo idm_info_pci = {
+ .name = TYPE_IDM_PCI,
+ .parent = TYPE_PCI_DEVICE,
+ .instance_init = idm_init_pci,
+ .instance_size = sizeof(struct IDMState),
+ .class_init = idm_pci_class_init,
+ .class_size = sizeof(struct IDMPCIDeviceClass),
+};
+
+static void idm_register_types(void)
+{
+ type_register_static(&idm_info);
+ type_register_static(&idm_info_pci);
+}
+
+type_init(idm_register_types)
diff --git a/include/hw/misc/idm.h b/include/hw/misc/idm.h
new file mode 100644
index 0000000..ef22a4e
--- /dev/null
+++ b/include/hw/misc/idm.h
@@ -0,0 +1,119 @@
+/*
+ * IDM Device
+ *
+ * Copyright (C) 2015 - Virtual Open Systems
+ *
+ * Author: Christian Pinto <c.pinto@virtualopensystems.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ */
+
+#ifndef HW_MISC_IDM_H
+#define HW_MISC_IDM_H
+
+
+#include "hw/pci/pci.h"
+#include "qemu/multi-socket.h"
+#include "sysemu/hostmem-shared.h"
+
+typedef struct IDMState IDMState;
+
+#define DEBUG_IDM
+#ifdef DEBUG_IDM
+#define IDM_PRINTF(fmt, ...) \
+ do {printf("IDM - " fmt, ## __VA_ARGS__); fflush(stdout); } while (0)
+#else
+#define IDM_PRINTF(fmt, ...)
+#endif
+
+#define TYPE_IDM "idm_ipi"
+#define TYPE_IDM_PCI "idm_ipi_pci"
+
+#define IDM_MEMDEV_PROP "memdev"
+#define IDM_SOCKET_PROP "socket"
+
+/*
+ * Currently using a fake PCI device ID
+ */
+#define PCI_VENDOR_ID_IDM PCI_VENDOR_ID_REDHAT_QUMRANET
+#define PCI_DEVICE_ID_IDM 0x1111
+
+#define IDM(obj) OBJECT_CHECK(struct IDMState, (obj), TYPE_IDM)
+#define IDM_PCI(obj) OBJECT_CHECK(struct IDMState, (obj), TYPE_IDM_PCI)
+
+/*
+ * Size of the IO memory mapped region
+ * associated with IDM device registers
+ */
+#define IDM_SIZE 0x100
+
+/*
+ * Registers for IDM device
+ */
+#define ISR_REG 0x4
+#define KICK_REG 0x8
+#define BOOT_REG 0xC
+#define PBOOT_REG 0x10
+#define PBOOT_REG_RESET 0xDEAD
+
+#define SLAVE_REGISTER_CMD "SLAVE_REGISTER"
+#define MASTER_EVENTFD_CMD "MASTER_EVENTFD"
+#define SEND_MEM_FD_CMD "send_fd"
+
+struct IDMEventfdChan {
+ EventNotifier in;
+ EventNotifier out;
+};
+
+struct IDMSlave {
+ uint32_t slave_id;
+ struct IDMEventfdChan eventfd;
+ IDMState *idm_state;
+ hwaddr boot_address;
+ MSClient *socket_client;
+};
+
+struct IDMState {
+ /*< private >*/
+ union {
+ PCIDevice pdev;
+ SysBusDevice sdev;
+ };
+
+ /*< public >*/
+ bool pci;
+ MemoryRegion iomem;
+ uint32_t num_slaves;
+ uint32_t registered_slaves;
+ bool master;
+ struct IDMSlave *master_eventfd;
+ struct IDMSlave *slaves;
+ MSBackend *master_socket;
+ qemu_irq irq;
+ HostMemoryBackendShared *hostmem;
+ /*registers*/
+ bool pboot_status;
+ uint32_t pboot_size;
+ uint32_t pboot_offset;
+
+ /*
+ * One bit is set every time an interrupt is received 0 for the master,
+ * slaves from 1 onwards
+ */
+ uint32_t int_status_reg;
+};
+
+struct IDMDeviceClass {
+ /*< private >*/
+ SysBusDeviceClass parent_class;
+ /*< public >*/
+};
+
+struct IDMPCIDeviceClass {
+ /*< private >*/
+ PCIDeviceClass parent_class;
+ /*< public >*/
+};
+
+#endif
--
1.9.1
* [Qemu-devel] [RFC PATCH 5/8] hw/arm: sysbus-fdt
2015-09-29 13:57 [Qemu-devel] [RFC PATCH 0/8] Towards an Heterogeneous QEMU Christian Pinto
` (3 preceding siblings ...)
2015-09-29 13:57 ` [Qemu-devel] [RFC PATCH 4/8] hw/misc: IDM Device Christian Pinto
@ 2015-09-29 13:57 ` Christian Pinto
2015-09-29 13:57 ` [Qemu-devel] [RFC PATCH 6/8] qemu: slave machine flag Christian Pinto
` (3 subsequent siblings)
8 siblings, 0 replies; 19+ messages in thread
From: Christian Pinto @ 2015-09-29 13:57 UTC (permalink / raw)
To: qemu-devel; +Cc: Jani.Kokkonen, tech, Claudio.Fontana, Christian Pinto
Add node creation for the dynamically instantiated sysbus IDM device.
Support is added for all ARM machines that model dynamic sysbus device
instantiation.
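For reference, a sketch of the node this helper generates under the
platform-bus container (the unit address, MMIO base and IRQ number below are
hypothetical, since they are assigned by the platform bus at runtime):

```dts
idm_ipi@c000000 {
        compatible = "idm-ipi";
        /* one MMIO region: base assigned by the platform bus, IDM_SIZE = 0x100 */
        reg = <0xc000000 0x100>;
        /* GIC SPI, edge rising (GIC_FDT_IRQ_TYPE_SPI, GIC_FDT_IRQ_FLAGS_EDGE_LO_HI) */
        interrupts = <0 112 1>;
};
```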
Signed-off-by: Christian Pinto <c.pinto@virtualopensystems.com>
---
hw/arm/sysbus-fdt.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 60 insertions(+)
diff --git a/hw/arm/sysbus-fdt.c b/hw/arm/sysbus-fdt.c
index 9d28797..e5f4321 100644
--- a/hw/arm/sysbus-fdt.c
+++ b/hw/arm/sysbus-fdt.c
@@ -28,6 +28,7 @@
#include "sysemu/sysemu.h"
#include "hw/vfio/vfio-platform.h"
#include "hw/vfio/vfio-calxeda-xgmac.h"
+#include "hw/misc/idm.h"
#include "hw/arm/fdt.h"
/*
@@ -123,9 +124,68 @@ fail_reg:
return ret;
}
+/**
+ * add_idm_fdt_node
+ *
+ * Generates a node with following properties:
+ * compatible string, regs, interrupts
+ */
+static int add_idm_fdt_node(SysBusDevice *sbdev, void *opaque)
+{
+ PlatformBusFDTData *data = opaque;
+ PlatformBusDevice *pbus = data->pbus;
+ const char idm_compat[] = "idm-ipi";
+ void *fdt = data->fdt;
+ const char *parent_node = data->pbus_node_name;
+ char *nodename;
+ int ret = -1;
+ uint32_t *irq_attr, *reg_attr;
+ uint64_t mmio_base, irq_number;
+
+ mmio_base = platform_bus_get_mmio_addr(pbus, sbdev, 0);
+ nodename = g_strdup_printf("%s/%s@%" PRIx64, parent_node,
+ TYPE_IDM, mmio_base);
+ qemu_fdt_add_subnode(fdt, nodename);
+
+ qemu_fdt_setprop(fdt, nodename, "compatible", idm_compat,
+ sizeof(idm_compat));
+
+ /**
+ * There is only one MMIO region defined for the IDM device
+ */
+ reg_attr = g_new(uint32_t, 2);
+ reg_attr[0] = cpu_to_be32(mmio_base);
+ reg_attr[1] = cpu_to_be32(IDM_SIZE);
+ ret = qemu_fdt_setprop(fdt, nodename, "reg", reg_attr,
+ 2 * sizeof(uint32_t));
+ if (ret) {
+ error_report("could not set reg property of node %s", nodename);
+ goto fail;
+ }
+
+ irq_attr = g_new(uint32_t, 3);
+ irq_number = platform_bus_get_irqn(pbus, sbdev , 0)
+ + data->irq_start;
+ irq_attr[0] = cpu_to_be32(GIC_FDT_IRQ_TYPE_SPI);
+ irq_attr[1] = cpu_to_be32(irq_number);
+ irq_attr[2] = cpu_to_be32(GIC_FDT_IRQ_FLAGS_EDGE_LO_HI);
+ ret = qemu_fdt_setprop(fdt, nodename, "interrupts",
+ irq_attr, 3 * sizeof(uint32_t));
+ if (ret) {
+ error_report("could not set interrupts property of node %s",
+ nodename);
+ }
+ g_free(irq_attr);
+fail:
+ g_free(reg_attr);
+ g_free(nodename);
+ return ret;
+}
+
/* list of supported dynamic sysbus devices */
static const NodeCreationPair add_fdt_node_functions[] = {
{TYPE_VFIO_CALXEDA_XGMAC, add_calxeda_midway_xgmac_fdt_node},
+ {TYPE_IDM, add_idm_fdt_node},
{"", NULL}, /* last element */
};
--
1.9.1
* [Qemu-devel] [RFC PATCH 6/8] qemu: slave machine flag
2015-09-29 13:57 [Qemu-devel] [RFC PATCH 0/8] Towards an Heterogeneous QEMU Christian Pinto
` (4 preceding siblings ...)
2015-09-29 13:57 ` [Qemu-devel] [RFC PATCH 5/8] hw/arm: sysbus-fdt Christian Pinto
@ 2015-09-29 13:57 ` Christian Pinto
2015-09-29 13:57 ` [Qemu-devel] [RFC PATCH 7/8] hw/arm: boot Christian Pinto
` (2 subsequent siblings)
8 siblings, 0 replies; 19+ messages in thread
From: Christian Pinto @ 2015-09-29 13:57 UTC (permalink / raw)
To: qemu-devel; +Cc: Jani.Kokkonen, tech, Claudio.Fontana, Christian Pinto
This patch adds a new machine flag to configure QEMU as a slave instance.
Usage:
-machine slave=[on|off] (default=off)
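An illustrative invocation (the machine type and remaining options are
placeholders; patch 7 makes the flag skip kernel/DTB loading):

```shell
# Sketch: a slave machine boots with no -kernel/-dtb; the master copies
# the binary code into shared memory before triggering the boot.
qemu-system-arm -M virt,slave=on -display none
```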
Signed-off-by: Christian Pinto <c.pinto@virtualopensystems.com>
---
hw/core/machine.c | 27 +++++++++++++++++++++++++++
include/hw/boards.h | 2 ++
qemu-options.hx | 5 ++++-
util/qemu-config.c | 5 +++++
4 files changed, 38 insertions(+), 1 deletion(-)
diff --git a/hw/core/machine.c b/hw/core/machine.c
index f4db340..b3e1e28 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -283,6 +283,20 @@ static bool machine_get_suppress_vmdesc(Object *obj, Error **errp)
return ms->suppress_vmdesc;
}
+static bool machine_get_slave(Object *obj, Error **errp)
+{
+ MachineState *ms = MACHINE(obj);
+
+ return ms->slave;
+}
+
+static void machine_set_slave(Object *obj, bool value, Error **errp)
+{
+ MachineState *ms = MACHINE(obj);
+
+ ms->slave = value;
+}
+
static int error_on_sysbus_device(SysBusDevice *sbdev, void *opaque)
{
error_report("Option '-device %s' cannot be handled by this machine",
@@ -335,6 +349,7 @@ static void machine_initfn(Object *obj)
ms->kvm_shadow_mem = -1;
ms->dump_guest_core = true;
ms->mem_merge = true;
+ ms->slave = false;
object_property_add_str(obj, "accel",
machine_get_accel, machine_set_accel, NULL);
@@ -437,6 +452,13 @@ static void machine_initfn(Object *obj)
object_property_set_description(obj, "suppress-vmdesc",
"Set on to disable self-describing migration",
NULL);
+ object_property_add_bool(obj, "slave",
+ machine_get_slave,
+ machine_set_slave,
+ NULL);
+ object_property_set_description(obj, "slave",
+ "Enables a slave (remote) machine instance",
+ NULL);
/* Register notifier when init is done for sysbus sanity checks */
ms->sysbus_notifier.notify = machine_init_notify;
@@ -497,6 +519,11 @@ bool machine_mem_merge(MachineState *machine)
return machine->mem_merge;
}
+bool machine_slave(MachineState *machine)
+{
+ return machine->slave;
+}
+
static const TypeInfo machine_info = {
.name = TYPE_MACHINE,
.parent = TYPE_OBJECT,
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 3e9a92c..523cfc2 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -40,6 +40,7 @@ int machine_kvm_shadow_mem(MachineState *machine);
int machine_phandle_start(MachineState *machine);
bool machine_dump_guest_core(MachineState *machine);
bool machine_mem_merge(MachineState *machine);
+bool machine_slave(MachineState *machine);
/**
* MachineClass:
@@ -120,6 +121,7 @@ struct MachineState {
char *firmware;
bool iommu;
bool suppress_vmdesc;
+ bool slave;
ram_addr_t ram_size;
ram_addr_t maxram_size;
diff --git a/qemu-options.hx b/qemu-options.hx
index 328404c..039d01c 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -41,7 +41,8 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
" igd-passthru=on|off controls IGD GFX passthrough support (default=off)\n"
" aes-key-wrap=on|off controls support for AES key wrapping (default=on)\n"
" dea-key-wrap=on|off controls support for DEA key wrapping (default=on)\n"
- " suppress-vmdesc=on|off disables self-describing migration (default=off)\n",
+ " suppress-vmdesc=on|off disables self-describing migration (default=off)\n"
+ " slave=on|off enables a slave (remote) machine instance (default=off)",
QEMU_ARCH_ALL)
STEXI
@item -machine [type=]@var{name}[,prop=@var{value}[,...]]
@@ -80,6 +81,8 @@ execution of AES cryptographic functions. The default is on.
Enables or disables DEA key wrapping support on s390-ccw hosts. This feature
controls whether DEA wrapping keys will be created to allow
execution of DEA cryptographic functions. The default is on.
+@item slave=on|off
+Enables a slave (remote) machine instance
@end table
ETEXI
diff --git a/util/qemu-config.c b/util/qemu-config.c
index 5fcfd0e..696408d 100644
--- a/util/qemu-config.c
+++ b/util/qemu-config.c
@@ -219,6 +219,11 @@ static QemuOptsList machine_opts = {
.name = "suppress-vmdesc",
.type = QEMU_OPT_BOOL,
.help = "Set on to disable self-describing migration",
+ },{
+ .name = "slave",
+ .type = QEMU_OPT_BOOL,
+ .help = "Indicates the machine is a slave instance "
+ "(e.g. remoteproc)",
},
{ /* End of list */ }
}
--
1.9.1
* [Qemu-devel] [RFC PATCH 7/8] hw/arm: boot
2015-09-29 13:57 [Qemu-devel] [RFC PATCH 0/8] Towards an Heterogeneous QEMU Christian Pinto
` (5 preceding siblings ...)
2015-09-29 13:57 ` [Qemu-devel] [RFC PATCH 6/8] qemu: slave machine flag Christian Pinto
@ 2015-09-29 13:57 ` Christian Pinto
2015-09-29 13:57 ` [Qemu-devel] [RFC PATCH 8/8] qemu: numa Christian Pinto
2015-10-01 16:26 ` [Qemu-devel] [RFC PATCH 0/8] Towards an Heterogeneous QEMU Peter Crosthwaite
8 siblings, 0 replies; 19+ messages in thread
From: Christian Pinto @ 2015-09-29 13:57 UTC (permalink / raw)
To: qemu-devel; +Cc: Jani.Kokkonen, tech, Claudio.Fontana, Christian Pinto
Modify the boot process of an ARM machine to check whether it is a slave
instance, based on the slave machine flag.
When the slave flag is on, no kernel, DTB or initrd is loaded into memory.
The boot address of each core is set to the start address of the RAM,
which depends on the machine model being executed.
Signed-off-by: Christian Pinto <c.pinto@virtualopensystems.com>
---
hw/arm/boot.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index bef451b..ee0c4a1 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -590,6 +590,19 @@ static void arm_load_kernel_notify(Notifier *notifier, void *data)
/* Load the kernel. */
if (!info->kernel_filename || info->firmware_loaded) {
+ if (!info->kernel_filename && machine_slave(current_machine)) {
+ /* If a machine is booted as a slave instance there is no need to
+ * provide the DTB blob or kernel image, that will instead
+ * be copied into memory later by a master instance.
+ * The boot address is set to be at the beginning of the RAM.
+ */
+ info->entry = info->loader_start;
+ CPU_FOREACH(cs) {
+ ARM_CPU(cs)->env.boot_info = info;
+ }
+ return;
+ }
+
if (have_dtb(info)) {
/* If we have a device tree blob, but no kernel to supply it to (or
* the kernel is supposed to be loaded by the bootloader), copy the
--
1.9.1
* [Qemu-devel] [RFC PATCH 8/8] qemu: numa
2015-09-29 13:57 [Qemu-devel] [RFC PATCH 0/8] Towards an Heterogeneous QEMU Christian Pinto
` (6 preceding siblings ...)
2015-09-29 13:57 ` [Qemu-devel] [RFC PATCH 7/8] hw/arm: boot Christian Pinto
@ 2015-09-29 13:57 ` Christian Pinto
2015-10-01 16:26 ` [Qemu-devel] [RFC PATCH 0/8] Towards an Heterogeneous QEMU Peter Crosthwaite
8 siblings, 0 replies; 19+ messages in thread
From: Christian Pinto @ 2015-09-29 13:57 UTC (permalink / raw)
To: qemu-devel; +Cc: Jani.Kokkonen, tech, Claudio.Fontana, Christian Pinto
This patch modifies the behavior of
memory_region_allocate_system_memory when the new shared memory backend
is used from a slave QEMU instance.
In that case there is not yet a valid pointer for the memory region backed
by the backend (it will be initialized later, when received from the master),
and vmstate_register_ram_global would fail.
The patch skips the call to vmstate_register_ram_global for a slave shared
memory backend; the registration is performed later, once the actual memory
pointer is available.
Signed-off-by: Christian Pinto <c.pinto@virtualopensystems.com>
---
numa.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/numa.c b/numa.c
index e9b18f5..b39a892 100644
--- a/numa.c
+++ b/numa.c
@@ -33,6 +33,7 @@
#include "qapi/dealloc-visitor.h"
#include "hw/boards.h"
#include "sysemu/hostmem.h"
+#include "sysemu/hostmem-shared.h"
#include "qmp-commands.h"
#include "hw/mem/pc-dimm.h"
#include "qemu/option.h"
@@ -442,6 +443,7 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
{
uint64_t addr = 0;
int i;
+ bool vmstate_register = true;
if (nb_numa_nodes == 0 || !have_memdevs) {
allocate_system_memory_nonnuma(mr, owner, name, ram_size);
@@ -453,9 +455,18 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
Error *local_err = NULL;
uint64_t size = numa_info[i].node_mem;
HostMemoryBackend *backend = numa_info[i].node_memdev;
+
if (!backend) {
continue;
}
+
+ if (IS_MEMORY_BACKEND_SHARED(backend)) {
+ HostMemoryBackendShared *shm = MEMORY_BACKEND_SHARED(backend);
+ if (!shm->master) {
+ vmstate_register = false;
+ }
+ }
+
MemoryRegion *seg = host_memory_backend_get_memory(backend, &local_err);
if (local_err) {
error_report_err(local_err);
@@ -471,7 +482,11 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
}
memory_region_add_subregion(mr, addr, seg);
- vmstate_register_ram_global(seg);
+
+ if (vmstate_register) {
+ vmstate_register_ram_global(seg);
+ }
+
addr += size;
}
}
--
1.9.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 0/8] Towards an Heterogeneous QEMU
2015-09-29 13:57 [Qemu-devel] [RFC PATCH 0/8] Towards an Heterogeneous QEMU Christian Pinto
` (7 preceding siblings ...)
2015-09-29 13:57 ` [Qemu-devel] [RFC PATCH 8/8] qemu: numa Christian Pinto
@ 2015-10-01 16:26 ` Peter Crosthwaite
2015-10-05 15:50 ` Christian Pinto
8 siblings, 1 reply; 19+ messages in thread
From: Peter Crosthwaite @ 2015-10-01 16:26 UTC (permalink / raw)
To: Christian Pinto, mar.krzeminski, Peter Maydell, Edgar Iglesias
Cc: Jani.Kokkonen, tech, Claudio.Fontana,
qemu-devel@nongnu.org Developers
On Tue, Sep 29, 2015 at 6:57 AM, Christian Pinto
<c.pinto@virtualopensystems.com> wrote:
> Hi all,
>
> This RFC patch-series introduces the set of changes enabling the
> architectural elements to model the architecture presented in a previous RFC
> letter: "[Qemu-devel][RFC] Towards an Heterogeneous QEMU".
>
> To recap the goal of such RFC:
>
> The idea is to enhance the current architecture of QEMU to enable the modeling
> of a state of-the-art SoC with an AMP processing style, where different
> processing units share the same system memory and communicate through shared
> memory and inter-processor interrupts.
This might have a lot in common with a similar inter-qemu
communication solution effort at Xilinx. Edgar talks about it at KVM
forum:
https://www.youtube.com/watch?v=L5zG5Aukfek
Around the 18:30 mark. I think it might be lower level than your proposal:
remote-port is designed to export the raw hardware interfaces (busses
and pins) between QEMU and some other system, another QEMU being the
common use case.
> An example is a multi-core ARM CPU
> working alongside with two Cortex-M micro controllers.
>
Marcin is doing something with A9+M3. It sounds like he already has a
lot working (latest emails were on some finer points). What is the
board/SoC in question here (if you are able to share)?
> From the user point of view there is usually an operating system booting on
> the Master processor (e.g. Linux) at platform startup, while the other
> processors are used to offload the Master one from some computation or to deal
> with real-time interfaces.
I feel like this is architecting hardware based on common software use
cases, rather than directly modelling the SoC in question. Can we
model the hardware (e.g. the devices that are used for rpmsg and IPIs
etc.) as regular devices, as it is in-SoC? That means AMP is just
another guest?
> It is the Master OS that triggers the boot of the
> Slave processors, and provides them also the binary code to execute (e.g.
> RTOS, binary firmware) by placing it into a pre-defined memory area that is
> accessible to the Slaves. Usually the memory for the Slaves is carved out from
> the Master OS during boot. Once a Slave is booted the two processors can
> communicate through queues in shared memory and inter-processor interrupts
> (IPIs). In Linux, it is the remoteproc/rpmsg framework that enables the
> control (boot/shutdown) of Slave processors, and also to establish a
> communication channel based on virtio queues.
>
> Currently, QEMU is not able to model such an architecture mainly because only
> a single processor can be emulated at one time,
SMP does work already. MTTCG will remove the one-run-at-a-time
limitation. Multi-arch will allow you to mix multiple CPU
architectures (e.g. PPC + ARM in same QEMU). But multiple
heterogeneous ARMs should already just work, and there is already an
in-tree precedent with the xlnx-zynqmp SoC. That SoC has 4xA53 and
2xR5 (all ARM).
Multiple system address spaces, with CPUs having different views of the
address space, is another common snag on this effort; it is discussed
in a recent thread between myself and Marcin.
> and the OS binary image needs
> to be placed in memory at model startup.
>
I don't see what this limitation is exactly. Can you explain more? I
do see a need to work on the ARM bootloader for AMP flows; it is a
pure SMP bootloader that assumes total control.
Can this effort be a bootloader overhaul? Two things:
1: The bootloader needs to be repeatable
2: The bootloaders need to be targetable (to certain CPUs or clusters)
> This patch series adds a set of modules and introduces minimal changes to the
> current QEMU code-base to implement what described above, with master and slave
> implemented as two different instances of QEMU. The aim of this work is to
> enable application and runtime programmers to test their AMP applications, or
> their new inter-SoC communication protocol.
>
> The main changes are depicted in the following diagram and involve:
> - A new multi-client socket implementation that allows multiple instances of
> QEMU to attach to the same socket, with only one acting as a master.
> - A new memory backend, the shared memory backend, based on
> the file memory backend. Such new backend enables, on the master side,
> to allocate the whole memory as shareable (e.g. /dev/shm, or hugetlbfs).
> On the slave side it enables the startup of QEMU without any main memory
> allocated. Then the slave goes into a waiting state, the same used in the
> case of an incoming migration, and a callback is registered on a
> multi-client socket shared with the master.
> The waiting state ends when the master sends to the slave the file
> descriptor and offset to mmap and use as memory.
This is useful in its own right and came up in the Xilinx implementation.
> - A new inter-processor interrupt hardware distribution module, that is used
> also to trigger the boot of slave processors. Such module uses a pair of
> eventfd for each master-slave couple to trigger interrupts between the
> instances. No slave-to-slave interrupts are envisioned by the current
> implementation.
Wouldn't that just be a software interrupt in the local QEMU instance?
> The multi client-socket is used for the master to trigger
> the boot of a slave, and also for each master-slave couple to exchange the
> eventfd file descriptors. The IDM device can be instantiated either as a
> PCI or sysbus device.
>
So if everything is in one QEMU, IPIs can be implemented with just a
regular interrupt controller (which has a software set).
>
> Memory
> (e.g. hugetlbfs)
>
> +------------------+ +--------------+ +------------------+
> | | | | | |
> | QEMU MASTER | | Master | | QEMU SLAVE |
> | | | Memory | | |
> | +------+ +------+-+ | | +-+------+ +------+ |
> | | | |SHMEM | | | |SHMEM | | | |
> | | VCPU | |Backend +-----> | +----->Backend | | VCPU | |
> | | | | | | | | +---> | | | |
> | +--^---+ +------+-+ | | | | +-+------+ +--^---+ |
> | | | | | | | | | |
> | +--+ | | | | | | +---+ |
> | | IRQ | | +----------+ | | | | IRQ | |
> | | | | | | | | | | | |
> | +----+----+ | | | Slave <------+ | | +----+---+ |
> +--+ IDM +-----+ | | Memory | | | +---+ IDM +-----+
> +-^----^--+ | | | | | +-^---^--+
> | | | +----------+ | | | |
> | | +--------------+ | | |
> | | | | |
> | +--------------------------------------+-----------+ |
> | UNIX Domain Socket(send mem fd + offset, trigger boot) |
> | |
> +-----------------------------------------------------------+
> eventfd
>
So the slave can only see a subset of the master's memory? Is the
master's memory just the full system memory and the master is doing
IOMMU setup for the slave pre-boot? Or is it a hard feature of the
physical SoC?
>
> The whole code can be checked out from:
> https://git.virtualopensystems.com/dev/qemu-het.git
> branch:
> qemu-het-rfc-v1
>
> Patches apply to the current QEMU master branch
>
> =========
> Demo
> =========
>
> This patch series comes in the form of a demo to better understand how the
> changes introduced can be exploited.
> At the current status the demo can be executed using an ARM target for both
> master and slave.
>
> The demo shows how a master QEMU instance carves out the memory for a slave,
> copies inside linux kernel image and device tree blob and finally triggers the
> boot.
>
These processes must have an underlying hardware implementation; is the
master using a system controller to implement the slave boot (setting
reset and entry points via registers)? How hard are they to model as
regular devices?
>
> How to reproduce the demo:
>
> In order to reproduce the demo a couple more extra elements need to be
> downloaded and compiled.
>
> Binary loader
> Loads the slave firmware (kernel) binary into memory and triggers the boot
> https://git.virtualopensystems.com/dev/qemu-het-tools.git
> branch:
> load-bin-boot
> To compile: just type "make"
>
> Slave kernel
> Compile a linux kernel image (zImage) for the virt machine model.
>
> IDM test kernel module
> Needed to trigger the boot of a slave
> https://git.virtualopensystems.com/dev/qemu-het-tools.git
> branch:
> IDM-kernel-module
> To compile: KDIR=kernel_path ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- make
>
> Slave DTB
> https://git.virtualopensystems.com/dev/qemu-het-tools.git
> branch:
> slave-dtb
>
> Copy binary loader, IDM kernel module, zImage and dtb inside the disk
> image or ramdisk of the master instance.
>
> Run the demo:
>
> run the master instance
>
> ./arm-softmmu/qemu-system-arm \
> -kernel zImage \
> -M virt -cpu cortex-a15 \
> -drive if=none,file=disk.img,cache=writeback,id=foo1 \
> -device virtio-blk-device,drive=foo1 \
> -object multi-socket-backend,id=foo,listen,path=ms_socket \
> -object memory-backend-shared,id=mem,size=1G,mem-path=/mnt/hugetlbfs,chardev=foo,master=on,prealloc=on \
> -device idm_ipi,master=true,memdev=mem,socket=foo \
> -numa node,memdev=mem -m 1G \
> -append "root=/dev/vda rw console=ttyAMA0 mem=512M memmap=512M$0x60000000" \
> -nographic
>
> run the slave instance
>
> ./arm-softmmu/qemu-system-arm\
> -M virt -cpu cortex-a15 -machine slave=on \
> -drive if=none,file=disk.img,cache=writeback,id=foo1 \
> -device virtio-blk-device,drive=foo1 \
> -object multi-socket-backend,id=foo,path=ms_socket \
> -object memory-backend-shared,id=mem,size=512M,mem-path=/mnt/hugetlbfs,chardev=foo,master=off \
> -device idm_ipi,master=false,memdev=mem,socket=foo \
> -incoming "shared:mem" -numa node,memdev=mem -m 512M \
> -nographic
>
>
> For simplicity, use a disk image for the slave instead of a ramdisk.
>
> As visible from the kernel boot arguments, the master is booted with mem=512M
> so that one half of the whole memory allocated is not used by the master and
> reserved for the slave. Such memory starts for the virt platform from
> address 0x60000000.
>
> Once the master is booted the image of the kernel and DTB can be copied in the
> memory carved out for the slave.
>
> In the master console
>
> probe the IDM kernel module:
>
> $ insmod idm_test_mod.ko
>
> run the application that copies the binaries into memory and triggers the boot:
>
> $ ./load_bin_app 1 ./zImage ./slave.dtb
>
>
> On the slave console the linux kernel boot should be visible.
>
> The present demo is intended only as a demonstration to see the patch-set at
> work. In the near future, boot triggering, memory carveout and binary copy might
> be implemented in a remoteproc driver coupled with a RPMSG driver for
> communication between master and slave instance.
>
So are these drivers the same ones that run on the real hardware? Is
there value in the fact that the real IPI mechanisms are replaced with
virtual ones?
Regards,
Peter
>
> This work has been sponsored by Huawei Technologies Duesseldorf GmbH.
>
> Baptiste Reynal (3):
> backend: multi-socket
> backend: shared memory backend
> migration: add shared migration type
>
> Christian Pinto (5):
> hw/misc: IDM Device
> hw/arm: sysbus-fdt
> qemu: slave machine flag
> hw/arm: boot
> qemu: numa
>
> backends/Makefile.objs | 4 +-
> backends/hostmem-shared.c | 203 ++++++++++++++++++
> backends/multi-socket.c | 353 +++++++++++++++++++++++++++++++
> default-configs/arm-softmmu.mak | 1 +
> default-configs/i386-softmmu.mak | 1 +
> default-configs/x86_64-softmmu.mak | 1 +
> hw/arm/boot.c | 13 ++
> hw/arm/sysbus-fdt.c | 60 ++++++
> hw/core/machine.c | 27 +++
> hw/misc/Makefile.objs | 2 +
> hw/misc/idm.c | 416 +++++++++++++++++++++++++++++++++++++
> include/hw/boards.h | 2 +
> include/hw/misc/idm.h | 119 +++++++++++
> include/migration/migration.h | 2 +
> include/qemu/multi-socket.h | 124 +++++++++++
> include/sysemu/hostmem-shared.h | 61 ++++++
> migration/Makefile.objs | 2 +-
> migration/migration.c | 2 +
> migration/shared.c | 32 +++
> numa.c | 17 +-
> qemu-options.hx | 5 +-
> util/qemu-config.c | 5 +
> 22 files changed, 1448 insertions(+), 4 deletions(-)
> create mode 100644 backends/hostmem-shared.c
> create mode 100644 backends/multi-socket.c
> create mode 100644 hw/misc/idm.c
> create mode 100644 include/hw/misc/idm.h
> create mode 100644 include/qemu/multi-socket.h
> create mode 100644 include/sysemu/hostmem-shared.h
> create mode 100644 migration/shared.c
>
> --
> 1.9.1
>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 0/8] Towards an Heterogeneous QEMU
2015-10-01 16:26 ` [Qemu-devel] [RFC PATCH 0/8] Towards an Heterogeneous QEMU Peter Crosthwaite
@ 2015-10-05 15:50 ` Christian Pinto
2015-10-07 15:48 ` Peter Crosthwaite
0 siblings, 1 reply; 19+ messages in thread
From: Christian Pinto @ 2015-10-05 15:50 UTC (permalink / raw)
To: Peter Crosthwaite, mar.krzeminski, Peter Maydell, Edgar Iglesias
Cc: Jani.Kokkonen, tech, Claudio.Fontana,
qemu-devel@nongnu.org Developers
Hello Peter,
thanks for your comments
On 01/10/2015 18:26, Peter Crosthwaite wrote:
> On Tue, Sep 29, 2015 at 6:57 AM, Christian Pinto
> <c.pinto@virtualopensystems.com> wrote:
>> Hi all,
>>
>> This RFC patch-series introduces the set of changes enabling the
>> architectural elements to model the architecture presented in a previous RFC
>> letter: "[Qemu-devel][RFC] Towards an Heterogeneous QEMU".
>>
>> To recap the goal of such RFC:
>>
>> The idea is to enhance the current architecture of QEMU to enable the modeling
>> of a state of-the-art SoC with an AMP processing style, where different
>> processing units share the same system memory and communicate through shared
>> memory and inter-processor interrupts.
> This might have a lot in common with a similar inter-qemu
> communication solution effort at Xilinx. Edgar talks about it at KVM
> forum:
>
> https://www.youtube.com/watch?v=L5zG5Aukfek
>
> Around the 18:30 mark. I think it might be lower level than your proposal;
> remote-port is designed to export the raw hardware interfaces (busses
> and pins) between QEMU and some other system, another QEMU being the
> common use cases.
Thanks for pointing this out. Indeed, what Edgar presented has a lot
of similarities with our proposal, but it targets a different scenario
where low-level modeling of the various hardware components is taken
into account.
The goal of my proposal is, on the other hand, to enable a set of tools
for high-level early prototyping of systems with a heterogeneous set of
cores, so as to model a platform that does not exist in reality but that
the user wants to experiment with.
As an example, I can envision a programming-model researcher willing to
explore a heterogeneous system based on an X86 and a multi-core ARM
accelerator sharing memory, in order to build a new programming paradigm
on top of it. Such a user would not need the specific details of the
hardware, nor all the various devices available in a real SoC, but only
an abstract model encapsulating the main features needed for his
research.
So, to link to your next comment as well: there is no actual
SoC/hardware targeted by this work.
>> An example is a multi-core ARM CPU
>> working alongside with two Cortex-M micro controllers.
>>
> Marcin is doing something with A9+M3. It sounds like he already has a
> lot working (latest emails were on some finer points). What is the
> board/SoC in question here (if you are able to share)?
>
>> From the user point of view there is usually an operating system booting on
>> the Master processor (e.g. Linux) at platform startup, while the other
>> processors are used to offload the Master one from some computation or to deal
>> with real-time interfaces.
> I feel like this is architecting hardware based on common software use
> cases, rather than directly modelling the SoC in question. Can we
> model the hardware (e.g. the devices that are used for rpmsg and IPIs
> etc.) as regular devices, as it is in-SoC? That means AMP is just
> another guest?
This is a set of extensions focusing more on the communication channel
between the processors rather than a full SoC model. With this patch
series each of the AMP processors is a different "guest".
>> It is the Master OS that triggers the boot of the
>> Slave processors, and provides them also the binary code to execute (e.g.
>> RTOS, binary firmware) by placing it into a pre-defined memory area that is
>> accessible to the Slaves. Usually the memory for the Slaves is carved out from
>> the Master OS during boot. Once a Slave is booted the two processors can
>> communicate through queues in shared memory and inter-processor interrupts
>> (IPIs). In Linux, it is the remoteproc/rpmsg framework that enables the
>> control (boot/shutdown) of Slave processors, and also to establish a
>> communication channel based on virtio queues.
>>
>> Currently, QEMU is not able to model such an architecture mainly because only
>> a single processor can be emulated at one time,
> SMP does work already. MTTCG will remove the one-run-at-a-time
> limitation. Multi-arch will allow you to mix multiple CPU
> architectures (e.g. PPC + ARM in same QEMU). But multiple
> heterogeneous ARMs should already just work, and there is already an
> in-tree precedent with the xlnx-zynqmp SoC. That SoC has 4xA53 and
> 2xR5 (all ARM).
Since Multi-arch is not yet available, with this proposal it is possible
to experiment with heterogeneous processors at a high level of
abstraction, even beyond ARM + ARM (e.g. X86 + ARM), exploiting
off-the-shelf QEMU.
One thing I want to add is that all the solutions mentioned in this
discussion (Multi-arch, Xilinx's patches, and our proposal) could
coexist from the code point of view, and none would prevent the others
from being used.
> Multiple system address spaces and CPUs have different views of the
> address space is another common snag on this effort, and is discussed
> on a recent thread between myself and Marcin.
Yes I have seen the discussion, but it was mostly dealing with one single
QEMU instance modeling all the cores. Here the different address spaces
are enforced by multiple QEMU instances.
>> and the OS binary image needs
>> to be placed in memory at model startup.
>>
> I don't see what this limitation is exactly. Can you explain more? I
> do see a need to work on the ARM bootloader for AMP flows, it is a
> pure SMP bootloader that assumes total control.
The problem here, to me, was that when we launch QEMU a binary needs to
be provided and put in memory in order to be executed. In this patch
series the slave doesn't have a proper memory allocated when first
launched. The information about memory (fd + offset for mmap) is sent
only later, when the boot is triggered. This is also safe since the
slave will be waiting in the incoming state, and thus no corruption or
errors can happen before the boot is triggered.
> Can this effort be a bootloader overhaul? Two things:
>
> 1: The bootloader needs to be repeatable
> 2: The bootloaders need to be targetable (to certain CPUs or clusters)
Well, in this series the bootloader for the master is different from
the one for the slave. In my idea the master, besides the
firmware/kernel image, will also copy a bootloader for the slave.
>> This patch series adds a set of modules and introduces minimal changes to the
>> current QEMU code-base to implement what described above, with master and slave
>> implemented as two different instances of QEMU. The aim of this work is to
>> enable application and runtime programmers to test their AMP applications, or
>> their new inter-SoC communication protocol.
>>
>> The main changes are depicted in the following diagram and involve:
>> - A new multi-client socket implementation that allows multiple instances of
>> QEMU to attach to the same socket, with only one acting as a master.
>> - A new memory backend, the shared memory backend, based on
>> the file memory backend. Such new backend enables, on the master side,
>> to allocate the whole memory as shareable (e.g. /dev/shm, or hugetlbfs).
>> On the slave side it enables the startup of QEMU without any main memory
>> allocated. Then the slave goes into a waiting state, the same used in the
>> case of an incoming migration, and a callback is registered on a
>> multi-client socket shared with the master.
>> The waiting state ends when the master sends to the slave the file
>> descriptor and offset to mmap and use as memory.
> This is useful in its own right and came up in the Xilinx implementation.
It is also mentioned in the video you pointed to, where the MicroBlaze
cores are instantiated as foreign QEMU instances.
Is the code publicly available? There was a question about that in the
video but I couldn't catch the answer.
>> - A new inter-processor interrupt hardware distribution module, that is used
>> also to trigger the boot of slave processors. Such module uses a pair of
>> eventfd for each master-slave couple to trigger interrupts between the
>> instances. No slave-to-slave interrupts are envisioned by the current
>> implementation.
> Wouldn't that just be a software interrupt in the local QEMU instance?
Since in this proposal there will be multiple instances of QEMU running
at the same time, eventfds are used to signal the event (interrupt)
among the different processes. So writing to a register of the IDM will
raise an interrupt to a remote QEMU instance using an eventfd. Did this
answer your question?
>> The multi client-socket is used for the master to trigger
>> the boot of a slave, and also for each master-slave couple to exchange the
>> eventfd file descriptors. The IDM device can be instantiated either as a
>> PCI or sysbus device.
>>
> So if everything is in one QEMU, IPIs can be implemented with just a
> regular interrupt controller (which has a software set).
As said, there are multiple instances of QEMU running at the same time,
and each of them will see the IDM in their memory map.
Even though the IDM instances are physically distinct, because of the
multiple processes, together they act as a single block (e.g., a
lightweight version of a mailbox).
>> Memory
>> (e.g. hugetlbfs)
>>
>> +------------------+ +--------------+ +------------------+
>> | | | | | |
>> | QEMU MASTER | | Master | | QEMU SLAVE |
>> | | | Memory | | |
>> | +------+ +------+-+ | | +-+------+ +------+ |
>> | | | |SHMEM | | | |SHMEM | | | |
>> | | VCPU | |Backend +-----> | +----->Backend | | VCPU | |
>> | | | | | | | | +---> | | | |
>> | +--^---+ +------+-+ | | | | +-+------+ +--^---+ |
>> | | | | | | | | | |
>> | +--+ | | | | | | +---+ |
>> | | IRQ | | +----------+ | | | | IRQ | |
>> | | | | | | | | | | | |
>> | +----+----+ | | | Slave <------+ | | +----+---+ |
>> +--+ IDM +-----+ | | Memory | | | +---+ IDM +-----+
>> +-^----^--+ | | | | | +-^---^--+
>> | | | +----------+ | | | |
>> | | +--------------+ | | |
>> | | | | |
>> | +--------------------------------------+-----------+ |
>> | UNIX Domain Socket(send mem fd + offset, trigger boot) |
>> | |
>> +-----------------------------------------------------------+
>> eventfd
>>
> So the slave can only see a subset of the master's memory? Is the
> master's memory just the full system memory and the master is doing
> IOMMU setup for the slave pre-boot? Or is it a hard feature of the
> physical SoC?
Yes, slaves can only see the memory that has been reserved for them.
This is ensured by carving out the memory from the master kernel and
providing the offset to that memory to the slave. Each slave will have
its own memory map, and see the memory at the address defined in the
machine model.
There is no IOMMU modeled, nor is this a hard feature of the SoC, since
it is decided at run time.
>> The whole code can be checked out from:
>> https://git.virtualopensystems.com/dev/qemu-het.git
>> branch:
>> qemu-het-rfc-v1
>>
>> Patches apply to the current QEMU master branch
>>
>> =========
>> Demo
>> =========
>>
>> This patch series comes in the form of a demo to better understand how the
>> changes introduced can be exploited.
>> At the current status the demo can be executed using an ARM target for both
>> master and slave.
>>
>> The demo shows how a master QEMU instance carves out the memory for a slave,
>> copies inside linux kernel image and device tree blob and finally triggers the
>> boot.
>>
> These processes must have underlying hardware implementation, is the
> master using a system controller to implement the slave boot? (setting
> reset and entry points via registers?). How hard are they to model as
> regular devices?
>
In this series the system controller is the IDM device, which through a
set of registers lets the master "control" each of the slaves. The IDM
device is already seen as a regular device by each of the QEMU instances
involved.
>> How to reproduce the demo:
>>
>> In order to reproduce the demo a couple more extra elements need to be
>> downloaded and compiled.
>>
>> Binary loader
>> Loads the slave firmware (kernel) binary into memory and triggers the boot
>> https://git.virtualopensystems.com/dev/qemu-het-tools.git
>> branch:
>> load-bin-boot
>> To compile: just type "make"
>>
>> Slave kernel
>> Compile a linux kernel image (zImage) for the virt machine model.
>>
>> IDM test kernel module
>> Needed to trigger the boot of a slave
>> https://git.virtualopensystems.com/dev/qemu-het-tools.git
>> branch:
>> IDM-kernel-module
>> To compile: KDIR=kernel_path ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- make
>>
>> Slave DTB
>> https://git.virtualopensystems.com/dev/qemu-het-tools.git
>> branch:
>> slave-dtb
>>
>> Copy binary loader, IDM kernel module, zImage and dtb inside the disk
>> image or ramdisk of the master instance.
>>
>> Run the demo:
>>
>> run the master instance
>>
>> ./arm-softmmu/qemu-system-arm \
>> -kernel zImage \
>> -M virt -cpu cortex-a15 \
>> -drive if=none,file=disk.img,cache=writeback,id=foo1 \
>> -device virtio-blk-device,drive=foo1 \
>> -object multi-socket-backend,id=foo,listen,path=ms_socket \
>> -object memory-backend-shared,id=mem,size=1G,mem-path=/mnt/hugetlbfs,chardev=foo,master=on,prealloc=on \
>> -device idm_ipi,master=true,memdev=mem,socket=foo \
>> -numa node,memdev=mem -m 1G \
>> -append "root=/dev/vda rw console=ttyAMA0 mem=512M memmap=512M$0x60000000" \
>> -nographic
>>
>> run the slave instance
>>
>> ./arm-softmmu/qemu-system-arm\
>> -M virt -cpu cortex-a15 -machine slave=on \
>> -drive if=none,file=disk.img,cache=writeback,id=foo1 \
>> -device virtio-blk-device,drive=foo1 \
>> -object multi-socket-backend,id=foo,path=ms_socket \
>> -object memory-backend-shared,id=mem,size=512M,mem-path=/mnt/hugetlbfs,chardev=foo,master=off \
>> -device idm_ipi,master=false,memdev=mem,socket=foo \
>> -incoming "shared:mem" -numa node,memdev=mem -m 512M \
>> -nographic
>>
>>
>> For simplicity, use a disk image for the slave instead of a ramdisk.
>>
>> As visible from the kernel boot arguments, the master is booted with mem=512M
>> so that one half of the whole memory allocated is not used by the master and
>> reserved for the slave. Such memory starts for the virt platform from
>> address 0x60000000.
>>
>> Once the master is booted the image of the kernel and DTB can be copied in the
>> memory carved out for the slave.
>>
>> In the master console
>>
>> probe the IDM kernel module:
>>
>> $ insmod idm_test_mod.ko
>>
>> run the application that copies the binaries into memory and triggers the boot:
>>
>> $ ./load_bin_app 1 ./zImage ./slave.dtb
>>
>>
>> On the slave console the linux kernel boot should be visible.
>>
>> The present demo is intended only as a demonstration to see the patch-set at
>> work. In the near future, boot triggering, memory carveout and binary copy might
>> be implemented in a remoteproc driver coupled with a RPMSG driver for
>> communication between master and slave instance.
>>
> So are these drivers the same ones that run on the real hardware? Is
> there value in the fact that the real IPI mechanisms are replaced with
> virtual ones?
As for the first question: since there is no specific target hardware,
even the drivers are generic and thus not meant to run on a real SoC.
The drivers shown for this demo are an example of how the patch series
could be used, and do not prevent users from implementing their own
drivers based on their own communication protocols.
Thanks,
Christian
> Regards,
> Peter
>
>> This work has been sponsored by Huawei Technologies Duesseldorf GmbH.
>>
>> Baptiste Reynal (3):
>> backend: multi-socket
>> backend: shared memory backend
>> migration: add shared migration type
>>
>> Christian Pinto (5):
>> hw/misc: IDM Device
>> hw/arm: sysbus-fdt
>> qemu: slave machine flag
>> hw/arm: boot
>> qemu: numa
>>
>> backends/Makefile.objs | 4 +-
>> backends/hostmem-shared.c | 203 ++++++++++++++++++
>> backends/multi-socket.c | 353 +++++++++++++++++++++++++++++++
>> default-configs/arm-softmmu.mak | 1 +
>> default-configs/i386-softmmu.mak | 1 +
>> default-configs/x86_64-softmmu.mak | 1 +
>> hw/arm/boot.c | 13 ++
>> hw/arm/sysbus-fdt.c | 60 ++++++
>> hw/core/machine.c | 27 +++
>> hw/misc/Makefile.objs | 2 +
>> hw/misc/idm.c | 416 +++++++++++++++++++++++++++++++++++++
>> include/hw/boards.h | 2 +
>> include/hw/misc/idm.h | 119 +++++++++++
>> include/migration/migration.h | 2 +
>> include/qemu/multi-socket.h | 124 +++++++++++
>> include/sysemu/hostmem-shared.h | 61 ++++++
>> migration/Makefile.objs | 2 +-
>> migration/migration.c | 2 +
>> migration/shared.c | 32 +++
>> numa.c | 17 +-
>> qemu-options.hx | 5 +-
>> util/qemu-config.c | 5 +
>> 22 files changed, 1448 insertions(+), 4 deletions(-)
>> create mode 100644 backends/hostmem-shared.c
>> create mode 100644 backends/multi-socket.c
>> create mode 100644 hw/misc/idm.c
>> create mode 100644 include/hw/misc/idm.h
>> create mode 100644 include/qemu/multi-socket.h
>> create mode 100644 include/sysemu/hostmem-shared.h
>> create mode 100644 migration/shared.c
>>
>> --
>> 1.9.1
>>
>>
* Re: [Qemu-devel] [RFC PATCH 0/8] Towards an Heterogeneous QEMU
2015-10-05 15:50 ` Christian Pinto
@ 2015-10-07 15:48 ` Peter Crosthwaite
2015-10-22 9:21 ` Christian Pinto
0 siblings, 1 reply; 19+ messages in thread
From: Peter Crosthwaite @ 2015-10-07 15:48 UTC (permalink / raw)
To: Christian Pinto, mst
Cc: Edgar Iglesias, Peter Maydell, Claudio.Fontana,
qemu-devel@nongnu.org Developers, Jani.Kokkonen, tech,
mar.krzeminski
On Mon, Oct 5, 2015 at 8:50 AM, Christian Pinto
<c.pinto@virtualopensystems.com> wrote:
> Hello Peter,
>
> thanks for your comments
>
> On 01/10/2015 18:26, Peter Crosthwaite wrote:
>>
>> On Tue, Sep 29, 2015 at 6:57 AM, Christian Pinto
>> <c.pinto@virtualopensystems.com> wrote:
>>>
>>> Hi all,
>>>
>>> This RFC patch-series introduces the set of changes enabling the
>>> architectural elements to model the architecture presented in a previous
>>> RFC
>>> letter: "[Qemu-devel][RFC] Towards an Heterogeneous QEMU".
>>>
>>> To recap the goal of such RFC:
>>>
>>> The idea is to enhance the current architecture of QEMU to enable the
>>> modeling
>>> of a state of-the-art SoC with an AMP processing style, where different
>>> processing units share the same system memory and communicate through
>>> shared
>>> memory and inter-processor interrupts.
>>
>> This might have a lot in common with a similar inter-qemu
>> communication solution effort at Xilinx. Edgar talks about it at KVM
>> forum:
>>
>> https://www.youtube.com/watch?v=L5zG5Aukfek
>>
>> Around the 18:30 mark. I think it might be lower level than your
>> proposal; remote-port is designed to export the raw hardware
>> interfaces (busses and pins) between QEMU and some other system,
>> another QEMU being the common use case.
>
> Thanks for pointing this out. Indeed what was presented by Edgar has a
> lot of similarities with our proposal, but it targets a different
> scenario where low-level modeling of the various hardware components is
> taken into account.
> The goal of my proposal is, on the other hand, to enable a set of tools
> for high-level early prototyping of systems with a heterogeneous set of
> cores, so as to model a platform that does not exist in reality but
> that the user wants to experiment with.
> As an example, I can envision a programming-model researcher willing to
> explore a heterogeneous system based on an X86 and a multi-core ARM
> accelerator sharing memory, to build a new programming paradigm on top
> of it. Such a user would not need the specific details of the hardware,
> nor all the various devices available in a real SoC, but only an
> abstract model encapsulating the main features needed for his research.
>
> So, to link also to your next comment, there is no actual SoC/hardware
> targeted by this work.
>>>
>>> An example is a multi-core ARM CPU
>>> working alongside with two Cortex-M micro controllers.
>>>
>> Marcin is doing something with A9+M3. It sounds like he already has a
>> lot working (latest emails were on some finer points). What is the
>> board/SoC in question here (if you are able to share)?
>>
>>> From the user point of view there is usually an operating system booting
>>> on
>>> the Master processor (e.g. Linux) at platform startup, while the other
>>> processors are used to offload the Master one from some computation or to
>>> deal
>>> with real-time interfaces.
>>
>> I feel like this is architecting hardware based on common software use
>> cases, rather than directly modelling the SoC in question. Can we
>> model the hardware (e.g. the devices that are used for rpmsg and IPIs
>> etc.) as regular devices, as it is in-SoC? That means AMP is just
>> another guest?
>
> This is a set of extensions focusing more on the communication channel
> between the processors
> rather than a full SoC model. With this patch series each of the AMP
> processors is a different "guest".
Ok. My issue here is that establishing a 1:1 relationship between QEMU
guests and AMP peers is building use-case policy into the mechanism. I
am seeing the use-case where there is just one guest that sets up AMP
without any QEMU awareness of the AMPness.
>>>
>>> It is the Master OS that triggers the boot of the
>>> Slave processors, and provides them also the binary code to execute (e.g.
>>> RTOS, binary firmware) by placing it into a pre-defined memory area that
>>> is
>>> accessible to the Slaves. Usually the memory for the Slaves is carved out
>>> from
>>> the Master OS during boot. Once a Slave is booted the two processors can
>>> communicate through queues in shared memory and inter-processor
>>> interrupts
>>> (IPIs). In Linux, it is the remoteproc/rpmsg framework that enables the
>>> control (boot/shutdown) of Slave processors, and also to establish a
>>> communication channel based on virtio queues.
>>>
>>> Currently, QEMU is not able to model such an architecture mainly because
>>> only
>>> a single processor can be emulated at one time,
>>
>> SMP does work already. MTTCG will remove the one-run-at-a-time
>> limitation. Multi-arch will allow you to mix multiple CPU
>> architectures (e.g. PPC + ARM in same QEMU). But multiple
>> heterogeneous ARMs should already just work, and there is already an
>> in-tree precedent with the xlnx-zynqmp SoC. That SoC has 4xA53 and
>> 2xR5 (all ARM).
>
> Since Multi-arch is not yet available, with this proposal it is possible to
> experiment with heterogeneous processors at high level of abstraction,
> even beyond the ARM + ARM (e.g. X86 + ARM), exploiting the off-the-shelf
> QEMU.
>
> One thing I want to add is that all the solutions mentioned in this
> discussion (Multi-arch, Xilinx's patches, and our proposal) could
> coexist from the code point of view, and none would prevent the others
> from being used.
We should consolidate where possible though.
>>
>> Multiple system address spaces and CPUs have different views of the
>> address space is another common snag on this effort, and is discussed
>> on a recent thread between myself and Marcin.
>
> Yes I have seen the discussion, but it was mostly dealing with one single
> QEMU instance modeling all the cores. Here the different address spaces
> are enforced by multiple QEMU instances.
>>>
>>> and the OS binary image needs
>>> to be placed in memory at model startup.
>>>
>> I don't see what this limitation is exactly. Can you explain more? I
>> do see a need to work on the ARM bootloader for AMP flows, it is a
>> pure SMP bootloader that assumes total control.
>
> The problem here, to me, was that when we launch QEMU a binary needs to
> be provided and put in memory in order to be executed. In this patch
> series the slave doesn't have proper memory allocated when first
> launched.
But it could though, couldn't it? Can't the slave guest just have full
access to its own address space (probably very similar to the master's
address space) from machine init time? This seems more realistic than
setting up the hardware based on guest-level information.
> The information about memory (fd + offset for mmap) is sent only later when
> the boot is triggered. This is also
> safe since the slave will be waiting in the incoming state, and thus no
> corruption or errors can happen before the
> boot is triggered.
>>
>> Can this effort be a bootloader overhaul? Two things:
>>
>> 1: The bootloader needs to be repeatable
>> 2: The bootloaders need to be targetable (to certain CPUs or clusters)
>
> Well, in this series the bootloader for the master is different from
> the one for the slave. In my idea the master, besides the
> firmware/kernel image, will also copy a bootloader for the slave.
>>>
>>> This patch series adds a set of modules and introduces minimal changes to
>>> the
>>> current QEMU code-base to implement what is described above, with master and
>>> slave
>>> implemented as two different instances of QEMU. The aim of this work is
>>> to
>>> enable application and runtime programmers to test their AMP
>>> applications, or
>>> their new inter-SoC communication protocol.
>>>
>>> The main changes are depicted in the following diagram and involve:
>>> - A new multi-client socket implementation that allows multiple
>>> instances of
>>> QEMU to attach to the same socket, with only one acting as a
>>> master.
>>> - A new memory backend, the shared memory backend, based on
>>> the file memory backend. Such new backend enables, on the master
>>> side,
>>> to allocate the whole memory as shareable (e.g. /dev/shm, or
>>> hugetlbfs).
>>> On the slave side it enables the startup of QEMU without any main
>>> memory
>>> allocated. Then the slave goes into a waiting state, the same used in
>>> the
>>> case of an incoming migration, and a callback is registered on a
>>> multi-client socket shared with the master.
>>> The waiting state ends when the master sends to the slave the file
>>> descriptor and offset to mmap and use as memory.
>>
>> This is useful in its own right and came up in the Xilinx implementation.
>
> It is also mentioned in the video you are pointing, where the Microblaze
> cores are instantiated as foreign QEMU instances.
> Is the code publicly available? There was a question about that in the video
> but I couldn't catch the answer.
This would probably be a starting point, this is the remote port
adapter, which is a generic construct for sending/receiving hardware
events to/from things outside QEMU (or other QEMUs):
https://github.com/Xilinx/qemu/blob/pub/2015.2.plnx/hw/core/remote-port.c
This specific device interfaces to RP adapter and lets you send GPIOs
between QEMUs:
https://github.com/Xilinx/qemu/blob/pub/2015.2.plnx/hw/core/remote-port-gpio.c
>>>
>>> - A new inter-processor interrupt hardware distribution module, that
>>> is used
>>> also to trigger the boot of slave processors. Such module uses a
>>> pair of
>>> eventfd for each master-slave couple to trigger interrupts between
>>> the
>>> instances. No slave-to-slave interrupts are envisioned by the
>>> current
>>> implementation.
>>
>> Wouldn't that just be a software interrupt in the local QEMU instance?
>
> Since in this proposal there will be multiple instances of QEMU running
> at the same time, eventfds are used to signal the event (interrupt)
> among the different processes. So writing to a register of the IDM will
> raise an interrupt in a remote QEMU instance using eventfd. Did this
> answer your question?
I was thinking more about your comment about slave-to-slave
interrupts. This would just trivially be a local software-generated
interrupt of some form within the slave cluster.
>>>
>>> The multi client-socket is used for the master to trigger
>>> the boot of a slave, and also for each master-slave couple to
>>> exchange the
>>> eventfd file descriptors. The IDM device can be instantiated either
>>> as a
>>> PCI or sysbus device.
>>>
>>> So if everything is in one QEMU, IPIs can be implemented with just a
>> regular interrupt controller (which has a software set).
>
> As said there are multiple instances of QEMU running at the same time, and
> each of them will see the IDM in their memory map.
> Even if the IDM instances will be physically different, because of the
> multiple processes, all together will act as a single block (e.g., a light
> version of a mailbox).
>
>>> Memory
>>> (e.g. hugetlbfs)
>>>
>>> +------------------+ +--------------+
>>> +------------------+
>>> | | | | |
>>> |
>>> | QEMU MASTER | | Master | | QEMU SLAVE
>>> |
>>> | | | Memory | |
>>> |
>>> | +------+ +------+-+ | | +-+------+ +------+
>>> |
>>> | | | |SHMEM | | | |SHMEM | | |
>>> |
>>> | | VCPU | |Backend +-----> | +----->Backend | | VCPU |
>>> |
>>> | | | | | | | | +---> | | |
>>> |
>>> | +--^---+ +------+-+ | | | | +-+------+ +--^---+
>>> |
>>> | | | | | | | | |
>>> |
>>> | +--+ | | | | | | +---+
>>> |
>>> | | IRQ | | +----------+ | | | | IRQ |
>>> |
>>> | | | | | | | | | | |
>>> |
>>> | +----+----+ | | | Slave <------+ | | +----+---+
>>> |
>>> +--+ IDM +-----+ | | Memory | | | +---+ IDM
>>> +-----+
>>> +-^----^--+ | | | | | +-^---^--+
>>> | | | +----------+ | | | |
>>> | | +--------------+ | | |
>>> | | | | |
>>> | +--------------------------------------+-----------+ |
>>> | UNIX Domain Socket(send mem fd + offset, trigger boot) |
>>> | |
>>> +-----------------------------------------------------------+
>>> eventfd
>>>
>> So the slave can only see a subset of the masters memory? Is the
>> masters memory just the full system memory and the master is doing
>> IOMMU setup for the slave pre-boot? Or is it a hard feature of the
>> physical SoC?
>
> Yes, slaves can only see the memory that has been reserved for them.
> This is ensured by carving out the memory from the master kernel and
> providing the offset to such memory to the slave. Each slave will have
> its own memory map, and see the memory at the address defined in the
> machine model.
> There is no IOMMU modeled, but neither is it a hard feature, since it
> is decided at run-time.
>>>
>>> The whole code can be checked out from:
>>> https://git.virtualopensystems.com/dev/qemu-het.git
>>> branch:
>>> qemu-het-rfc-v1
>>>
>>> Patches apply to the current QEMU master branch
>>>
>>> =========
>>> Demo
>>> =========
>>>
>>> This patch series comes in the form of a demo to better understand how
>>> the
>>> changes introduced can be exploited.
>>> At the current status the demo can be executed using an ARM target for
>>> both
>>> master and slave.
>>>
>>> The demo shows how a master QEMU instance carves out the memory for a
>>> slave,
>>> copies inside linux kernel image and device tree blob and finally
>>> triggers the
>>> boot.
>>>
>> These processes must have underlying hardware implementation, is the
>> master using a system controller to implement the slave boot? (setting
>> reset and entry points via registers?). How hard are they to model as
>> regular devs?
>>
> In this series the system controller is the IDM device, which through a
> set of registers lets the master "control" each of the slaves. The IDM
> device is already seen as a regular device by each of the QEMU
> instances involved.
>
I'm starting to think this series is two things that should be
decoupled. One is the abstract device(s) to facilitate your AMP, the
other is the inter-qemu communication. For the abstract device, I
guess this would be a new virtio-idm device. We should try and involve
virtio people perhaps. I can see the value in it quite separate from
modelling the real sysctrl hardware. But I think the implementation
should be free of any inter-QEMU awareness. E.g. from P4 of this
series:
+static void send_shmem_fd(IDMState *s, MSClient *c)
+{
+ int fd, len;
+ uint32_t *message;
+ HostMemoryBackend *backend = MEMORY_BACKEND(s->hostmem);
+
+ len = strlen(SEND_MEM_FD_CMD)/4 + 3;
+ message = malloc(len * sizeof(uint32_t));
+ strcpy((char *) message, SEND_MEM_FD_CMD);
+ message[len - 2] = s->pboot_size;
+ message[len - 1] = s->pboot_offset;
+
+ fd = memory_region_get_fd(&backend->mr);
+
+ multi_socket_send_fds_to(c, &fd, 1, (char *) message, len *
sizeof(uint32_t));
The device itself is aware of shared-memory and multi-sockets. Using
the device for single-QEMU AMP would require neither - can the IDM
device be used in a homogeneous AMP flow in one of our existing SMP
machine models (eg on a dual core A9 with one core being master and
the other slave)?
Can this be architected in two phases for greater utility, with the
AMP devices as just normal devices, and the inter-qemu communication
as a separate feature?
Regards,
Peter
* Re: [Qemu-devel] [RFC PATCH 0/8] Towards an Heterogeneous QEMU
2015-10-07 15:48 ` Peter Crosthwaite
@ 2015-10-22 9:21 ` Christian Pinto
2015-10-25 21:38 ` Peter Crosthwaite
0 siblings, 1 reply; 19+ messages in thread
From: Christian Pinto @ 2015-10-22 9:21 UTC (permalink / raw)
To: Peter Crosthwaite, mst
Cc: Edgar Iglesias, Peter Maydell, Claudio.Fontana,
qemu-devel@nongnu.org Developers, Jani.Kokkonen, tech,
mar.krzeminski
Hello Peter,
On 07/10/2015 17:48, Peter Crosthwaite wrote:
> On Mon, Oct 5, 2015 at 8:50 AM, Christian Pinto
> <c.pinto@virtualopensystems.com> wrote:
>> Hello Peter,
>>
>> thanks for your comments
>>
>> On 01/10/2015 18:26, Peter Crosthwaite wrote:
>>> On Tue, Sep 29, 2015 at 6:57 AM, Christian Pinto
>>> <c.pinto@virtualopensystems.com> wrote:
>>>> Hi all,
>>>>
>>>> This RFC patch-series introduces the set of changes enabling the
>>>> architectural elements to model the architecture presented in a previous
>>>> RFC
>>>> letter: "[Qemu-devel][RFC] Towards an Heterogeneous QEMU".
>>>>
>>>> To recap the goal of such RFC:
>>>>
>>>> The idea is to enhance the current architecture of QEMU to enable the
>>>> modeling
>>>> of a state of-the-art SoC with an AMP processing style, where different
>>>> processing units share the same system memory and communicate through
>>>> shared
>>>> memory and inter-processor interrupts.
>>> This might have a lot in common with a similar inter-qemu
>>> communication solution effort at Xilinx. Edgar talks about it at KVM
>>> forum:
>>>
>>> https://www.youtube.com/watch?v=L5zG5Aukfek
>>>
>>> Around the 18:30 mark. I think it might be lower level than your
>>> proposal; remote-port is designed to export the raw hardware
>>> interfaces (busses and pins) between QEMU and some other system,
>>> another QEMU being the common use case.
>> Thanks for pointing this out. Indeed what was presented by Edgar has
>> a lot of similarities with our proposal, but it targets a different
>> scenario where low-level modeling of the various hardware components
>> is taken into account.
>> The goal of my proposal is, on the other hand, to enable a set of
>> tools for high-level early prototyping of systems with a
>> heterogeneous set of cores, so as to model a platform that does not
>> exist in reality but that the user wants to experiment with.
>> As an example, I can envision a programming-model researcher willing
>> to explore a heterogeneous system based on an X86 and a multi-core
>> ARM accelerator sharing memory, to build a new programming paradigm
>> on top of it. Such a user would not need the specific details of the
>> hardware, nor all the various devices available in a real SoC, but
>> only an abstract model encapsulating the main features needed for his
>> research.
>>
>> So, to link also to your next comment, there is no actual
>> SoC/hardware targeted by this work.
>>>> An example is a multi-core ARM CPU
>>>> working alongside with two Cortex-M micro controllers.
>>>>
>>> Marcin is doing something with A9+M3. It sounds like he already has a
>>> lot working (latest emails were on some finer points). What is the
>>> board/SoC in question here (if you are able to share)?
>>>
>>>> From the user point of view there is usually an operating system booting
>>>> on
>>>> the Master processor (e.g. Linux) at platform startup, while the other
>>>> processors are used to offload the Master one from some computation or to
>>>> deal
>>>> with real-time interfaces.
>>> I feel like this is architecting hardware based on common software use
>>> cases, rather than directly modelling the SoC in question. Can we
>>> model the hardware (e.g. the devices that are used for rpmsg and IPIs
>>> etc.) as regular devices, as it is in-SoC? That means AMP is just
>>> another guest?
>> This is a set of extensions focusing more on the communication channel
>> between the processors
>> rather than a full SoC model. With this patch series each of the AMP
>> processors is a different "guest".
> Ok. My issue here is that establishing a 1:1 relationship between QEMU
> guests and AMP peers is building use-case policy into the mechanism. I
> am seeing the use-case where there is just one guest that sets up AMP
> without any QEMU awareness of the AMPness.
>
>>>> It is the Master OS that triggers the boot of the
>>>> Slave processors, and provides them also the binary code to execute (e.g.
>>>> RTOS, binary firmware) by placing it into a pre-defined memory area that
>>>> is
>>>> accessible to the Slaves. Usually the memory for the Slaves is carved out
>>>> from
>>>> the Master OS during boot. Once a Slave is booted the two processors can
>>>> communicate through queues in shared memory and inter-processor
>>>> interrupts
>>>> (IPIs). In Linux, it is the remoteproc/rpmsg framework that enables the
>>>> control (boot/shutdown) of Slave processors, and also to establish a
>>>> communication channel based on virtio queues.
>>>>
>>>> Currently, QEMU is not able to model such an architecture mainly because
>>>> only
>>>> a single processor can be emulated at one time,
>>> SMP does work already. MTTCG will remove the one-run-at-a-time
>>> limitation. Multi-arch will allow you to mix multiple CPU
>>> architectures (e.g. PPC + ARM in same QEMU). But multiple
>>> heterogeneous ARMs should already just work, and there is already an
>>> in-tree precedent with the xlnx-zynqmp SoC. That SoC has 4xA53 and
>>> 2xR5 (all ARM).
>> Since Multi-arch is not yet available, with this proposal it is possible to
>> experiment with heterogeneous processors at high level of abstraction,
>> even beyond the ARM + ARM (e.g. X86 + ARM), exploiting the off-the-shelf
>> QEMU.
>>
>> One thing I want to add is that all the solutions mentioned in this
>> discussion (Multi-arch, Xilinx's patches, and our proposal) could
>> coexist from the code point of view, and none would prevent the
>> others from being used.
> We should consolidate where possible though.
>
>>> Multiple system address spaces and CPUs have different views of the
>>> address space is another common snag on this effort, and is discussed
>>> on a recent thread between myself and Marcin.
>> Yes I have seen the discussion, but it was mostly dealing with one single
>> QEMU instance modeling all the cores. Here the different address spaces
>> are enforced by multiple QEMU instances.
>>>> and the OS binary image needs
>>>> to be placed in memory at model startup.
>>>>
>>> I don't see what this limitation is exactly. Can you explain more? I
>>> do see a need to work on the ARM bootloader for AMP flows, it is a
>>> pure SMP bootloader that assumes total control.
>> The problem here, to me, was that when we launch QEMU a binary needs
>> to be provided and put in memory in order to be executed. In this
>> patch series the slave doesn't have proper memory allocated when
>> first launched.
> But it could though, couldn't it? Can't the slave guest just have full
> access to its own address space (probably very similar to the master's
> address space) from machine init time? This seems more realistic than
> setting up the hardware based on guest-level information.
Actually, the address space for a slave is built at init time; the
thing that is not completely configured is the memory region modeling
the RAM. Such a region is configured in terms of size, but there is no
pointer to the actual memory. The pointer is mmap-ed later, before the
slave boots.
>
>> The information about memory (fd + offset for mmap) is sent only later when
>> the boot is triggered. This is also
>> safe since the slave will be waiting in the incoming state, and thus no
>> corruption or errors can happen before the
>> boot is triggered.
>>> Can this effort be a bootloader overhaul? Two things:
>>>
>>> 1: The bootloader needs to be repeatable
>>> 2: The bootloaders need to be targetable (to certain CPUs or clusters)
>> Well, in this series the bootloader for the master is different from
>> the one for the slave. In my idea the master, besides the
>> firmware/kernel image, will also copy a bootloader for the slave.
>>>> This patch series adds a set of modules and introduces minimal changes to
>>>> the
>>>> current QEMU code-base to implement what is described above, with master and
>>>> slave
>>>> implemented as two different instances of QEMU. The aim of this work is
>>>> to
>>>> enable application and runtime programmers to test their AMP
>>>> applications, or
>>>> their new inter-SoC communication protocol.
>>>>
>>>> The main changes are depicted in the following diagram and involve:
>>>> - A new multi-client socket implementation that allows multiple
>>>> instances of
>>>> QEMU to attach to the same socket, with only one acting as a
>>>> master.
>>>> - A new memory backend, the shared memory backend, based on
>>>> the file memory backend. Such new backend enables, on the master
>>>> side,
>>>> to allocate the whole memory as shareable (e.g. /dev/shm, or
>>>> hugetlbfs).
>>>> On the slave side it enables the startup of QEMU without any main
>>>> memory
>>>> allocated. Then the slave goes into a waiting state, the same used in
>>>> the
>>>> case of an incoming migration, and a callback is registered on a
>>>> multi-client socket shared with the master.
>>>> The waiting state ends when the master sends to the slave the file
>>>> descriptor and offset to mmap and use as memory.
>>> This is useful in its own right and came up in the Xilinx implementation.
>> It is also mentioned in the video you are pointing, where the Microblaze
>> cores are instantiated as foreign QEMU instances.
>> Is the code publicly available? There was a question about that in the video
>> but I couldn't catch the answer.
> This would probably be a starting point, this is the remote port
> adapter, which is a generic construct for sending/receiving hardware
> events to/from things outside QEMU (or other QEMUs):
>
> https://github.com/Xilinx/qemu/blob/pub/2015.2.plnx/hw/core/remote-port.c
>
> This specific device interfaces to RP adapter and lets you send GPIOs
> between QEMUs:
>
> https://github.com/Xilinx/qemu/blob/pub/2015.2.plnx/hw/core/remote-port-gpio.c
Thanks, quite interesting work. As you said already, that is a
lower-level approach, modeling details like the bus transactions,
while the memory sharing between the QEMU instances does not go through
the socket but uses file-backed memory, similar to what we do in our
work.
>>>> - A new inter-processor interrupt hardware distribution module, that
>>>> is used
>>>> also to trigger the boot of slave processors. Such module uses a
>>>> pair of
>>>> eventfd for each master-slave couple to trigger interrupts between
>>>> the
>>>> instances. No slave-to-slave interrupts are envisioned by the
>>>> current
>>>> implementation.
>>> Wouldn't that just be a software interrupt in the local QEMU instance?
>> Since in this proposal there will be multiple instances of QEMU
>> running at the same time, eventfds are used to signal the event
>> (interrupt) among the different processes. So writing to a register
>> of the IDM will raise an interrupt in a remote QEMU instance using
>> eventfd. Did this answer your question?
> I was thinking more about your comment about slave-to-slave
> interrupts. This would just trivially be a local software-generated
> interrupt of some form within the slave cluster.
Sorry, I did not catch your comment the first time. You are right: if
cores are in the same cluster a software-generated interrupt is going
to be enough. Of course the eventfd-based interrupts make sense for a
remote QEMU.
>>>> The multi client-socket is used for the master to trigger
>>>> the boot of a slave, and also for each master-slave couple to
>>>> exchange the
>>>> eventfd file descriptors. The IDM device can be instantiated either
>>>> as a
>>>> PCI or sysbus device.
>>>>
>>> So if everything is in one QEMU, IPIs can be implemented with just a
>>> regular interrupt controller (which has a software set).
>> As said there are multiple instances of QEMU running at the same time, and
>> each of them will see the IDM in their memory map.
>> Even if the IDM instances will be physically different, because of the
>> multiple processes, all together will act as a single block (e.g., a light
>> version of a mailbox).
>>
>>>> Memory
>>>> (e.g. hugetlbfs)
>>>>
>>>> +------------------+ +--------------+
>>>> +------------------+
>>>> | | | | |
>>>> |
>>>> | QEMU MASTER | | Master | | QEMU SLAVE
>>>> |
>>>> | | | Memory | |
>>>> |
>>>> | +------+ +------+-+ | | +-+------+ +------+
>>>> |
>>>> | | | |SHMEM | | | |SHMEM | | |
>>>> |
>>>> | | VCPU | |Backend +-----> | +----->Backend | | VCPU |
>>>> |
>>>> | | | | | | | | +---> | | |
>>>> |
>>>> | +--^---+ +------+-+ | | | | +-+------+ +--^---+
>>>> |
>>>> | | | | | | | | |
>>>> |
>>>> | +--+ | | | | | | +---+
>>>> |
>>>> | | IRQ | | +----------+ | | | | IRQ |
>>>> |
>>>> | | | | | | | | | | |
>>>> |
>>>> | +----+----+ | | | Slave <------+ | | +----+---+
>>>> |
>>>> +--+ IDM +-----+ | | Memory | | | +---+ IDM
>>>> +-----+
>>>> +-^----^--+ | | | | | +-^---^--+
>>>> | | | +----------+ | | | |
>>>> | | +--------------+ | | |
>>>> | | | | |
>>>> | +--------------------------------------+-----------+ |
>>>> | UNIX Domain Socket(send mem fd + offset, trigger boot) |
>>>> | |
>>>> +-----------------------------------------------------------+
>>>> eventfd
>>>>
>>> So the slave can only see a subset of the masters memory? Is the
>>> masters memory just the full system memory and the master is doing
>>> IOMMU setup for the slave pre-boot? Or is it a hard feature of the
>>> physical SoC?
>> Yes, slaves can only see the memory that has been reserved for them.
>> This is ensured by carving out the memory from the master kernel and
>> providing the offset to such memory to the slave. Each slave will
>> have its own memory map, and see the memory at the address defined in
>> the machine model.
>> There is no IOMMU modeled, but neither is it a hard feature, since it
>> is decided at run-time.
>>>> The whole code can be checked out from:
>>>> https://git.virtualopensystems.com/dev/qemu-het.git
>>>> branch:
>>>> qemu-het-rfc-v1
>>>>
>>>> Patches apply to the current QEMU master branch
>>>>
>>>> =========
>>>> Demo
>>>> =========
>>>>
>>>> This patch series comes in the form of a demo to better understand how
>>>> the
>>>> changes introduced can be exploited.
>>>> At the current status the demo can be executed using an ARM target for
>>>> both
>>>> master and slave.
>>>>
>>>> The demo shows how a master QEMU instance carves out the memory for a
>>>> slave,
>>>> copies inside linux kernel image and device tree blob and finally
>>>> triggers the
>>>> boot.
>>>>
>>> These processes must have underlying hardware implementation, is the
>>> master using a system controller to implement the slave boot? (setting
>>> reset and entry points via registers?). How hard are they to model as
>>> regular devs?
>>>
>> In this series the system controller is the IDM device, which through
>> a set of registers lets the master "control" each of the slaves. The
>> IDM device is already seen as a regular device by each of the QEMU
>> instances involved.
>>
> I'm starting to think this series is two things that should be
> decoupled. One is the abstract device(s) to facilitate your AMP, the
> other is the inter-qemu communication. For the abstract device, I
> guess this would be a new virtio-idm device. We should try and involve
> virtio people perhaps. I can see the value in it quite separate from
> modelling the real sysctrl hardware.
Interesting, which other value/usage do you see in it? For me the IDM
was meant to work as an abstract system controller to centralize the
management of the slaves (boot_regs and interrupts).
> But I think the implementation
> should be free of any inter-QEMU awareness. E.g. from P4 of this
> series:
>
> +static void send_shmem_fd(IDMState *s, MSClient *c)
> +{
> + int fd, len;
> + uint32_t *message;
> + HostMemoryBackend *backend = MEMORY_BACKEND(s->hostmem);
> +
> + len = strlen(SEND_MEM_FD_CMD)/4 + 3;
> + message = malloc(len * sizeof(uint32_t));
> + strcpy((char *) message, SEND_MEM_FD_CMD);
> + message[len - 2] = s->pboot_size;
> + message[len - 1] = s->pboot_offset;
> +
> + fd = memory_region_get_fd(&backend->mr);
> +
> + multi_socket_send_fds_to(c, &fd, 1, (char *) message, len *
> sizeof(uint32_t));
>
> The device itself is aware of shared-memory and multi-sockets. Using
> the device for single-QEMU AMP would require neither - can the IDM
> device be used in a homogeneous AMP flow in one of our existing SMP
> machine models (eg on a dual core A9 with one core being master and
> the other slave)?
>
> Can this be architected in two phases for greater utility, with the
> AMP devices as just normal devices, and the inter-qemu communication
> as a separate feature?
I see your point, and it is an interesting proposal.
What I can think of here, to remove the awareness of how the IDM
communicates with the slaves, is to define a kind of AMP Slave interface.
There would be an instance of the interface for each of the slaves,
encapsulating the communication part (being either local or based on
sockets). The AMP Slave interfaces would be what you called the AMP
devices, with one device per slave.
On the master side, besides the IDM, one would instantiate as many
interface devices as there are slaves. During initialization the IDM
would link with all those interfaces, and only call functions like
send_interrupt() or boot_slave() to interact with the slaves. The
interface will be the same for both local and remote slaves, while the
implementation of the methods will differ and reside in the specific
AMP Slave interface device.
On the slave side, if the slave is remote, another instance of the
interface is instantiated so as to connect to the socket/eventfd.
So, as an example, the send_shmem_fd function you pointed out could be
hidden in the slave interface, and invoked only when the IDM invokes
the slave_boot() function of a remote slave interface.
This would raise the level of abstraction and open the door to
potentially any communication mechanism between master and slave,
without the need to adapt the IDM device to the specific case. Or,
eventually, to mix local and remote instances.
Thanks,
Christian
>
> Regards,
> Peter
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 0/8] Towards an Heterogeneous QEMU
2015-10-22 9:21 ` Christian Pinto
@ 2015-10-25 21:38 ` Peter Crosthwaite
2015-10-26 17:12 ` mar.krzeminski
2015-10-27 10:30 ` Christian Pinto
0 siblings, 2 replies; 19+ messages in thread
From: Peter Crosthwaite @ 2015-10-25 21:38 UTC (permalink / raw)
To: Christian Pinto
Cc: Edgar Iglesias, Peter Maydell, mst, Claudio.Fontana,
qemu-devel@nongnu.org Developers, Jani.Kokkonen, tech,
mar.krzeminski
On Thu, Oct 22, 2015 at 2:21 AM, Christian Pinto
<c.pinto@virtualopensystems.com> wrote:
> Hello Peter,
>
>
> On 07/10/2015 17:48, Peter Crosthwaite wrote:
>>
>> On Mon, Oct 5, 2015 at 8:50 AM, Christian Pinto
>> <c.pinto@virtualopensystems.com> wrote:
>>>
>>> Hello Peter,
>>>
>>> thanks for your comments
>>>
>>> On 01/10/2015 18:26, Peter Crosthwaite wrote:
>>>>
>>>> On Tue, Sep 29, 2015 at 6:57 AM, Christian Pinto
>>>> <c.pinto@virtualopensystems.com> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> This RFC patch-series introduces the set of changes enabling the
>>>>> architectural elements to model the architecture presented in a
>>>>> previous
>>>>> RFC
>>>>> letter: "[Qemu-devel][RFC] Towards an Heterogeneous QEMU".
>>>>>
>>>>> and the OS binary image needs
>>>>> to be placed in memory at model startup.
>>>>>
>>>> I don't see what this limitation is exactly. Can you explain more? I
>>>> do see a need to work on the ARM bootloader for AMP flows, it is a
>>>> pure SMP bootloader that assumes total control.
>>>
>>> the problem here was to me that when we launch QEMU a binary needs to be
>>> provided and put in memory
>>> in order to be executed. In this patch series the slave doesn't have a
>>> proper memory allocated when first launched.
>>
>> But it could though couldn't it? Can't the slave guest just have full
>> access to it's own address space (probably very similar to the masters
>> address space) from machine init time? This seems more realistic than
>> setting up the hardware based on guest level information.
>
>
> Actually the address space for a slave is built at init time, the thing that
> is not
> completely configured is the memory region modeling the RAM. Such region is
> configured
> in terms of size, but there is no pointer to the actual memory. The pointer
> is mmap-ed later
> before the slave boots.
>
based on what information? Is the master guest controlling this? If so
what is the real-hardware analogue for this concept where the address
map of the slave can change (i.e. be configured) at runtime?
>
>>
>>> The information about memory (fd + offset for mmap) is sent only later
>>> when
>>> the boot is triggered. This is also
>>> safe since the slave will be waiting in the incoming state, and thus no
>>> corruption or errors can happen before the
>>> boot is triggered.
>>
>> I was thinking more about your comment about slave-to-slave
>> interrupts. This would just trivially be a local software-generated
>> interrupts of some form within the slave cluster.
>
>
> Sorry, I did not catch your comment at first time. You are right, if cores
> are in the same cluster
> a software generated interrupt is going to be enough. Of course the eventfd
> based interrupts
> make sense for a remote QEMU.
>
Is eventfd a better implementation of remote-port GPIOs as in the Xilinx work?
Re the terminology, I don't like the idea of thinking of inter-qemu
"interrupts", as whatever system we decide on should be able to support
arbitrary signals going from one QEMU to another. I think the Xilinx
work already has reset signals going between the QEMU peers.
>
>>>>> The multi client-socket is used for the master to trigger
>>>>> the boot of a slave, and also for each master-slave couple to
>>>>> exchange the eventfd file descriptors. The IDM device can be
>>>>> instantiated either as a PCI or sysbus device.
>>>>>
>>>> So if everything is in one QEMU, IPIs can be implemented with just a
>>> of registers makes the master in
>>> "control" each of the slaves. The IDM device is already seen as a regular
>>> device by each of the QEMU instances
>>> involved.
>>>
>> I'm starting to think this series is two things that should be
>> decoupled. One is the abstract device(s) to facilitate your AMP, the
>> other is the inter-qemu communication. For the abstract device, I
>> guess this would be a new virtio-idm device. We should try and involve
>> virtio people perhaps. I can see the value in it quite separate from
>> modelling the real sysctrl hardware.
>
>
> Interesting, which other value/usage do you see in it? For me the IDM was
> meant to
It has value in prototyping with your abstract toolkit even with
homogeneous hardware. E.g. I should be able to just use a single-QEMU
ARM virt machine with -smp 2 and create one of these virtio-AMP setups.
Homogeneous hardware with heterogeneous software, using your new pieces
of abstract hardware.
It is also more practical for getting your work merged, as you are
targeting two different audiences with it. People interested in
virtio can handle the new devices you create, while the core
maintainers can handle your multi-QEMU work. It is two rather big new
features.
> work as an abstract system controller to centralize the management
> of the slaves (boot_regs and interrupts).
>
>
>> But I think the implementation
>> should be free of any inter-QEMU awareness. E.g. from P4 of this
>> series:
>>
>> +static void send_shmem_fd(IDMState *s, MSClient *c)
>> +{
>> + int fd, len;
>> + uint32_t *message;
>> + HostMemoryBackend *backend = MEMORY_BACKEND(s->hostmem);
>> +
>> + len = strlen(SEND_MEM_FD_CMD)/4 + 3;
>> + message = malloc(len * sizeof(uint32_t));
>> + strcpy((char *) message, SEND_MEM_FD_CMD);
>> + message[len - 2] = s->pboot_size;
>> + message[len - 1] = s->pboot_offset;
>> +
>> + fd = memory_region_get_fd(&backend->mr);
>> +
>> + multi_socket_send_fds_to(c, &fd, 1, (char *) message, len *
>> sizeof(uint32_t));
>>
>> The device itself is aware of shared-memory and multi-sockets. Using
>> the device for single-QEMU AMP would require neither - can the IDM
>> device be used in a homogeneous AMP flow in one of our existing SMP
>> machine models (eg on a dual core A9 with one core being master and
>> the other slave)?
>>
>> Can this be architected in two phases for greater utility, with the
>> AMP devices as just normal devices, and the inter-qemu communication
>> as a separate feature?
>
>
> I see your point, and it is an interesting proposal.
>
> What I can think here to remove the awareness of how the IDM communicates
> with
> the slaves, is to define a kind of AMP Slave interface. So there will be an
> instance of the interface for each of the slaves, encapsulating the
> communication part (being either local or based on sockets).
> The AMP Slave interfaces would be what you called the AMP devices, with one
> device per slave.
>
Do we need this hard definition of master and slave in the hardware?
Can the virtio-device be more peer-to-peer, with the master-slave
relationship purely implemented by the guest?
Regards,
Peter
> At master side, besides the IDM, one would instantiate
> as many interface devices as slaves. During the initialization the IDM will
> link
> with all those interfaces, and only call functions like: send_interrupt() or
> boot_slave() to interact with the slaves. The interface will be the same for
> both local or remote slaves, while the implementation of the methods will
> differ and reside in the specific AMP Slave Interface device.
> On the slave side, if the slave is remote, another instance of the
> interface is instantiated so to connect to socket/eventfd.
>
> So as an example the send_shmem_fd function you pointed could be hidden in
> the
> slave interface, and invoked only when the IDM will invoke the slave_boot()
> function of a remote slave interface.
>
> This would raise the level of abstraction and open the door to potentially
> any
> communication mechanism between master and slave, without the need to adapt
> the
> IDM device to the specific case. Or, eventually, to mix between local and
> remote instances.
>
>
> Thanks,
>
> Christian
>
>>
>> Regards,
>> Peter
>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 0/8] Towards an Heterogeneous QEMU
2015-10-25 21:38 ` Peter Crosthwaite
@ 2015-10-26 17:12 ` mar.krzeminski
2015-10-26 17:42 ` Peter Crosthwaite
2015-10-27 10:30 ` Christian Pinto
1 sibling, 1 reply; 19+ messages in thread
From: mar.krzeminski @ 2015-10-26 17:12 UTC (permalink / raw)
To: Peter Crosthwaite, Christian Pinto
Cc: Edgar Iglesias, Peter Maydell, mst, Claudio.Fontana,
qemu-devel@nongnu.org Developers, Jani.Kokkonen, tech
W dniu 25.10.2015 o 22:38, Peter Crosthwaite pisze:
> On Thu, Oct 22, 2015 at 2:21 AM, Christian Pinto
> <c.pinto@virtualopensystems.com> wrote:
>> Hello Peter,
>>
>>
>> On 07/10/2015 17:48, Peter Crosthwaite wrote:
>>> On Mon, Oct 5, 2015 at 8:50 AM, Christian Pinto
>>> <c.pinto@virtualopensystems.com> wrote:
>>>> Hello Peter,
>>>>
>>>> thanks for your comments
>>>>
>>>> On 01/10/2015 18:26, Peter Crosthwaite wrote:
>>>>> On Tue, Sep 29, 2015 at 6:57 AM, Christian Pinto
>>>>> <c.pinto@virtualopensystems.com> wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> This RFC patch-series introduces the set of changes enabling the
>>>>>> architectural elements to model the architecture presented in a
>>>>>> previous
>>>>>> RFC
>>>>>> letter: "[Qemu-devel][RFC] Towards an Heterogeneous QEMU".
Sorry for the late response; unfortunately my M3+A9 SoC model cannot be
published (yet). But I am working on it.
>>>>>> and the OS binary image needs
>>>>>> to be placed in memory at model startup.
>>>>>>
>>>>> I don't see what this limitation is exactly. Can you explain more? I
>>>>> do see a need to work on the ARM bootloader for AMP flows, it is a
>>>>> pure SMP bootloader that assumes total control.
>>>> the problem here was to me that when we launch QEMU a binary needs to be
>>>> provided and put in memory
>>>> in order to be executed. In this patch series the slave doesn't have a
>>>> proper memory allocated when first launched.
>>> But it could though couldn't it? Can't the slave guest just have full
>>> access to it's own address space (probably very similar to the masters
>>> address space) from machine init time? This seems more realistic than
>>> setting up the hardware based on guest level information.
>>
>> Actually the address space for a slave is built at init time, the thing that
>> is not
>> completely configured is the memory region modeling the RAM. Such region is
>> configured
>> in terms of size, but there is no pointer to the actual memory. The pointer
>> is mmap-ed later
>> before the slave boots.
>>
> based on what information? Is the master guest controlling this? If so
> what is the real-hardware analogue for this concept where the address
> map of the slave can change (i.e. be configured) at runtime?
I am not sure if it is the case, since I haven't emulated this yet (and
it has very low priority), but I might have a real case in my M3+A9:
the M3 has a 256MiB window that can be moved over the 1GiB system
memory at runtime.
>
>>>> The information about memory (fd + offset for mmap) is sent only later
>>>> when
>>>> the boot is triggered. This is also
>>>> safe since the slave will be waiting in the incoming state, and thus no
>>>> corruption or errors can happen before the
>>>> boot is triggered.
>>> I was thinking more about your comment about slave-to-slave
>>> interrupts. This would just trivially be a local software-generated
>>> interrupts of some form within the slave cluster.
>>
>> Sorry, I did not catch your comment at first time. You are right, if cores
>> are in the same cluster
>> a software generated interrupt is going to be enough. Of course the eventfd
>> based interrupts
>> make sense for a remote QEMU.
>>
> Is eventfd a better implementation of remote-port GPIOs as in the Xilinx work?
>
> Re the terminology, I don't like the idea of thinking of inter-qemu
> "interrupts" as whatever system we decide on should be able to support
> arbitrary signals going from one QEMU to another. I think the Xilinx
> work already has reset signals going between the QEMU peers.
>
>>>>>> The multi client-socket is used for the master to trigger
>>>>>> the boot of a slave, and also for each master-slave couple to
>>>>>> exchange the eventfd file descriptors. The IDM device can be
>>>>>> instantiated either as a PCI or sysbus device.
>>>>>>
>>>>> So if everything is in one QEMU, IPIs can be implemented with just a
>>>> of registers makes the master in
>>>> "control" each of the slaves. The IDM device is already seen as a regular
>>>> device by each of the QEMU instances
>>>> involved.
>>>>
>>> I'm starting to think this series is two things that should be
>>> decoupled. One is the abstract device(s) to facilitate your AMP, the
>>> other is the inter-qemu communication. For the abstract device, I
>>> guess this would be a new virtio-idm device. We should try and involve
>>> virtio people perhaps. I can see the value in it quite separate from
>>> modelling the real sysctrl hardware.
>>
>> Interesting, which other value/usage do you see in it? For me the IDM was
>> meant to
> It has value in prototyping with your abstract toolkit even with
> homogeneous hardware. E.g. I should be able to just use single-QEMU
> ARM virt machine -smp 2 and create one of these virtio-AMP setups.
> Homogeneous hardware with heterogeneous software using your new pieces
> of abstract hardware.
>
> It is also more practical for getting a merge of your work as you are
> targeting two different audiences with the work. People interested in
> virtio can handle the new devices you create, while the core
> maintainers can handle your multi-QEMU work. It is two rather big new
> features.
>
>> work as an abstract system controller to centralize the management
>> of the slaves (boot_regs and interrupts).
>>
>>
>>> But I think the implementation
>>> should be free of any inter-QEMU awareness. E.g. from P4 of this
>>> series:
>>>
>>> +static void send_shmem_fd(IDMState *s, MSClient *c)
>>> +{
>>> + int fd, len;
>>> + uint32_t *message;
>>> + HostMemoryBackend *backend = MEMORY_BACKEND(s->hostmem);
>>> +
>>> + len = strlen(SEND_MEM_FD_CMD)/4 + 3;
>>> + message = malloc(len * sizeof(uint32_t));
>>> + strcpy((char *) message, SEND_MEM_FD_CMD);
>>> + message[len - 2] = s->pboot_size;
>>> + message[len - 1] = s->pboot_offset;
>>> +
>>> + fd = memory_region_get_fd(&backend->mr);
>>> +
>>> + multi_socket_send_fds_to(c, &fd, 1, (char *) message, len *
>>> sizeof(uint32_t));
>>>
>>> The device itself is aware of shared-memory and multi-sockets. Using
>>> the device for single-QEMU AMP would require neither - can the IDM
>>> device be used in a homogeneous AMP flow in one of our existing SMP
>>> machine models (eg on a dual core A9 with one core being master and
>>> the other slave)?
>>>
>>> Can this be architected in two phases for greater utility, with the
>>> AMP devices as just normal devices, and the inter-qemu communication
>>> as a separate feature?
>>
>> I see your point, and it is an interesting proposal.
>>
>> What I can think here to remove the awareness of how the IDM communicates
>> with
>> the slaves, is to define a kind of AMP Slave interface. So there will be an
>> instance of the interface for each of the slaves, encapsulating the
>> communication part (being either local or based on sockets).
>> The AMP Slave interfaces would be what you called the AMP devices, with one
>> device per slave.
>>
> Do we need this hard definition of master and slave in the hardware?
> Can the virtio-device be more peer-peer and the master-slave
> relationship is purely implemented by the guest?
>
> Regards,
> Peter
>
>> At master side, besides the IDM, one would instantiate
>> as many interface devices as slaves. During the initialization the IDM will
>> link
>> with all those interfaces, and only call functions like: send_interrupt() or
>> boot_slave() to interact with the slaves. The interface will be the same for
>> both local or remote slaves, while the implementation of the methods will
>> differ and reside in the specific AMP Slave Interface device.
>> On the slave side, if the slave is remote, another instance of the
>> interface is instantiated so to connect to socket/eventfd.
>>
>> So as an example the send_shmem_fd function you pointed could be hidden in
>> the
>> slave interface, and invoked only when the IDM will invoke the slave_boot()
>> function of a remote slave interface.
>>
>> This would raise the level of abstraction and open the door to potentially
>> any
>> communication mechanism between master and slave, without the need to adapt
>> the
>> IDM device to the specific case. Or, eventually, to mix between local and
>> remote instances.
>>
>>
>> Thanks,
>>
>> Christian
>>
>>> Regards,
>>> Peter
>>
Regards,
Marcin
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 0/8] Towards an Heterogeneous QEMU
2015-10-26 17:12 ` mar.krzeminski
@ 2015-10-26 17:42 ` Peter Crosthwaite
0 siblings, 0 replies; 19+ messages in thread
From: Peter Crosthwaite @ 2015-10-26 17:42 UTC (permalink / raw)
To: mar.krzeminski, Paolo Bonzini
Cc: Edgar Iglesias, Peter Maydell, mst, Claudio.Fontana,
qemu-devel@nongnu.org Developers, Christian Pinto, Jani.Kokkonen,
tech
On Mon, Oct 26, 2015 at 10:12 AM, mar.krzeminski
<mar.krzeminski@gmail.com> wrote:
>
>
> W dniu 25.10.2015 o 22:38, Peter Crosthwaite pisze:
>>
>> On Thu, Oct 22, 2015 at 2:21 AM, Christian Pinto
>> <c.pinto@virtualopensystems.com> wrote:
>>>
>>> Hello Peter,
>>>
>>>
>>> On 07/10/2015 17:48, Peter Crosthwaite wrote:
>>>>
>>>> On Mon, Oct 5, 2015 at 8:50 AM, Christian Pinto
>>>> <c.pinto@virtualopensystems.com> wrote:
>>>>>
>>>>> Hello Peter,
>>>>>
>>>>> thanks for your comments
>>>>>
>>>>> On 01/10/2015 18:26, Peter Crosthwaite wrote:
>>>>>>
>>>>>> On Tue, Sep 29, 2015 at 6:57 AM, Christian Pinto
>>>>>> <c.pinto@virtualopensystems.com> wrote:
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> This RFC patch-series introduces the set of changes enabling the
>>>>>>> architectural elements to model the architecture presented in a
>>>>>>> previous
>>>>>>> RFC
>>>>>>> letter: "[Qemu-devel][RFC] Towards an Heterogeneous QEMU".
>
> Sorry for late response, unfortunately my M3+A9 SoC can not be published
> (yet).
> But I am working on it.
>>>>>>>
>>>>>>> and the OS binary image needs
>>>>>>> to be placed in memory at model startup.
>>>>>>>
>>>>>> I don't see what this limitation is exactly. Can you explain more? I
>>>>>> do see a need to work on the ARM bootloader for AMP flows, it is a
>>>>>> pure SMP bootloader that assumes total control.
>>>>>
>>>>> the problem here was to me that when we launch QEMU a binary needs to
>>>>> be
>>>>> provided and put in memory
>>>>> in order to be executed. In this patch series the slave doesn't have a
>>>>> proper memory allocated when first launched.
>>>>
>>>> But it could though couldn't it? Can't the slave guest just have full
>>>> access to it's own address space (probably very similar to the masters
>>>> address space) from machine init time? This seems more realistic than
>>>> setting up the hardware based on guest level information.
>>>
>>>
>>> Actually the address space for a slave is built at init time, the thing
>>> that
>>> is not
>>> completely configured is the memory region modeling the RAM. Such region
>>> is
>>> configured
>>> in terms of size, but there is no pointer to the actual memory. The
>>> pointer
>>> is mmap-ed later
>>> before the slave boots.
>>>
>> based on what information? Is the master guest controlling this? If so
>> what is the real-hardware analogue for this concept where the address
>> map of the slave can change (i.e. be configured) at runtime?
>
> I am not sure if it is the case since I haven't emulated this yet (and it
> has very low priority),
> but I might have a real case in my M3+A9 - M3 has 256MiB window that can be
> moved over the 1GiB system memory at runtime.
>
Right, the main thing for me there is that it works at runtime. The
master in Christian's scheme would be able to remap it by configuring
registers. The same runtime reconfigurability should apply to the
slave memory mapping. Rather than having to declare the slave memory
map pre machine init, it should just be runtime-configured by such a
device. Then there is no need for inter-qemu communication of
machine-init data. The two machines are inited independently, with the
communication channel only being used for runtime data. If there is
hardware support for it, remapping another processor's address space is
a valid runtime operation for which we have support at the core layers.
Regards,
Peter
>
>>>
>>>
> Regards,
> Marcin
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 0/8] Towards an Heterogeneous QEMU
2015-10-25 21:38 ` Peter Crosthwaite
2015-10-26 17:12 ` mar.krzeminski
@ 2015-10-27 10:30 ` Christian Pinto
2015-11-13 7:02 ` Peter Crosthwaite
1 sibling, 1 reply; 19+ messages in thread
From: Christian Pinto @ 2015-10-27 10:30 UTC (permalink / raw)
To: Peter Crosthwaite
Cc: Edgar Iglesias, Peter Maydell, mst, Claudio.Fontana,
qemu-devel@nongnu.org Developers, Jani.Kokkonen, tech,
mar.krzeminski
[-- Attachment #1: Type: text/plain, Size: 11761 bytes --]
On 25/10/2015 22:38, Peter Crosthwaite wrote:
> On Thu, Oct 22, 2015 at 2:21 AM, Christian Pinto
> <c.pinto@virtualopensystems.com> wrote:
>> Hello Peter,
>>
>>
>> On 07/10/2015 17:48, Peter Crosthwaite wrote:
>>> On Mon, Oct 5, 2015 at 8:50 AM, Christian Pinto
>>> <c.pinto@virtualopensystems.com> wrote:
>>>> Hello Peter,
>>>>
>>>> thanks for your comments
>>>>
>>>> On 01/10/2015 18:26, Peter Crosthwaite wrote:
>>>>> On Tue, Sep 29, 2015 at 6:57 AM, Christian Pinto
>>>>> <c.pinto@virtualopensystems.com> wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> This RFC patch-series introduces the set of changes enabling the
>>>>>> architectural elements to model the architecture presented in a
>>>>>> previous
>>>>>> RFC
>>>>>> letter: "[Qemu-devel][RFC] Towards an Heterogeneous QEMU".
>>>>>> and the OS binary image needs
>>>>>> to be placed in memory at model startup.
>>>>>>
>>>>> I don't see what this limitation is exactly. Can you explain more? I
>>>>> do see a need to work on the ARM bootloader for AMP flows, it is a
>>>>> pure SMP bootloader that assumes total control.
>>>> the problem here was to me that when we launch QEMU a binary needs to be
>>>> provided and put in memory
>>>> in order to be executed. In this patch series the slave doesn't have a
>>>> proper memory allocated when first launched.
>>> But it could though couldn't it? Can't the slave guest just have full
>>> access to it's own address space (probably very similar to the masters
>>> address space) from machine init time? This seems more realistic than
>>> setting up the hardware based on guest level information.
>> Actually the address space for a slave is built at init time, the thing that
>> is not
>> completely configured is the memory region modeling the RAM. Such region is
>> configured
>> in terms of size, but there is no pointer to the actual memory. The pointer
>> is mmap-ed later
>> before the slave boots.
>>
> based on what information? Is the master guest controlling this? If so
> what is the real-hardware analogue for this concept where the address
> map of the slave can change (i.e. be configured) at runtime?
Hello Peter,
The memory map of a slave is not controlled by the master guest, since it
depends on the machine model used for the slave. The only thing the
master controls is the subset of the main memory that is assigned to a
slave. Sending the memory pointer to the slave later, before the boot, is
like setting the boot address for that specific slave within the whole
platform memory. So essentially the offset passed for the mmap goes from
the beginning of the master memory up to the beginning of the memory
carved out for the specific slave. I see this as a way to protect the
master memory from malicious accesses from the slave side: this way the
slave will only "see" the part of the memory that it got assigned.
>>>> The information about memory (fd + offset for mmap) is sent only later
>>>> when
>>>> the boot is triggered. This is also
>>>> safe since the slave will be waiting in the incoming state, and thus no
>>>> corruption or errors can happen before the
>>>> boot is triggered.
>>> I was thinking more about your comment about slave-to-slave
>>> interrupts. This would just trivially be a local software-generated
>>> interrupts of some form within the slave cluster.
>> Sorry, I did not catch your comment at first time. You are right, if cores
>> are in the same cluster
>> a software generated interrupt is going to be enough. Of course the eventfd
>> based interrupts
>> make sense for a remote QEMU.
>>
> Is eventfd a better implementation of remote-port GPIOs as in the Xilinx work?
Functionally I think they provide the same behavior. We went for eventfd
since, when designing the code of the IDM, we based it on what is
available in upstream QEMU to signal events between processes (i.e.,
eventfd).
> Re the terminology, I don't like the idea of thinking of inter-qemu
> "interrupts" as whatever system we decide on should be able to support
> arbitrary signals going from one QEMU to another. I think the Xilinx
> work already has reset signals going between the QEMU peers.
We used the term inter-qemu interrupt since such a signal is triggered
from the IDM and is an interrupt. But I see your point, and agree that it
could be a generic inter-qemu signaling mechanism, used as an interrupt
for this specific purpose.
>>>>>> The multi client-socket is used for the master to trigger
>>>>>> the boot of a slave, and also for each master-slave couple to
>>>>>> exchange the eventfd file descriptors. The IDM device can be
>>>>>> instantiated either as a PCI or sysbus device.
>>>>>>
>>>>> So if everything is in one QEMU, IPIs can be implemented with just a
>>>> of registers makes the master in
>>>> "control" each of the slaves. The IDM device is already seen as a regular
>>>> device by each of the QEMU instances
>>>> involved.
>>>>
>>> I'm starting to think this series is two things that should be
>>> decoupled. One is the abstract device(s) to facilitate your AMP, the
>>> other is the inter-qemu communication. For the abstract device, I
>>> guess this would be a new virtio-idm device. We should try and involve
>>> virtio people perhaps. I can see the value in it quite separate from
>>> modelling the real sysctrl hardware.
>> Interesting, which other value/usage do you see in it? For me the IDM was
>> meant to
> It has value in prototyping with your abstract toolkit even with
> homogeneous hardware. E.g. I should be able to just use single-QEMU
> ARM virt machine -smp 2 and create one of these virtio-AMP setups.
> Homogeneous hardware with heterogeneous software using your new pieces
> of abstract hardware.
>
> It is also more practical for getting a merge of your work as you are
> targeting two different audiences with the work. People interested in
> virtio can handle the new devices you create, while the core
> maintainers can handle your multi-QEMU work. It is two rather big new
> features.
This is true: too much meat on the fire in the same patch series makes it
difficult to get merged. Thanks.
We could split it into the multi-client socket work, the inter-qemu
communication and the virtio-idm device.
>
>> work as an abstract system controller to centralize the management
>> of the slaves (boot_regs and interrupts).
>>
>>
>>> But I think the implementation
>>> should be free of any inter-QEMU awareness. E.g. from P4 of this
>>> series:
>>>
>>> +static void send_shmem_fd(IDMState *s, MSClient *c)
>>> +{
>>> + int fd, len;
>>> + uint32_t *message;
>>> + HostMemoryBackend *backend = MEMORY_BACKEND(s->hostmem);
>>> +
>>> + len = strlen(SEND_MEM_FD_CMD)/4 + 3;
>>> + message = malloc(len * sizeof(uint32_t));
>>> + strcpy((char *) message, SEND_MEM_FD_CMD);
>>> + message[len - 2] = s->pboot_size;
>>> + message[len - 1] = s->pboot_offset;
>>> +
>>> + fd = memory_region_get_fd(&backend->mr);
>>> +
>>> + multi_socket_send_fds_to(c, &fd, 1, (char *) message, len *
>>> sizeof(uint32_t));
>>>
>>> The device itself is aware of shared-memory and multi-sockets. Using
>>> the device for single-QEMU AMP would require neither - can the IDM
>>> device be used in a homogeneous AMP flow in one of our existing SMP
>>> machine models (eg on a dual core A9 with one core being master and
>>> the other slave)?
>>>
>>> Can this be architected in two phases for greater utility, with the
>>> AMP devices as just normal devices, and the inter-qemu communication
>>> as a separate feature?
>> I see your point, and it is an interesting proposal.
>>
>> What I can think here to remove the awareness of how the IDM communicates
>> with
>> the slaves, is to define a kind of AMP Slave interface. So there will be an
>> instance of the interface for each of the slaves, encapsulating the
>> communication part (being either local or based on sockets).
>> The AMP Slave interfaces would be what you called the AMP devices, with one
>> device per slave.
>>
> Do we need this hard definition of master and slave in the hardware?
> Can the virtio-device be more peer-peer and the master-slave
> relationship is purely implemented by the guest?
I think we can architect it in a way that the virtio-idm simply connects
two or more peers and, depending on the usage done by the software,
behaves as master on one side and slave on the other.
I used the term AMP slave interface; I should have used AMP client
interface, to indicate the cores/processors the IDM has to interconnect
(being local or on another QEMU instance).
So there would be an implementation of the AMP client interface that
is based on the assumption that all the processors are on the same
instance, and one based on sockets for the remote instances.
To give an example, for a single QEMU instance with -smp 2
you would add something like:
-smp 2
-device amp-local-client, core_id=0, id=client0
-device amp-local-client, core_id=1, id=client1
-device virtio-idm, clients=2, id=idm
while for remote QEMU instances something like
(with the opposite configuration instantiated on the other remote instance):
-device amp-local-client, id=client0
-device amp-remote-client, chardev=chdev_id, id=client1
-device virtio-idm, clients=2, id=idm-dev
This way the idm only knows about clients (all clients are the
same for the IDM). The software running on the processors
will enable the interaction between the clients by writing
into the IDM device registers.
At a first glance, and according to my current proposal, I see
such AMP client interfaces exporting the following methods:
* raise_interrupt() function: called by the IDM to trigger an
interrupt towards the destination client
* boot_trigger() function: called by the IDM to trigger the boot of
the client
If the clients are remote, socket communication will be used and hidden
in the AMP client interface implementation.
Do you foresee a different type of interface for the use-case
you have in mind? I ask because if, for example, the clients are
cores of the same cluster (and same instance), interrupts could
simply be software-generated from the Linux kernel/firmware
running on top of the processors, with theoretically no need to
go through the IDM; the same, I guess, applies to the boot.
Another thing that needs to be defined clearly is the interface between
the IDM and the software running on the cores.
At the moment I am using a set of registers, namely the boot and
the interrupt registers. By writing the ID of a client in such registers
it is possible to forward an interrupt or trigger its boot.
Thanks,
Christian
>
> Regards,
> Peter
>
>> At master side, besides the IDM, one would instantiate
>> as many interface devices as slaves. During the initialization the IDM will
>> link
>> with all those interfaces, and only call functions like: send_interrupt() or
>> boot_slave() to interact with the slaves. The interface will be the same for
>> both local or remote slaves, while the implementation of the methods will
>> differ and reside in the specific AMP Slave Interface device.
>> On the slave side, if the slave is remote, another instance of the
>> interface is instantiated so as to connect to the socket/eventfd.
>>
>> So as an example, the send_shmem_fd function you pointed out could be
>> hidden in the slave interface, and invoked only when the IDM invokes
>> the slave_boot() function of a remote slave interface.
>>
>> This would raise the level of abstraction and open the door to
>> potentially any communication mechanism between master and slave,
>> without the need to adapt the IDM device to the specific case. Or,
>> possibly, to mix local and remote instances.
>>
>>
>> Thanks,
>>
>> Christian
>>
>>> Regards,
>>> Peter
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 0/8] Towards an Heterogeneous QEMU
2015-10-27 10:30 ` Christian Pinto
@ 2015-11-13 7:02 ` Peter Crosthwaite
2015-12-12 10:19 ` Christian Pinto
0 siblings, 1 reply; 19+ messages in thread
From: Peter Crosthwaite @ 2015-11-13 7:02 UTC (permalink / raw)
To: Christian Pinto
Cc: Edgar Iglesias, Peter Maydell, Michael S. Tsirkin,
Claudio.Fontana, qemu-devel@nongnu.org Developers, Jani.Kokkonen,
tech, mar.krzeminski
Hi Christian,
Sorry about the delayed response.
On Tue, Oct 27, 2015 at 3:30 AM, Christian Pinto <
c.pinto@virtualopensystems.com> wrote:
>
>
> On 25/10/2015 22:38, Peter Crosthwaite wrote:
>
> On Thu, Oct 22, 2015 at 2:21 AM, Christian Pinto<c.pinto@virtualopensystems.com> <c.pinto@virtualopensystems.com> wrote:
>
> Hello Peter,
>
>
> On 07/10/2015 17:48, Peter Crosthwaite wrote:
>
> On Mon, Oct 5, 2015 at 8:50 AM, Christian Pinto<c.pinto@virtualopensystems.com> <c.pinto@virtualopensystems.com> wrote:
>
> Hello Peter,
>
> thanks for your comments
>
> On 01/10/2015 18:26, Peter Crosthwaite wrote:
>
> On Tue, Sep 29, 2015 at 6:57 AM, Christian Pinto<c.pinto@virtualopensystems.com> <c.pinto@virtualopensystems.com> wrote:
>
> Hi all,
>
> This RFC patch-series introduces the set of changes enabling the
> architectural elements to model the architecture presented in a
> previous
> RFC
> letter: "[Qemu-devel][RFC] Towards an Heterogeneous QEMU".
>
> and the OS binary image needs
> to be placed in memory at model startup.
>
>
> I don't see what this limitation is exactly. Can you explain more? I
> do see a need to work on the ARM bootloader for AMP flows; it is a
> pure SMP bootloader that assumes total control.
>
> the problem here was to me that when we launch QEMU a binary needs to be
> provided and put in memory
> in order to be executed. In this patch series the slave doesn't have a
> proper memory allocated when first launched.
>
> But it could though, couldn't it? Can't the slave guest just have full
> access to its own address space (probably very similar to the master's
> address space) from machine init time? This seems more realistic than
> setting up the hardware based on guest level information.
>
> Actually the address space for a slave is built at init time, the thing that
> is not
> completely configured is the memory region modeling the RAM. Such region is
> configured
> in terms of size, but there is no pointer to the actual memory. The pointer
> is mmap-ed later
> before the slave boots.
>
>
> based on what information? Is the master guest controlling this? If so
> what is the real-hardware analogue for this concept where the address
> map of the slave can change (i.e. be configured) at runtime?
>
> Hello Peter,
>
> The memory map of a slave is not controlled by the master guest, since
> it depends on the machine model used for the slave. The only thing the
> master controls is the subset of the main memory that is assigned to a
> slave. Sending the memory pointer to the slave later, before the boot,
> is like setting the boot address for that specific slave within the
> whole platform memory. So essentially the offset passed for the mmap
> runs from the beginning of master memory up to the beginning of the
> memory carved out for the specific slave. I see this as a way to
> protect the master memory from malicious accesses from the slave side,
> so this way the slave will only "see" the part of the memory that it
> got assigned.
>
>
That does sound like memory map control though. Is it simpler to just give
the slave full access and implement such protections as a specific feature
(probably some sort of IOMMU)?
> The information about memory (fd + offset for mmap) is sent only later
> when
> the boot is triggered. This is also
> safe since the slave will be waiting in the incoming state, and thus no
> corruption or errors can happen before the
> boot is triggered.
>
> I was thinking more about your comment about slave-to-slave
> interrupts. This would just trivially be a local software-generated
> interrupt of some form within the slave cluster.
>
> Sorry, I did not catch your comment the first time. You are right, if cores
> are in the same cluster
> a software generated interrupt is going to be enough. Of course the eventfd
> based interrupts
> make sense for a remote QEMU.
>
>
> Is eventfd a better implementation of remote-port GPIOs as in the Xilinx work?
>
>
> Functionally I think they provide the same behavior. We went for eventfd
> since, when designing the code of the IDM, we based it on what was
> available in upstream QEMU to signal events between processes (e.g., eventfd).
>
> Re the terminology, I don't like the idea of thinking of inter-qemu
> "interrupts" as whatever system we decide on should be able to support
> arbitrary signals going from one QEMU to another. I think the Xilinx
> work already has reset signals going between the QEMU peers.
>
>
> We used the inter-qemu interrupt term, since such signal was triggered
> from the IDM
> and is an interrupt. But I see your point and agree that such interrupt
> could be a generic
> inter-qemu signaling mechanism, that can be used as interrupt for this
> specific purpose.
>
>
> The multi client-socket is used for the master to trigger
> the boot of a slave, and also for each master-slave couple to
> exchange the
> eventfd file descriptors. The IDM device can be instantiated
> either
> as a
> PCI or sysbus device.
>
>
> So if everything is in one QEMU, IPIs can be implemented with just a
>
> of registers makes the master in
> "control" each of the slaves. The IDM device is already seen as a regular
> device by each of the QEMU instances
> involved.
>
>
> I'm starting to think this series is two things that should be
> decoupled. One is the abstract device(s) to facilitate your AMP, the
> other is the inter-qemu communication. For the abstract device, I
> guess this would be a new virtio-idm device. We should try and involve
> virtio people perhaps. I can see the value in it quite separate from
> modelling the real sysctrl hardware.
>
> Interesting, which other value/usage do you see in it? For me the IDM was
> meant to
>
> It has value in prototyping with your abstract toolkit even with
> homogeneous hardware. E.g. I should be able to just use single-QEMU
> ARM virt machine -smp 2 and create one of these virtio-AMP setups.
> Homogeneous hardware with heterogeneous software using your new pieces
> of abstract hardware.
>
> It is also more practical for getting a merge of your work as you are
> targeting two different audiences with the work. People interested in
> virtio can handle the new devices you create, while the core
> maintainers can handle your multi-QEMU work. It is two rather big new
> features.
>
>
> This is true, too much meat on the fire for the same patch makes it
> difficult to get merged. Thanks.
> We could split it into the multi-client socket work, the inter-qemu
> communication, and the virtio-idm.
>
>
OK.
>
> work as an abstract system controller to centralize the management
> of the slaves (boot_regs and interrupts).
>
>
>
> But I think the implementation
> should be free of any inter-QEMU awareness. E.g. from P4 of this
> series:
>
> +static void send_shmem_fd(IDMState *s, MSClient *c)
> +{
> + int fd, len;
> + uint32_t *message;
> + HostMemoryBackend *backend = MEMORY_BACKEND(s->hostmem);
> +
> + len = strlen(SEND_MEM_FD_CMD)/4 + 3;
> + message = malloc(len * sizeof(uint32_t));
> + strcpy((char *) message, SEND_MEM_FD_CMD);
> + message[len - 2] = s->pboot_size;
> + message[len - 1] = s->pboot_offset;
> +
> + fd = memory_region_get_fd(&backend->mr);
> +
> + multi_socket_send_fds_to(c, &fd, 1, (char *) message, len * sizeof(uint32_t));
>
> The device itself is aware of shared-memory and multi-sockets. Using
> the device for single-QEMU AMP would require neither - can the IDM
> device be used in a homogeneous AMP flow in one of our existing SMP
> machine models (eg on a dual core A9 with one core being master and
> the other slave)?
>
> Can this be architected in two phases for greater utility, with the
> AMP devices as just normal devices, and the inter-qemu communication
> as a separate feature?
>
> I see your point, and it is an interesting proposal.
>
> What I can think of here, to remove the awareness of how the IDM
> communicates with the slaves, is to define a kind of AMP Slave
> interface. So there will be an
> instance of the interface for each of the slaves, encapsulating the
> communication part (being either local or based on sockets).
> The AMP Slave interfaces would be what you called the AMP devices, with one
> device per slave.
>
>
> Do we need this hard definition of master and slave in the hardware?
> Can the virtio-device be more peer-to-peer, with the master-slave
> relationship purely implemented by the guest?
>
>
> I think we can architect it in a way that the virtio-idm simply connects
> two or more peers and, depending on how the software uses it,
> behaves as master on one side and slave on the other.
> I used the term AMP slave interface; I should have used AMP client
> interface, to indicate the cores/processors the IDM interconnects
> (being either local or on another QEMU instance).
> So there would be an implementation of the AMP client interface that
> is based on the assumption that all the processors are on the same
> instance, and one based on sockets for the remote instances.
>
>
Do you need this dual mode? Can the IDM just have GPIOs which are then
either directly connected to the local CPUs, or sent out over an inter-qemu
connectivity mechanism? Then the inter-qemu channel can be used for any GPIO
communication.
> to make an example, for a single qemu instance with -smp 2
> you would add something like :
>
> -smp 2
> -device amp-local-client, core_id=0, id=client0
> -device amp-local-client, core_id=1, id=client1
> -device virtio-idm, clients=2, id=idm
>
> while for remote qemu instances something like
> (the opposite to be instantiated on the other remote instance):
>
> -device amp-local-client, id=client0
> -device amp-remote-client, chardev=chdev_id, id=client1
> -device virtio-idm, clients=2, id=idm-dev
>
> This way the idm only knows about clients (all clients are the
> same for the IDM). The software running on the processors
> will enable the interaction between the clients by writing
> into the IDM device registers.
>
> At a first glance, and according to my current proposal, I see
> such AMP client interfaces exporting the following methods:
>
> - raise_interrupt() function: called by the IDM to trigger an
> interrupt towards the destination client
>
>
> - boot_trigger() function: called by the IDM to trigger the boot of
> the client
>
> If the clients are remote, socket communication will be used and hidden in
> the AMP client interface implementation
>
>
> Do you foresee a different type of interface for the use-case
> you have in mind? I ask because if for example the clients are
> cores of the same cluster (and same instance), interrupts could
> simply be software generated from the linux-kernel/firmware
> running on top of the processors and theoretically no need to
> go through the IDM, same I guess for the boot.
>
True. But if you are developing code for IDM, you can do a
crawl-before-walk test with an SMP test case.
Regards,
Peter
> Another thing that needs to be defined clearly is the interface between
> the IDM and the software running on the cores.
> At the moment I am using a set of registers, namely the boot and
> the interrupt registers. By writing the ID of a client in such registers
> it is possible to forward an interrupt or trigger its boot.
>
>
> Thanks,
>
> Christian
>
>
>
> Regards,
> Peter
>
>
> At master side, besides the IDM, one would instantiate
> as many interface devices as slaves. During the initialization the IDM will
> link
> with all those interfaces, and only call functions like: send_interrupt() or
> boot_slave() to interact with the slaves. The interface will be the same for
> both local or remote slaves, while the implementation of the methods will
> differ and reside in the specific AMP Slave Interface device.
> On the slave side, if the slave is remote, another instance of the
> interface is instantiated so to connect to socket/eventfd.
>
> So as an example, the send_shmem_fd function you pointed out could be
> hidden in the slave interface, and invoked only when the IDM invokes
> the slave_boot() function of a remote slave interface.
>
> This would raise the level of abstraction and open the door to potentially
> any
> communication mechanism between master and slave, without the need to adapt
> the
> IDM device to the specific case. Or, possibly, to mix local and
> remote instances.
>
>
> Thanks,
>
> Christian
>
>
> Regards,
> Peter
>
>
>
* Re: [Qemu-devel] [RFC PATCH 0/8] Towards an Heterogeneous QEMU
2015-11-13 7:02 ` Peter Crosthwaite
@ 2015-12-12 10:19 ` Christian Pinto
0 siblings, 0 replies; 19+ messages in thread
From: Christian Pinto @ 2015-12-12 10:19 UTC (permalink / raw)
To: Peter Crosthwaite
Cc: Edgar Iglesias, Peter Maydell, Michael S. Tsirkin,
Claudio Fontana, qemu-devel@nongnu.org Developers, Jani Kokkonen,
VirtualOpenSystems Technical Team, mar. krzeminski
Hello Peter,
Apologies for the highly delayed response.
On Nov 13, 2015 08:02, "Peter Crosthwaite" <crosthwaitepeter@gmail.com>
wrote:
>
> Hi Christian,
>
> Sorry about the delayed response.
>
> On Tue, Oct 27, 2015 at 3:30 AM, Christian Pinto <
c.pinto@virtualopensystems.com> wrote:
>>
>>
>>
>> On 25/10/2015 22:38, Peter Crosthwaite wrote:
>>>
>>> On Thu, Oct 22, 2015 at 2:21 AM, Christian Pinto
>>> <c.pinto@virtualopensystems.com> wrote:
>>>>
>>>> Hello Peter,
>>>>
>>>>
>>>> On 07/10/2015 17:48, Peter Crosthwaite wrote:
>>>>>
>>>>> On Mon, Oct 5, 2015 at 8:50 AM, Christian Pinto
>>>>> <c.pinto@virtualopensystems.com> wrote:
>>>>>>
>>>>>> Hello Peter,
>>>>>>
>>>>>> thanks for your comments
>>>>>>
>>>>>> On 01/10/2015 18:26, Peter Crosthwaite wrote:
>>>>>>>
>>>>>>> On Tue, Sep 29, 2015 at 6:57 AM, Christian Pinto
>>>>>>> <c.pinto@virtualopensystems.com> wrote:
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> This RFC patch-series introduces the set of changes enabling the
>>>>>>>> architectural elements to model the architecture presented in a
>>>>>>>> previous
>>>>>>>> RFC
>>>>>>>> letter: "[Qemu-devel][RFC] Towards an Heterogeneous QEMU".
>>>>>>>>
>>>>>>>> and the OS binary image needs
>>>>>>>> to be placed in memory at model startup.
>>>>>>>>
>>>>>>> I don't see what this limitation is exactly. Can you explain more? I
>>>>>>> do see a need to work on the ARM bootloader for AMP flows; it is a
>>>>>>> pure SMP bootloader that assumes total control.
>>>>>>
>>>>>> the problem here was to me that when we launch QEMU a binary needs
to be
>>>>>> provided and put in memory
>>>>>> in order to be executed. In this patch series the slave doesn't have
a
>>>>>> proper memory allocated when first launched.
>>>>>
>>>>> But it could though couldn't it? Can't the slave guest just have full
>>>>> access to its own address space (probably very similar to the master's
>>>>> address space) from machine init time? This seems more realistic than
>>>>> setting up the hardware based on guest level information.
>>>>
>>>> Actually the address space for a slave is built at init time, the
thing that
>>>> is not
>>>> completely configured is the memory region modeling the RAM. Such
region is
>>>> configured
>>>> in terms of size, but there is no pointer to the actual memory. The
pointer
>>>> is mmap-ed later
>>>> before the slave boots.
>>>>
>>> based on what information? Is the master guest controlling this? If so
>>> what is the real-hardware analogue for this concept where the address
>>> map of the slave can change (i.e. be configured) at runtime?
>>
>> Hello Peter,
>>
>> The memory map of a slave is not controlled by the master guest, since
it is
>> dependent on the machine model used for the slave. The only thing the
master
>> controls is the subset of the main memory that is assigned to a slave.
By
>> saying that the memory pointer is sent to the slave later, before the
boot, it is like setting the
>> boot address for that specific slave within the whole platform memory. So
>> essentially the offset passed for the mmap is from beginning of master
memory up to the
>> beginning of the memory carved out for the specific slave. I see this as
a way to
>> protect the master memory from malicious accesses from the slave side,
so this
>> way the slave will only "see" the part of the memory that it got
assigned.
>>
>
> That does sound like memory map control though. Is it simpler to just
give the slave full access and implement such protections as a specific
feature (probably some sort of IOMMU)?
>
Yes, it is a kind of memory-map control. An IOMMU-like component would do
the job but, as already said, most of the focus of this project was on the
IDM and the inter-QEMU communication. We could consider implementing an
IOMMU-like component in future extensions.
>>>>>>
>>>>>> The information about memory (fd + offset for mmap) is sent only
later
>>>>>> when
>>>>>> the boot is triggered. This is also
>>>>>> safe since the slave will be waiting in the incoming state, and thus
no
>>>>>> corruption or errors can happen before the
>>>>>> boot is triggered.
>>>>>
>>>>> I was thinking more about your comment about slave-to-slave
>>>>> interrupts. This would just trivially be a local software-generated
>>>>> interrupt of some form within the slave cluster.
>>>>
>>>> Sorry, I did not catch your comment the first time. You are right, if
cores
>>>> are in the same cluster
>>>> a software generated interrupt is going to be enough. Of course the
eventfd
>>>> based interrupts
>>>> make sense for a remote QEMU.
>>>>
>>> Is eventfd a better implementation of remote-port GPIOs as in the
Xilinx work?
>>
>>
>> Functionally I think they provide the same behavior. We went for eventfd
>> since, when designing the code of the IDM, we based it on what was
>> available in upstream QEMU to signal events between processes (e.g., eventfd).
>>
>>> Re the terminology, I don't like the idea of thinking of inter-qemu
>>> "interrupts" as whatever system we decide on should be able to support
>>> arbitrary signals going from one QEMU to another. I think the Xilinx
>>> work already has reset signals going between the QEMU peers.
>>
>>
>> We used the inter-qemu interrupt term, since such signal was triggered
from the IDM
>> and is an interrupt. But I see your point and agree that such interrupt
could be a generic
>> inter-qemu signaling mechanism, that can be used as interrupt for this
specific purpose.
>>
>>
>>>>>>>> The multi client-socket is used for the master to trigger
>>>>>>>> the boot of a slave, and also for each master-slave couple
to
>>>>>>>> exchange the
>>>>>>>> eventfd file descriptors. The IDM device can be instantiated
>>>>>>>> either
>>>>>>>> as a
>>>>>>>> PCI or sysbus device.
>>>>>>>>
>>>>>>> So if everything is in one QEMU, IPIs can be implemented with just a
>>>>>>
>>>>>> of registers makes the master in
>>>>>> "control" each of the slaves. The IDM device is already seen as a
regular
>>>>>> device by each of the QEMU instances
>>>>>> involved.
>>>>>>
>>>>> I'm starting to think this series is two things that should be
>>>>> decoupled. One is the abstract device(s) to facilitate your AMP, the
>>>>> other is the inter-qemu communication. For the abstract device, I
>>>>> guess this would be a new virtio-idm device. We should try and involve
>>>>> virtio people perhaps. I can see the value in it quite separate from
>>>>> modelling the real sysctrl hardware.
>>>>
>>>> Interesting, which other value/usage do you see in it? For me the IDM
was
>>>> meant to
>>>
>>> It has value in prototyping with your abstract toolkit even with
>>> homogeneous hardware. E.g. I should be able to just use single-QEMU
>>> ARM virt machine -smp 2 and create one of these virtio-AMP setups.
>>> Homogeneous hardware with heterogeneous software using your new pieces
>>> of abstract hardware.
>>>
>>> It is also more practical for getting a merge of your work as you are
>>> targeting two different audiences with the work. People interested in
>>> virtio can handle the new devices you create, while the core
>>> maintainers can handle your multi-QEMU work. It is two rather big new
>>> features.
>>
>>
>> This is true, too much meat on the fire for the same patch makes it
>> difficult to get merged. Thanks.
>> We could split it into the multi-client socket work, the inter-qemu
>> communication and virtio-idm.
>>
>
> OK.
>
>>
>>
>>>> work as an abstract system controller to centralize the management
>>>> of the slaves (boot_regs and interrupts).
>>>>
>>>>
>>>>> But I think the implementation
>>>>> should be free of any inter-QEMU awareness. E.g. from P4 of this
>>>>> series:
>>>>>
>>>>> +static void send_shmem_fd(IDMState *s, MSClient *c)
>>>>> +{
>>>>> + int fd, len;
>>>>> + uint32_t *message;
>>>>> + HostMemoryBackend *backend = MEMORY_BACKEND(s->hostmem);
>>>>> +
>>>>> + len = strlen(SEND_MEM_FD_CMD)/4 + 3;
>>>>> + message = malloc(len * sizeof(uint32_t));
>>>>> + strcpy((char *) message, SEND_MEM_FD_CMD);
>>>>> + message[len - 2] = s->pboot_size;
>>>>> + message[len - 1] = s->pboot_offset;
>>>>> +
>>>>> + fd = memory_region_get_fd(&backend->mr);
>>>>> +
>>>>> + multi_socket_send_fds_to(c, &fd, 1, (char *) message, len * sizeof(uint32_t));
>>>>>
>>>>> The device itself is aware of shared-memory and multi-sockets. Using
>>>>> the device for single-QEMU AMP would require neither - can the IDM
>>>>> device be used in a homogeneous AMP flow in one of our existing SMP
>>>>> machine models (eg on a dual core A9 with one core being master and
>>>>> the other slave)?
>>>>>
>>>>> Can this be architected in two phases for greater utility, with the
>>>>> AMP devices as just normal devices, and the inter-qemu communication
>>>>> as a separate feature?
>>>>
>>>> I see your point, and it is an interesting proposal.
>>>>
>>>> What I can think of here, to remove the awareness of how the IDM
>>>> communicates with the slaves, is to define a kind of AMP Slave
>>>> interface. So there will be an
>>>> instance of the interface for each of the slaves, encapsulating the
>>>> communication part (being either local or based on sockets).
>>>> The AMP Slave interfaces would be what you called the AMP devices,
with one
>>>> device per slave.
>>>>
>>> Do we need this hard definition of master and slave in the hardware?
>>> Can the virtio-device be more peer-to-peer, with the master-slave
>>> relationship purely implemented by the guest?
>>
>>
>> I think we can architect it in a way that the virtio-idm simply connects
>> two or more peers and, depending on how the software uses it,
>> behaves as master on one side and slave on the other.
>> I used the term AMP slave interface; I should have used AMP client
>> interface, to indicate the cores/processors the IDM interconnects
>> (being either local or on another QEMU instance).
>> So there would be an implementation of the AMP client interface that
>> is based on the assumption that all the processors are on the same
>> instance, and one based on sockets for the remote instances.
>>
>
> Do you need this dual mode? Can the IDM just have GPIOs which are then
either directly connected to the local CPUs, or sent out over inter-qemu
connectivity mechanism? Then the inter-qemu can be used for any GPIO
communication.
>
Yes, at first glance it could be done as you propose. The only doubt I
have is related to the boot-triggering feature, which is used by master
OSes and relies on the IDM, and might need a specific interface when the
slave is remote (hence the dual mode).
I guess the best thing now is to continue working on the code to produce
a new patchset, taking your suggestions into account when designing the
new interface, in order to abstract the inter-QEMU communication part
from the IDM code as much as possible.
Thanks,
Christian
>>
>> to make an example, for a single qemu instance with -smp 2
>> you would add something like :
>>
>> -smp 2
>> -device amp-local-client, core_id=0, id=client0
>> -device amp-local-client, core_id=1, id=client1
>> -device virtio-idm, clients=2, id=idm
>>
>> while for remote qemu instances something like
>> (the opposite to be instantiated on the other remote instance):
>>
>> -device amp-local-client, id=client0
>> -device amp-remote-client, chardev=chdev_id, id=client1
>> -device virtio-idm, clients=2, id=idm-dev
>>
>> This way the idm only knows about clients (all clients are the
>> same for the IDM). The software running on the processors
>> will enable the interaction between the clients by writing
>> into the IDM device registers.
>>
>> At a first glance, and according to my current proposal, I see
>> such AMP client interfaces exporting the following methods:
>> raise_interrupt() function: called by the IDM to trigger an interrupt
towards the destination client
>> boot_trigger() function: called by the IDM to trigger the boot of the
client
>>
>> If the clients are remote, socket communication will be used and hidden
in the AMP client interface implementation
>>
>>
>> Do you foresee a different type of interface for the use-case
>> you have in mind? I ask because if for example the clients are
>> cores of the same cluster (and same instance), interrupts could
>> simply be software generated from the linux-kernel/firmware
>> running on top of the processors and theoretically no need to
>> go through the IDM, same I guess for the boot.
>
> True. But if you are developing code for IDM, you can do a
crawl-before-walk test with an SMP test case.
>
> Regards,
> Peter
>>
>> Another thing that needs to be defined clearly is the interface between
>> the IDM and the software running on the cores.
>> At the moment I am using a set of registers, namely the boot and
>> the interrupt registers. By writing the ID of a client in such registers
>> it is possible to forward an interrupt or trigger its boot.
>>
>>
>> Thanks,
>>
>> Christian
>>
>>
>>
>>> Regards,
>>> Peter
>>>
>>>> At master side, besides the IDM, one would instantiate
>>>> as many interface devices as slaves. During the initialization the IDM
will
>>>> link
>>>> with all those interfaces, and only call functions like:
send_interrupt() or
>>>> boot_slave() to interact with the slaves. The interface will be the
same for
>>>> both local or remote slaves, while the implementation of the methods
will
>>>> differ and reside in the specific AMP Slave Interface device.
>>>> On the slave side, if the slave is remote, another instance of the
>>>> interface is instantiated so to connect to socket/eventfd.
>>>>
>>>> So as an example, the send_shmem_fd function you pointed out could be
>>>> hidden in the slave interface, and invoked only when the IDM invokes
>>>> the slave_boot() function of a remote slave interface.
>>>>
>>>> This would raise the level of abstraction and open the door to
potentially
>>>> any
>>>> communication mechanism between master and slave, without the need to
adapt
>>>> the
>>>> IDM device to the specific case. Or, possibly, to mix local and
>>>> remote instances.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Christian
>>>>
>>>>> Regards,
>>>>> Peter
>>
>>
>
end of thread, other threads:[~2015-12-12 10:19 UTC | newest]
Thread overview: 19+ messages
2015-09-29 13:57 [Qemu-devel] [RFC PATCH 0/8] Towards an Heterogeneous QEMU Christian Pinto
2015-09-29 13:57 ` [Qemu-devel] [RFC PATCH 1/8] backend: multi-socket Christian Pinto
2015-09-29 13:57 ` [Qemu-devel] [RFC PATCH 2/8] backend: shared memory backend Christian Pinto
2015-09-29 13:57 ` [Qemu-devel] [RFC PATCH 3/8] migration: add shared migration type Christian Pinto
2015-09-29 13:57 ` [Qemu-devel] [RFC PATCH 4/8] hw/misc: IDM Device Christian Pinto
2015-09-29 13:57 ` [Qemu-devel] [RFC PATCH 5/8] hw/arm: sysbus-fdt Christian Pinto
2015-09-29 13:57 ` [Qemu-devel] [RFC PATCH 6/8] qemu: slave machine flag Christian Pinto
2015-09-29 13:57 ` [Qemu-devel] [RFC PATCH 7/8] hw/arm: boot Christian Pinto
2015-09-29 13:57 ` [Qemu-devel] [RFC PATCH 8/8] qemu: numa Christian Pinto
2015-10-01 16:26 ` [Qemu-devel] [RFC PATCH 0/8] Towards an Heterogeneous QEMU Peter Crosthwaite
2015-10-05 15:50 ` Christian Pinto
2015-10-07 15:48 ` Peter Crosthwaite
2015-10-22 9:21 ` Christian Pinto
2015-10-25 21:38 ` Peter Crosthwaite
2015-10-26 17:12 ` mar.krzeminski
2015-10-26 17:42 ` Peter Crosthwaite
2015-10-27 10:30 ` Christian Pinto
2015-11-13 7:02 ` Peter Crosthwaite
2015-12-12 10:19 ` Christian Pinto