qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] QEMU patch to allow VM introspection via libvmi
@ 2015-10-15 23:44 valerio
  2015-10-15 23:44 ` [Qemu-devel] [PATCH] QEMU patch for libvmi to introspect QEMU/kvm virtual machines. Usually this patch is distributed with libvmi, but, it might be more useful to have it in the QEMU source permanently valerio
  2015-10-16  8:15 ` [Qemu-devel] QEMU patch to allow VM introspection via libvmi Markus Armbruster
  0 siblings, 2 replies; 43+ messages in thread
From: valerio @ 2015-10-15 23:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: armbru, ehabkost, lcapitulino


All-

I've produced a patch for the current QEMU HEAD, for libvmi to introspect QEMU/KVM VMs.

Libvmi has patches for the old qeum-kvm fork, inside its source tree: https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch

This patch adds a hmp and a qmp command, "pmemaccess". When the commands is invoked with a string arguments (a filename), it will open a UNIX socket and spawn a listening thread.

The client writes binary commands to the socket, in the form of a c structure:

struct request {
     uint8_t type;   // 0 quit, 1 read, 2 write, ... rest reserved
     uint64_t address;   // address to read from OR write to
     uint64_t length;    // number of bytes to read OR write
};

The client receives as a response, either (length+1) bytes, if it is a read operation, or 1 byte ifit is a write operation.

The last bytes of a read operation response indicates success (1 success, 0 failure). The single byte returned for a write operation indicates same (1 success, 0 failure).
The socket API was written by the libvmi author and it works the with current libvmi version. The libvmi client-side implementation is at:

https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c

As many use kvm VM's for introspection, malware and security analysis, it might be worth thinking about making the pmemaccess a permanent hmp/qmp command, as opposed to having to produce a patch at each QEMU point release.

Also, the pmemsave commands QAPI should be changed to be usable with 64bit VM's

in qapi-schema.json

from

---
{ 'command': 'pmemsave',
  'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
---

to

---
{ 'command': 'pmemsave',
  'data': {'val': 'int64', 'size': 'int64', 'filename': 'str'} }
---

hmp-commands.hx and qmp-commands.hx should be edited accordingly. I did not make the above pmemsave changes part of my patch.

Let me know if you have any questions,

Valerio

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [Qemu-devel] [PATCH] QEMU patch for libvmi to introspect QEMU/kvm virtual machines. Usually this patch is distributed with libvmi, but, it might be more useful to have it in the QEMU source permanently.
  2015-10-15 23:44 [Qemu-devel] QEMU patch to allow VM introspection via libvmi valerio
@ 2015-10-15 23:44 ` valerio
  2015-10-19 21:33   ` Eric Blake
  2015-10-16  8:15 ` [Qemu-devel] QEMU patch to allow VM introspection via libvmi Markus Armbruster
  1 sibling, 1 reply; 43+ messages in thread
From: valerio @ 2015-10-15 23:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: armbru, ehabkost, Valerio Aimale, lcapitulino

From: Valerio Aimale <valerio@aimale.com>

---
 Makefile.target  |   2 +-
 hmp-commands.hx  |  14 ++++
 hmp.c            |   9 +++
 hmp.h            |   1 +
 memory-access.c  | 206 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 memory-access.h  |  21 ++++++
 qapi-schema.json |  28 ++++++++
 qmp-commands.hx  |  23 +++++++
 8 files changed, 303 insertions(+), 1 deletion(-)
 create mode 100644 memory-access.c
 create mode 100644 memory-access.h

diff --git a/Makefile.target b/Makefile.target
index 962d004..940ab51 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -131,7 +131,7 @@ endif #CONFIG_BSD_USER
 #########################################################
 # System emulator target
 ifdef CONFIG_SOFTMMU
-obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o numa.o
+obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o numa.o memory-access.o
 obj-y += qtest.o bootdevice.o
 obj-y += hw/
 obj-$(CONFIG_KVM) += kvm-all.o
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 3a4ae39..37d3bb5 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -807,6 +807,20 @@ save to disk physical memory dump starting at @var{addr} of size @var{size}.
 ETEXI
 
     {
+        .name       = "pmemaccess",
+        .args_type  = "path:s",
+        .params     = "file",
+        .help       = "open A UNIX Socket access to physical memory at 'path'",
+        .mhandler.cmd = hmp_pmemaccess,
+    },
+
+STEXI
+@item pmemaccess @var{file}
+@findex pmemaccess
+Mount guest physical memory image at 'path'
+ETEXI
+
+    {
         .name       = "boot_set",
         .args_type  = "bootdevice:s",
         .params     = "bootdevice",
diff --git a/hmp.c b/hmp.c
index 5048eee..444dbf6 100644
--- a/hmp.c
+++ b/hmp.c
@@ -916,6 +916,15 @@ void hmp_pmemsave(Monitor *mon, const QDict *qdict)
     hmp_handle_error(mon, &err);
 }
 
+void hmp_pmemaccess(Monitor *mon, const QDict *qdict)
+{
+    const char *path = qdict_get_str(qdict, "path");
+    Error *err = NULL;
+
+    qmp_pmemaccess(path, &err);
+    hmp_handle_error(mon, &err);
+}
+
 void hmp_ringbuf_write(Monitor *mon, const QDict *qdict)
 {
     const char *chardev = qdict_get_str(qdict, "device");
diff --git a/hmp.h b/hmp.h
index 81656c3..8225deb 100644
--- a/hmp.h
+++ b/hmp.h
@@ -47,6 +47,7 @@ void hmp_system_powerdown(Monitor *mon, const QDict *qdict);
 void hmp_cpu(Monitor *mon, const QDict *qdict);
 void hmp_memsave(Monitor *mon, const QDict *qdict);
 void hmp_pmemsave(Monitor *mon, const QDict *qdict);
+void hmp_pmemaccess(Monitor *mon, const QDict *qdict);
 void hmp_ringbuf_write(Monitor *mon, const QDict *qdict);
 void hmp_ringbuf_read(Monitor *mon, const QDict *qdict);
 void hmp_cont(Monitor *mon, const QDict *qdict);
diff --git a/memory-access.c b/memory-access.c
new file mode 100644
index 0000000..58099a8
--- /dev/null
+++ b/memory-access.c
@@ -0,0 +1,206 @@
+/*
+ * Access guest physical memory via a domain socket.
+ *
+ * Copyright (C) 2011 Sandia National Laboratories
+ * Original Author: Bryan D. Payne (bdpayne@acm.org)
+ * 
+ * Refurbished for modern QEMU by Valerio Aimale (valerio@aimale.com), in 2015
+ */
+
+#include "memory-access.h"
+//#include "cpu-all.h"
+#include "qemu-common.h"
+#include "exec/cpu-common.h"
+#include "config.h"
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <pthread.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <unistd.h>
+#include <signal.h>
+#include <stdint.h>
+
+struct request{
+    uint8_t type;      // 0 quit, 1 read, 2 write, ... rest reserved
+    uint64_t address;  // address to read from OR write to
+    uint64_t length;   // number of bytes to read OR write
+};
+
+static uint64_t
+connection_read_memory (uint64_t user_paddr, void *buf, uint64_t user_len)
+{
+    hwaddr paddr = (hwaddr) user_paddr;
+    hwaddr len = (hwaddr) user_len;
+    void *guestmem = cpu_physical_memory_map(paddr, &len, 0);
+    if (!guestmem){
+        return 0;
+    }
+    memcpy(buf, guestmem, len);
+    cpu_physical_memory_unmap(guestmem, len, 0, len);
+
+    return len;
+}
+
+static uint64_t
+connection_write_memory (uint64_t user_paddr, void *buf, uint64_t user_len)
+{
+    hwaddr paddr = (hwaddr) user_paddr;
+    hwaddr len = (hwaddr) user_len;
+    void *guestmem = cpu_physical_memory_map(paddr, &len, 1);
+    if (!guestmem){
+        return 0;
+    }
+    memcpy(guestmem, buf, len);
+    cpu_physical_memory_unmap(guestmem, len, 0, len);
+
+    return len;
+}
+
+static void
+send_success_ack (int connection_fd)
+{
+    uint8_t success = 1;
+    int nbytes = write(connection_fd, &success, 1);
+    if (1 != nbytes){
+        fprintf(stderr, "Qemu pmemaccess: failed to send success ack\n");
+    }
+}
+
+static void
+send_fail_ack (int connection_fd)
+{
+    uint8_t fail = 0;
+    int nbytes = write(connection_fd, &fail, 1);
+    if (1 != nbytes){
+        fprintf(stderr, "Qemu pmemaccess: failed to send fail ack\n");
+    }
+}
+
+static void
+connection_handler (int connection_fd)
+{
+    int nbytes;
+    struct request req;
+
+    while (1){
+        // client request should match the struct request format
+        nbytes = read(connection_fd, &req, sizeof(struct request));
+        if (nbytes != sizeof(struct request)){
+            // error
+            continue;
+        }
+        else if (req.type == 0){
+            // request to quit, goodbye
+            break;
+        }
+        else if (req.type == 1){
+            // request to read
+            char *buf = malloc(req.length + 1);
+            nbytes = connection_read_memory(req.address, buf, req.length);
+            if (nbytes != req.length){
+                // read failure, return failure message
+                buf[req.length] = 0; // set last byte to 0 for failure
+                nbytes = write(connection_fd, buf, 1);
+            }
+            else{
+                // read success, return bytes
+                buf[req.length] = 1; // set last byte to 1 for success
+                nbytes = write(connection_fd, buf, nbytes + 1);
+            }
+            free(buf);
+        }
+        else if (req.type == 2){
+            // request to write
+            void *write_buf = malloc(req.length);
+            nbytes = read(connection_fd, &write_buf, req.length);
+            if (nbytes != req.length){
+                // failed reading the message to write
+                send_fail_ack(connection_fd);
+            }
+            else{
+                // do the write
+                nbytes = connection_write_memory(req.address, write_buf, req.length);
+                if (nbytes == req.length){
+                    send_success_ack(connection_fd);
+                }
+                else{
+                    send_fail_ack(connection_fd);
+                }
+            }
+            free(write_buf);
+        }
+        else{
+            // unknown command
+            fprintf(stderr, "Qemu pmemaccess: ignoring unknown command (%d)\n", req.type);
+            char *buf = malloc(1);
+            buf[0] = 0;
+            nbytes = write(connection_fd, buf, 1);
+            free(buf);
+        }
+    }
+
+    close(connection_fd);
+}
+
+static void *
+memory_access_thread (void *p)
+{
+    struct sockaddr_un address;
+    int socket_fd, connection_fd;
+    socklen_t address_length;
+    struct pmemaccess_args *pargs = (struct pmemaccess_args *)p;
+
+    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
+    if (socket_fd < 0){
+	error_setg(pargs->errp, "Qemu pmemaccess: socket failed");
+        goto error_exit;
+    }
+    unlink(pargs->path);
+    address.sun_family = AF_UNIX;
+    address_length = sizeof(address.sun_family) + sprintf(address.sun_path, "%s", (char *) pargs->path);
+
+    if (bind(socket_fd, (struct sockaddr *) &address, address_length) != 0){
+	error_setg(pargs->errp, "Qemu pmemaccess: bind failed");
+        goto error_exit;
+    }
+    if (listen(socket_fd, 0) != 0){
+	error_setg(pargs->errp, "Qemu pmemaccess: listen failed");
+        goto error_exit;
+    }
+
+    connection_fd = accept(socket_fd, (struct sockaddr *) &address, &address_length);
+    connection_handler(connection_fd);
+
+    close(socket_fd);
+    unlink(pargs->path);
+    free(pargs->path);
+    free(pargs);
+error_exit:
+    return NULL;
+}
+
+void
+qmp_pmemaccess (const char *path, Error **errp)
+{
+    pthread_t thread;
+    sigset_t set, oldset;
+    struct pmemaccess_args *pargs;
+
+    // create the args struct
+    pargs = (struct pmemaccess_args *) malloc(sizeof(struct pmemaccess_args));
+    pargs->errp = errp;
+    // create a copy of path that we can safely use
+    pargs->path = malloc(strlen(path) + 1);
+    memcpy(pargs->path, path, strlen(path) + 1);
+	
+    // start the thread
+    sigfillset(&set);
+    pthread_sigmask(SIG_SETMASK, &set, &oldset);
+    pthread_create(&thread, NULL, memory_access_thread, pargs);
+    pthread_sigmask(SIG_SETMASK, &oldset, NULL);
+
+}
diff --git a/memory-access.h b/memory-access.h
new file mode 100644
index 0000000..ff3d2f9
--- /dev/null
+++ b/memory-access.h
@@ -0,0 +1,21 @@
+/*
+ * Mount guest physical memory using FUSE.
+ *
+ * Author: Valerio G. Aimale <valerio@aimale.com>
+ */
+
+#ifndef MEMORY_ACCESS_H
+#define MEMORY_ACCESS_H
+
+#include "qapi-types.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/error.h"
+
+void qmp_pmemaccess (const char *path, Error **errp);
+
+struct pmemaccess_args {
+	char *path;
+	Error **errp;
+};
+
+#endif /* MEMORY_ACCESS_H */
diff --git a/qapi-schema.json b/qapi-schema.json
index a386605..24d4070 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1427,6 +1427,34 @@
   'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
 
 ##
+# @pmemaccess:
+#
+# Open A UNIX Socket access to physical memory
+#
+# @path: the name of the UNIX socket pipe
+#
+# Returns: Nothing on success
+#
+# Since: 2.4.0.1
+#
+# Notes: Derived from previously existing patches. When command
+# succeeds connect to the open socket. Write a binary structure to 
+# the socket as:
+# 
+# struct request {
+#     uint8_t type;   // 0 quit, 1 read, 2 write, ... rest reserved
+#     uint64_t address;   // address to read from OR write to
+#     uint64_t length;    // number of bytes to read OR write
+# };
+# 
+# If it is a read operation, Qemu will return lenght+1 bytes. Read lenght+1
+# bytes. the last byte will be a 1 for success, or a 0 for failure.
+#  
+##
+{ 'command': 'pmemaccess',
+  'data': {'path': 'str'} }
+
+##
 # @cont:
 #
 # Resume guest VCPU execution.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index d2ba800..26e4a51 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -472,6 +472,29 @@ Example:
 EQMP
 
     {
+        .name       = "pmemaccess",
+        .args_type  = "path:s",
+        .mhandler.cmd_new = qmp_marshal_input_pmemaccess,
+    },
+
+SQMP
+pmemaccess
+----------
+
+Open A UNIX Socket access to physical memory
+
+Arguments:
+
+- "path": mount point path (json-string)
+
+Example:
+
+-> { "execute": "pmemaccess",
+             "arguments": { "path": "/tmp/guestname" } }
+<- { "return": {} }
+
+EQMP
+    {
         .name       = "inject-nmi",
         .args_type  = "",
         .mhandler.cmd_new = qmp_marshal_inject_nmi,
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-15 23:44 [Qemu-devel] QEMU patch to allow VM introspection via libvmi valerio
  2015-10-15 23:44 ` [Qemu-devel] [PATCH] QEMU patch for libvmi to introspect QEMU/kvm virtual machines. Usually this patch is distributed with libvmi, but, it might be more useful to have it in the QEMU source permanently valerio
@ 2015-10-16  8:15 ` Markus Armbruster
  2015-10-16 14:30   ` Valerio Aimale
  1 sibling, 1 reply; 43+ messages in thread
From: Markus Armbruster @ 2015-10-16  8:15 UTC (permalink / raw)
  To: valerio; +Cc: lcapitulino, qemu-devel, ehabkost

valerio@aimale.com writes:

> All-
>
> I've produced a patch for the current QEMU HEAD, for libvmi to
> introspect QEMU/KVM VMs.
>
> Libvmi has patches for the old qeum-kvm fork, inside its source tree:
> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>
> This patch adds a hmp and a qmp command, "pmemaccess". When the
> commands is invoked with a string arguments (a filename), it will open
> a UNIX socket and spawn a listening thread.
>
> The client writes binary commands to the socket, in the form of a c
> structure:
>
> struct request {
>      uint8_t type;   // 0 quit, 1 read, 2 write, ... rest reserved
>      uint64_t address;   // address to read from OR write to
>      uint64_t length;    // number of bytes to read OR write
> };
>
> The client receives as a response, either (length+1) bytes, if it is a
> read operation, or 1 byte ifit is a write operation.
>
> The last bytes of a read operation response indicates success (1
> success, 0 failure). The single byte returned for a write operation
> indicates same (1 success, 0 failure).

So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of
garbage followed by the "it failed" byte?

> The socket API was written by the libvmi author and it works the with
> current libvmi version. The libvmi client-side implementation is at:
>
> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>
> As many use kvm VM's for introspection, malware and security analysis,
> it might be worth thinking about making the pmemaccess a permanent
> hmp/qmp command, as opposed to having to produce a patch at each QEMU
> point release.

Related existing commands: memsave, pmemsave, dump-guest-memory.

Can you explain why these won't do for your use case?

> Also, the pmemsave commands QAPI should be changed to be usable with
> 64bit VM's
>
> in qapi-schema.json
>
> from
>
> ---
> { 'command': 'pmemsave',
>   'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
> ---
>
> to
>
> ---
> { 'command': 'pmemsave',
>   'data': {'val': 'int64', 'size': 'int64', 'filename': 'str'} }
> ---

In the QAPI schema, 'int' is actually an alias for 'int64'.  Yes, that's
confusing.

> hmp-commands.hx and qmp-commands.hx should be edited accordingly. I
> did not make the above pmemsave changes part of my patch.
>
> Let me know if you have any questions,
>
> Valerio

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-16  8:15 ` [Qemu-devel] QEMU patch to allow VM introspection via libvmi Markus Armbruster
@ 2015-10-16 14:30   ` Valerio Aimale
  2015-10-19  7:52     ` Markus Armbruster
  0 siblings, 1 reply; 43+ messages in thread
From: Valerio Aimale @ 2015-10-16 14:30 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: lcapitulino, qemu-devel, ehabkost

On 10/16/15 2:15 AM, Markus Armbruster wrote:
> valerio@aimale.com writes:
>
>> All-
>>
>> I've produced a patch for the current QEMU HEAD, for libvmi to
>> introspect QEMU/KVM VMs.
>>
>> Libvmi has patches for the old qeum-kvm fork, inside its source tree:
>> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>>
>> This patch adds a hmp and a qmp command, "pmemaccess". When the
>> commands is invoked with a string arguments (a filename), it will open
>> a UNIX socket and spawn a listening thread.
>>
>> The client writes binary commands to the socket, in the form of a c
>> structure:
>>
>> struct request {
>>       uint8_t type;   // 0 quit, 1 read, 2 write, ... rest reserved
>>       uint64_t address;   // address to read from OR write to
>>       uint64_t length;    // number of bytes to read OR write
>> };
>>
>> The client receives as a response, either (length+1) bytes, if it is a
>> read operation, or 1 byte ifit is a write operation.
>>
>> The last bytes of a read operation response indicates success (1
>> success, 0 failure). The single byte returned for a write operation
>> indicates same (1 success, 0 failure).
> So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of
> garbage followed by the "it failed" byte?
Markus, that appear to be the case. However, I did not write the 
communication protocol between libvmi and qemu. I'm assuming that the 
person that wrote the protocol, did not want to bother with over 
complicating things.

https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c

I'm thinking he assumed reads would be small in size and the price of 
reading garbage was less than the price of writing a more complicated 
protocol. I can see his point, confronted with the same problem, I might 
have done the same.

>> The socket API was written by the libvmi author and it works the with
>> current libvmi version. The libvmi client-side implementation is at:
>>
>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>
>> As many use kvm VM's for introspection, malware and security analysis,
>> it might be worth thinking about making the pmemaccess a permanent
>> hmp/qmp command, as opposed to having to produce a patch at each QEMU
>> point release.
> Related existing commands: memsave, pmemsave, dump-guest-memory.
>
> Can you explain why these won't do for your use case?
For people who do security analysis there are two use cases, static and 
dynamic analysis. With memsave, pmemsave and dum-guest-memory one can do 
static analysis. I.e. snapshotting a VM and see what was happening at 
that point in time.
Dynamic analysis require to be able to 'introspect' a VM while it's running.

If you take a snapshot of two people exchanging a glass of water, and 
you happen to take it at the very moment both persons have their hands 
on the glass, it's hard to tell who passed the glass to whom. If you 
have a movie of the same scene, it's obvious who's the giver and who's 
the receiver. Same use case.

More to the point, there's a host of C and python frameworks to 
dynamically analyze VMs: volatility, rekal, "drakvuf", etc. They all 
build on top of libvmi. I did not want to reinvent the wheel.

Mind you, 99.9% of people that do dynamic VM analysis use xen. They 
contend that xen has better introspection support. In my case, I did not 
want to bother with dedicating a full server to be a xen domain 0. I 
just wanted to do a quick test by standing up a QEMU/kvm VM, in an 
otherwise purposed server.


>
>> Also, the pmemsave commands QAPI should be changed to be usable with
>> 64bit VM's
>>
>> in qapi-schema.json
>>
>> from
>>
>> ---
>> { 'command': 'pmemsave',
>>    'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>> ---
>>
>> to
>>
>> ---
>> { 'command': 'pmemsave',
>>    'data': {'val': 'int64', 'size': 'int64', 'filename': 'str'} }
>> ---
> In the QAPI schema, 'int' is actually an alias for 'int64'.  Yes, that's
> confusing.
I think it's confusing for the HMP parser too. If you have a VM with 8Gb 
of RAM and want to snapshot the whole physical memory, via HMP over 
telnet this is what happens:

$ telnet localhost 1234
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
QEMU 2.4.0.1 monitor - type 'help' for more information
(qemu) help pmemsave
pmemsave addr size file -- save to disk physical memory dump starting at 
'addr' of size 'size'
(qemu) pmemsave 0 8589934591 "/tmp/memorydump"
'pmemsave' has failed: integer is for 32-bit values
Try "help pmemsave" for more information
(qemu) quit

With the changes I suggested, the command succeeds

$ telnet localhost 1234
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
QEMU 2.4.0.1 monitor - type 'help' for more information
(qemu) help pmemsave
pmemsave addr size file -- save to disk physical memory dump starting at 
'addr' of size 'size'
(qemu) pmemsave 0 8589934591 "/tmp/memorydump"
(qemu) quit

However I just noticed that the dump is just about 4GB in size, so there 
might be more changes needed to snapshot all physical memory of a 64 but 
VM. I did not investigate any further.

ls -l /tmp/memorydump
-rw-rw-r-- 1 libvirt-qemu kvm 4294967295 Oct 16 08:04 /tmp/memorydump

>> hmp-commands.hx and qmp-commands.hx should be edited accordingly. I
>> did not make the above pmemsave changes part of my patch.
>>
>> Let me know if you have any questions,
>>
>> Valerio

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-16 14:30   ` Valerio Aimale
@ 2015-10-19  7:52     ` Markus Armbruster
  2015-10-19 14:37       ` Valerio Aimale
  0 siblings, 1 reply; 43+ messages in thread
From: Markus Armbruster @ 2015-10-19  7:52 UTC (permalink / raw)
  To: Valerio Aimale; +Cc: qemu-devel, ehabkost, lcapitulino

Valerio Aimale <valerio@aimale.com> writes:

> On 10/16/15 2:15 AM, Markus Armbruster wrote:
>> valerio@aimale.com writes:
>>
>>> All-
>>>
>>> I've produced a patch for the current QEMU HEAD, for libvmi to
>>> introspect QEMU/KVM VMs.
>>>
>>> Libvmi has patches for the old qeum-kvm fork, inside its source tree:
>>> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>>>
>>> This patch adds a hmp and a qmp command, "pmemaccess". When the
>>> commands is invoked with a string arguments (a filename), it will open
>>> a UNIX socket and spawn a listening thread.
>>>
>>> The client writes binary commands to the socket, in the form of a c
>>> structure:
>>>
>>> struct request {
>>>       uint8_t type;   // 0 quit, 1 read, 2 write, ... rest reserved
>>>       uint64_t address;   // address to read from OR write to
>>>       uint64_t length;    // number of bytes to read OR write
>>> };
>>>
>>> The client receives as a response, either (length+1) bytes, if it is a
>>> read operation, or 1 byte ifit is a write operation.
>>>
>>> The last bytes of a read operation response indicates success (1
>>> success, 0 failure). The single byte returned for a write operation
>>> indicates same (1 success, 0 failure).
>> So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of
>> garbage followed by the "it failed" byte?
> Markus, that appear to be the case. However, I did not write the
> communication protocol between libvmi and qemu. I'm assuming that the
> person that wrote the protocol, did not want to bother with over
> complicating things.
>
> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>
> I'm thinking he assumed reads would be small in size and the price of
> reading garbage was less than the price of writing a more complicated
> protocol. I can see his point, confronted with the same problem, I
> might have done the same.

All right, the interface is designed for *small* memory blocks then.

Makes me wonder why he needs a separate binary protocol on a separate
socket.  Small blocks could be done just fine in QMP.

>>> The socket API was written by the libvmi author and it works the with
>>> current libvmi version. The libvmi client-side implementation is at:
>>>
>>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>>
>>> As many use kvm VM's for introspection, malware and security analysis,
>>> it might be worth thinking about making the pmemaccess a permanent
>>> hmp/qmp command, as opposed to having to produce a patch at each QEMU
>>> point release.
>> Related existing commands: memsave, pmemsave, dump-guest-memory.
>>
>> Can you explain why these won't do for your use case?
> For people who do security analysis there are two use cases, static
> and dynamic analysis. With memsave, pmemsave and dum-guest-memory one
> can do static analysis. I.e. snapshotting a VM and see what was
> happening at that point in time.
> Dynamic analysis require to be able to 'introspect' a VM while it's running.
>
> If you take a snapshot of two people exchanging a glass of water, and
> you happen to take it at the very moment both persons have their hands
> on the glass, it's hard to tell who passed the glass to whom. If you
> have a movie of the same scene, it's obvious who's the giver and who's
> the receiver. Same use case.

I understand the need for introspecting a running guest.  What exactly
makes the existing commands unsuitable for that?

> More to the point, there's a host of C and python frameworks to
> dynamically analyze VMs: volatility, rekal, "drakvuf", etc. They all
> build on top of libvmi. I did not want to reinvent the wheel.

Fair enough.

Front page http://libvmi.com/ claims "Works with Xen, KVM, Qemu, and Raw
memory files."  What exactly is missing for KVM?

> Mind you, 99.9% of people that do dynamic VM analysis use xen. They
> contend that xen has better introspection support. In my case, I did
> not want to bother with dedicating a full server to be a xen domain
> 0. I just wanted to do a quick test by standing up a QEMU/kvm VM, in
> an otherwise purposed server.

I'm not at all against better introspection support in QEMU.  I'm just
trying to understand the problem you're trying to solve with your
patches.

>>> Also, the pmemsave commands QAPI should be changed to be usable with
>>> 64bit VM's
>>>
>>> in qapi-schema.json
>>>
>>> from
>>>
>>> ---
>>> { 'command': 'pmemsave',
>>>    'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>> ---
>>>
>>> to
>>>
>>> ---
>>> { 'command': 'pmemsave',
>>>    'data': {'val': 'int64', 'size': 'int64', 'filename': 'str'} }
>>> ---
>> In the QAPI schema, 'int' is actually an alias for 'int64'.  Yes, that's
>> confusing.
> I think it's confusing for the HMP parser too. If you have a VM with
> 8Gb of RAM and want to snapshot the whole physical memory, via HMP
> over telnet this is what happens:
>
> $ telnet localhost 1234
> Trying 127.0.0.1...
> Connected to localhost.
> Escape character is '^]'.
> QEMU 2.4.0.1 monitor - type 'help' for more information
> (qemu) help pmemsave
> pmemsave addr size file -- save to disk physical memory dump starting
> at 'addr' of size 'size'
> (qemu) pmemsave 0 8589934591 "/tmp/memorydump"
> 'pmemsave' has failed: integer is for 32-bit values
> Try "help pmemsave" for more information
> (qemu) quit

Your change to pmemsave's definition in qapi-schema.json is effectively a
no-op.

Your example shows *HMP* command pmemsave.  The definition of an HMP
command is *independent* of the QMP command.  The implementation *uses*
the QMP command.

QMP pmemsave is defined in qapi-schema.json as

    { 'command': 'pmemsave',
      'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }

Its implementation is in cpus.c:

    void qmp_pmemsave(int64_t addr, int64_t size, const char *filename,
                      Error **errp)

Note the int64_t size.

HMP pmemsave is defined in hmp-commands.hx as

    {
        .name       = "pmemsave",
        .args_type  = "val:l,size:i,filename:s",
        .params     = "addr size file",
        .help       = "save to disk physical memory dump starting at 'addr' of size 'size'",
        .mhandler.cmd = hmp_pmemsave,
    },

Its implementation is in hmp.c:

    void hmp_pmemsave(Monitor *mon, const QDict *qdict)
    {
        uint32_t size = qdict_get_int(qdict, "size");
        const char *filename = qdict_get_str(qdict, "filename");
        uint64_t addr = qdict_get_int(qdict, "val");
        Error *err = NULL;

        qmp_pmemsave(addr, size, filename, &err);
        hmp_handle_error(mon, &err);
    }

Note uint32_t size.

Arguably, the QMP size argument should use 'size' (an alias for
'uint64'), and the HMP args_type should use 'size:o'.

> With the changes I suggested, the command succeeds
>
> $ telnet localhost 1234
> Trying 127.0.0.1...
> Connected to localhost.
> Escape character is '^]'.
> QEMU 2.4.0.1 monitor - type 'help' for more information
> (qemu) help pmemsave
> pmemsave addr size file -- save to disk physical memory dump starting
> at 'addr' of size 'size'
> (qemu) pmemsave 0 8589934591 "/tmp/memorydump"
> (qemu) quit
>
> However I just noticed that the dump is just about 4GB in size, so
> there might be more changes needed to snapshot all physical memory of
> a 64 but VM. I did not investigate any further.
>
> ls -l /tmp/memorydump
> -rw-rw-r-- 1 libvirt-qemu kvm 4294967295 Oct 16 08:04 /tmp/memorydump
>
>>> hmp-commands.hx and qmp-commands.hx should be edited accordingly. I
>>> did not make the above pmemsave changes part of my patch.
>>>
>>> Let me know if you have any questions,
>>>
>>> Valerio

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-19  7:52     ` Markus Armbruster
@ 2015-10-19 14:37       ` Valerio Aimale
  2015-10-21 10:54         ` Markus Armbruster
  0 siblings, 1 reply; 43+ messages in thread
From: Valerio Aimale @ 2015-10-19 14:37 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, ehabkost, lcapitulino

On 10/19/15 1:52 AM, Markus Armbruster wrote:
> Valerio Aimale <valerio@aimale.com> writes:
>
>> On 10/16/15 2:15 AM, Markus Armbruster wrote:
>>> valerio@aimale.com writes:
>>>
>>>> All-
>>>>
>>>> I've produced a patch for the current QEMU HEAD, for libvmi to
>>>> introspect QEMU/KVM VMs.
>>>>
>>>> Libvmi has patches for the old qeum-kvm fork, inside its source tree:
>>>> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>>>>
>>>> This patch adds a hmp and a qmp command, "pmemaccess". When the
>>>> commands is invoked with a string arguments (a filename), it will open
>>>> a UNIX socket and spawn a listening thread.
>>>>
>>>> The client writes binary commands to the socket, in the form of a c
>>>> structure:
>>>>
>>>> struct request {
>>>>        uint8_t type;   // 0 quit, 1 read, 2 write, ... rest reserved
>>>>        uint64_t address;   // address to read from OR write to
>>>>        uint64_t length;    // number of bytes to read OR write
>>>> };
>>>>
>>>> The client receives as a response, either (length+1) bytes, if it is a
>>>> read operation, or 1 byte ifit is a write operation.
>>>>
>>>> The last bytes of a read operation response indicates success (1
>>>> success, 0 failure). The single byte returned for a write operation
>>>> indicates same (1 success, 0 failure).
>>> So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of
>>> garbage followed by the "it failed" byte?
>> Markus, that appear to be the case. However, I did not write the
>> communication protocol between libvmi and qemu. I'm assuming that the
>> person that wrote the protocol, did not want to bother with over
>> complicating things.
>>
>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>
>> I'm thinking he assumed reads would be small in size and the price of
>> reading garbage was less than the price of writing a more complicated
>> protocol. I can see his point, confronted with the same problem, I
>> might have done the same.
> All right, the interface is designed for *small* memory blocks then.
>
> Makes me wonder why he needs a separate binary protocol on a separate
> socket.  Small blocks could be done just fine in QMP.

The problem is speed. if one's analyzing the memory space of a running 
process (physical and paged), libvmi will make a large number of small 
and mid-sized reads. If one uses xp, or pmemsave, the overhead is quite 
significant. xp has overhead due to encoding, and pmemsave has overhead 
due to file open/write (server), file open/read/close/unlink (client).

Others have gone through the problem before me. It appears that pmemsave 
and xp are significantly slower than reading memory using a socket via 
pmemaccess.

The following data is not mine, but it shows the time, in milliseconds, 
required to resolve the content of a paged memory address via socket 
(pmemaccess) , pmemsave and xp

http://cl.ly/image/322a3s0h1V05

Again, I did not produce those data points, they come from an old libvmi 
thread.

I think it might be conceivable that there could be a QMP command that 
returns the content of an arbitrarily size memory region as a base64 or 
a base85 json string. It would still have both time- (due to 
encoding/decoding) and space- (base64 has 33% and ase85 would be 7%)  
overhead, + json encoding/decoding overhead. It might still be the case 
that socket would outperform such a command as well, speed-vise. I don't 
think it would be any faster than xp.

There's also a similar patch, floating around the internet, the uses 
shared memory, instead of sockets, as inter-process communication 
between libvmi and QEMU. I've never used that.


>
>>>> The socket API was written by the libvmi author and it works the with
>>>> current libvmi version. The libvmi client-side implementation is at:
>>>>
>>>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>>>
>>>> As many use kvm VM's for introspection, malware and security analysis,
>>>> it might be worth thinking about making the pmemaccess a permanent
>>>> hmp/qmp command, as opposed to having to produce a patch at each QEMU
>>>> point release.
>>> Related existing commands: memsave, pmemsave, dump-guest-memory.
>>>
>>> Can you explain why these won't do for your use case?
>> For people who do security analysis there are two use cases, static
>> and dynamic analysis. With memsave, pmemsave and dum-guest-memory one
>> can do static analysis. I.e. snapshotting a VM and see what was
>> happening at that point in time.
>> Dynamic analysis require to be able to 'introspect' a VM while it's running.
>>
>> If you take a snapshot of two people exchanging a glass of water, and
>> you happen to take it at the very moment both persons have their hands
>> on the glass, it's hard to tell who passed the glass to whom. If you
>> have a movie of the same scene, it's obvious who's the giver and who's
>> the receiver. Same use case.
> I understand the need for introspecting a running guest.  What exactly
> makes the existing commands unsuitable for that?
Speed. See discussion above.
>
>> More to the point, there's a host of C and python frameworks to
>> dynamically analyze VMs: volatility, rekal, "drakvuf", etc. They all
>> build on top of libvmi. I did not want to reinvent the wheel.
> Fair enough.
>
> Front page http://libvmi.com/ claims "Works with Xen, KVM, Qemu, and Raw
> memory files."  What exactly is missing for KVM?
When they say they support kvm, what they really mean they support the 
(retired, I understand) qemu-kvm fork via a patch that is provided in 
the libvmi source tree. I think the most recent qem-kvm supported is 1.6.0

https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch

I wanted to bring support to the head revision of QEMU, to bring libvmi 
level with more modern QEMU.

Maybe the solution is simply to put this patch in the libvmi source 
tree,  which I've already asked to do via pull request, leaving QEMU alone.
However, the patch has to be updated at every QEMU point release. I 
wanted to avoid that, if at all possible.

>
>> Mind you, 99.9% of people that do dynamic VM analysis use xen. They
>> contend that xen has better introspection support. In my case, I did
>> not want to bother with dedicating a full server to be a xen domain
>> 0. I just wanted to do a quick test by standing up a QEMU/kvm VM, in
>> an otherwise purposed server.
> I'm not at all against better introspection support in QEMU.  I'm just
> trying to understand the problem you're trying to solve with your
> patches.
What all users of libvmi would love to have is super high speed access 
to VM physical memory as part of the QEMU source tree, and not supported 
via a patch. Implemented as the QEMU owners see fit, as long as it is 
blazing fast and easy accessed via client library or inter-process 
communication.
My gut feeling is that it has to bypass QMP protocol/encoding/file 
access/json to be fast, but, it is just a gut feeling - worth nothing.

>
>>>> Also, the pmemsave commands QAPI should be changed to be usable with
>>>> 64bit VM's
>>>>
>>>> in qapi-schema.json
>>>>
>>>> from
>>>>
>>>> ---
>>>> { 'command': 'pmemsave',
>>>>     'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>> ---
>>>>
>>>> to
>>>>
>>>> ---
>>>> { 'command': 'pmemsave',
>>>>     'data': {'val': 'int64', 'size': 'int64', 'filename': 'str'} }
>>>> ---
>>> In the QAPI schema, 'int' is actually an alias for 'int64'.  Yes, that's
>>> confusing.
>> I think it's confusing for the HMP parser too. If you have a VM with
>> 8Gb of RAM and want to snapshot the whole physical memory, via HMP
>> over telnet this is what happens:
>>
>> $ telnet localhost 1234
>> Trying 127.0.0.1...
>> Connected to localhost.
>> Escape character is '^]'.
>> QEMU 2.4.0.1 monitor - type 'help' for more information
>> (qemu) help pmemsave
>> pmemsave addr size file -- save to disk physical memory dump starting
>> at 'addr' of size 'size'
>> (qemu) pmemsave 0 8589934591 "/tmp/memorydump"
>> 'pmemsave' has failed: integer is for 32-bit values
>> Try "help pmemsave" for more information
>> (qemu) quit
> Your change to pmemsave's definition in qapi-schema.json is effectively a
> no-op.
>
> Your example shows *HMP* command pmemsave.  The definition of an HMP
> command is *independent* of the QMP command.  The implementation *uses*
> the QMP command.
>
> QMP pmemsave is defined in qapi-schema.json as
>
>      { 'command': 'pmemsave',
>        'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>
> Its implementation is in cpus.c:
>
>      void qmp_pmemsave(int64_t addr, int64_t size, const char *filename,
>                        Error **errp)
>
> Note the int64_t size.
>
> HMP pmemsave is defined in hmp-commands.hx as
>
>      {
>          .name       = "pmemsave",
>          .args_type  = "val:l,size:i,filename:s",
>          .params     = "addr size file",
>          .help       = "save to disk physical memory dump starting at 'addr' of size 'size'",
>          .mhandler.cmd = hmp_pmemsave,
>      },
>
> Its implementation is in hmp.c:
>
>      void hmp_pmemsave(Monitor *mon, const QDict *qdict)
>      {
>          uint32_t size = qdict_get_int(qdict, "size");
>          const char *filename = qdict_get_str(qdict, "filename");
>          uint64_t addr = qdict_get_int(qdict, "val");
>          Error *err = NULL;
>
>          qmp_pmemsave(addr, size, filename, &err);
>          hmp_handle_error(mon, &err);
>      }
>
> Note uint32_t size.
>
> Arguably, the QMP size argument should use 'size' (an alias for
> 'uint64'), and the HMP args_type should use 'size:o'.
Understand all that. Indeed, I've re-implemented 'pmemaccess' the same 
way pmemsave is implemented. There is a single function, and two points 
of entrance, one for HMP and one for QMP. I think pmemacess mimics 
pmemsave closely.

However, if one wants to simply dump a memory region, via HMP for human 
easy of use/debug/testing purposes, one cannot dump memory regions that 
resides higher than 2^32-1


>> With the changes I suggested, the command succeeds
>>
>> $ telnet localhost 1234
>> Trying 127.0.0.1...
>> Connected to localhost.
>> Escape character is '^]'.
>> QEMU 2.4.0.1 monitor - type 'help' for more information
>> (qemu) help pmemsave
>> pmemsave addr size file -- save to disk physical memory dump starting
>> at 'addr' of size 'size'
>> (qemu) pmemsave 0 8589934591 "/tmp/memorydump"
>> (qemu) quit
>>
>> However I just noticed that the dump is just about 4GB in size, so
>> there might be more changes needed to snapshot all physical memory of
>> a 64 but VM. I did not investigate any further.
>>
>> ls -l /tmp/memorydump
>> -rw-rw-r-- 1 libvirt-qemu kvm 4294967295 Oct 16 08:04 /tmp/memorydump
>>
>>>> hmp-commands.hx and qmp-commands.hx should be edited accordingly. I
>>>> did not make the above pmemsave changes part of my patch.
>>>>
>>>> Let me know if you have any questions,
>>>>
>>>> Valerio

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [PATCH] QEMU patch for libvmi to introspect QEMU/kvm virtual machines. Usually this patch is distributed with libvmi, but, it might be more useful to have it in the QEMU source permanently.
  2015-10-15 23:44 ` [Qemu-devel] [PATCH] QEMU patch for libvmi to introspect QEMU/kvm virtual machines. Usually this patch is distributed with libvmi, but, it might be more useful to have it in the QEMU source permanently valerio
@ 2015-10-19 21:33   ` Eric Blake
  2015-10-21 15:11     ` Valerio Aimale
  0 siblings, 1 reply; 43+ messages in thread
From: Eric Blake @ 2015-10-19 21:33 UTC (permalink / raw)
  To: valerio, qemu-devel; +Cc: armbru, ehabkost, lcapitulino

[-- Attachment #1: Type: text/plain, Size: 10877 bytes --]

On 10/15/2015 05:44 PM, valerio@aimale.com wrote:
> From: Valerio Aimale <valerio@aimale.com>

Long subject line, and no message body.  Remember, you want the subject
line to be a one-line short summary of 'what', then the commit body
message for 'why', as in:

qmp: add command for libvmi memory introspection

In the past, libvmi was relying on an out-of-tree patch to qemu that
provides a new QMP command pmemaccess.  It is now time to make this
command part of qemu.

pmemaccess is used to create a side-channel communication path that can
more effectively be used to query lots of small memory chunks without
the overhead of one QMP command per chunk. ...

> 
> ---

You are missing a Signed-off-by: tag.  Without that, we cannot take your
patch.  But at least we can still review it:

>  Makefile.target  |   2 +-
>  hmp-commands.hx  |  14 ++++
>  hmp.c            |   9 +++
>  hmp.h            |   1 +
>  memory-access.c  | 206 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  memory-access.h  |  21 ++++++
>  qapi-schema.json |  28 ++++++++
>  qmp-commands.hx  |  23 +++++++
>  8 files changed, 303 insertions(+), 1 deletion(-)
>  create mode 100644 memory-access.c
>  create mode 100644 memory-access.h
> 
> diff --git a/Makefile.target b/Makefile.target
> index 962d004..940ab51 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -131,7 +131,7 @@ endif #CONFIG_BSD_USER
>  #########################################################
>  # System emulator target
>  ifdef CONFIG_SOFTMMU
> -obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o numa.o
> +obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o numa.o memory-access.o

This line is now over 80 columns; please wrap.

>  obj-y += qtest.o bootdevice.o

In fact, you could have just appended it into this line instead.

> +++ b/hmp-commands.hx
> @@ -807,6 +807,20 @@ save to disk physical memory dump starting at @var{addr} of size @var{size}.
>  ETEXI
>  
>      {
> +        .name       = "pmemaccess",
> +        .args_type  = "path:s",
> +        .params     = "file",
> +        .help       = "open A UNIX Socket access to physical memory at 'path'",

s/A/a/

Awkward grammar; better might be:

open a Unix socket at 'path' for use in accessing physical memory

Please also document where the user can find the protocol that will be
used across the side-channel socket thus opened.


> +++ b/memory-access.c
> @@ -0,0 +1,206 @@
> +/*
> + * Access guest physical memory via a domain socket.
> + *
> + * Copyright (C) 2011 Sandia National Laboratories
> + * Original Author: Bryan D. Payne (bdpayne@acm.org)
> + * 
> + * Refurbished for modern QEMU by Valerio Aimale (valerio@aimale.com), in 2015
> + */

I would expect at least something under docs/ in addition to this file
(the protocol spoken over the socket should be well-documented, and not
just by reading the source code).  Compare with docs/qmp-spec.txt.

> +struct request{
> +    uint8_t type;      // 0 quit, 1 read, 2 write, ... rest reserved
> +    uint64_t address;  // address to read from OR write to
> +    uint64_t length;   // number of bytes to read OR write

Any particular endianness constraints to worry about?

> +};
> +
> +static uint64_t
> +connection_read_memory (uint64_t user_paddr, void *buf, uint64_t user_len)
> +{
> +    hwaddr paddr = (hwaddr) user_paddr;
> +    hwaddr len = (hwaddr) user_len;
> +    void *guestmem = cpu_physical_memory_map(paddr, &len, 0);
> +    if (!guestmem){

Space before {, throughout the patch.

> +static void
> +send_success_ack (int connection_fd)

No space before ( in function usage.

> +{
> +    uint8_t success = 1;

Magic number; I'd expect an enum or #defines somewhere.

> +    int nbytes = write(connection_fd, &success, 1);
> +    if (1 != nbytes){
> +        fprintf(stderr, "Qemu pmemaccess: failed to send success ack\n");

Is fprintf() really the best approach for error reporting?


> +static void
> +connection_handler (int connection_fd)
> +{
> +    int nbytes;
> +    struct request req;
> +
> +    while (1){
> +        // client request should match the struct request format

We prefer /* */ comments over //.

> +        nbytes = read(connection_fd, &req, sizeof(struct request));
> +        if (nbytes != sizeof(struct request)){
> +            // error
> +            continue;

Silently ignoring errors?

> +        }
> +        else if (req.type == 0){
> +            // request to quit, goodbye
> +            break;
> +        }
> +        else if (req.type == 1){
> +            // request to read
> +            char *buf = malloc(req.length + 1);

Mixing g_malloc/g_free with malloc/free makes it a bit harder to reason
about things; it might be better to use only glib allocation.

> +            nbytes = connection_read_memory(req.address, buf, req.length);
> +            if (nbytes != req.length){
> +                // read failure, return failure message
> +                buf[req.length] = 0; // set last byte to 0 for failure
> +                nbytes = write(connection_fd, buf, 1);
> +            }
> +            else{
> +                // read success, return bytes
> +                buf[req.length] = 1; // set last byte to 1 for success
> +                nbytes = write(connection_fd, buf, nbytes + 1);
> +            }
> +            free(buf);
> +        }
> +        else if (req.type == 2){
> +            // request to write
> +            void *write_buf = malloc(req.length);
> +            nbytes = read(connection_fd, &write_buf, req.length);

No error checking that the allocation succeeded? How do you protect from
bogus requests?

> +            if (nbytes != req.length){
> +                // failed reading the message to write
> +                send_fail_ack(connection_fd);
> +            }
> +            else{

} on the same line as else

> +                // do the write
> +                nbytes = connection_write_memory(req.address, write_buf, req.length);
> +                if (nbytes == req.length){
> +                    send_success_ack(connection_fd);
> +                }
> +                else{
> +                    send_fail_ack(connection_fd);
> +                }
> +            }
> +            free(write_buf);
> +        }
> +        else{
> +            // unknown command
> +            fprintf(stderr, "Qemu pmemaccess: ignoring unknown command (%d)\n", req.type);
> +            char *buf = malloc(1);
> +            buf[0] = 0;
> +            nbytes = write(connection_fd, buf, 1);
> +            free(buf);
> +        }
> +    }
> +
> +    close(connection_fd);
> +}
> +
> +static void *
> +memory_access_thread (void *p)

The most common style in qemu puts return type on the same line as the
function name.

> +{
> +    struct sockaddr_un address;
> +    int socket_fd, connection_fd;
> +    socklen_t address_length;
> +    struct pmemaccess_args *pargs = (struct pmemaccess_args *)p;

This cast is not necessary in C.

> +
> +    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
> +    if (socket_fd < 0){
> +	error_setg(pargs->errp, "Qemu pmemaccess: socket failed");

TAB damage.  Also, mentioning 'Qemu' in an error message is probably
redundant.  That, and we prefer 'qemu' over 'Qemu'.

> +        goto error_exit;
> +    }
> +    unlink(pargs->path);

No check for errors?

> +    address.sun_family = AF_UNIX;
> +    address_length = sizeof(address.sun_family) + sprintf(address.sun_path, "%s", (char *) pargs->path);

Long line.  sprintf() is dangerous if you are not positive that
pargs->path fits.  Why do you need the cast?

> +
> +    if (bind(socket_fd, (struct sockaddr *) &address, address_length) != 0){
> +	error_setg(pargs->errp, "Qemu pmemaccess: bind failed");

More TAB damage.  Please read
http://wiki.qemu.org/Contribute/SubmitAPatch for more contribution
hints, and be sure to run ./scripts/checkpatch.pl.


> +void
> +qmp_pmemaccess (const char *path, Error **errp)
> +{
> +    pthread_t thread;
> +    sigset_t set, oldset;
> +    struct pmemaccess_args *pargs;
> +
> +    // create the args struct
> +    pargs = (struct pmemaccess_args *) malloc(sizeof(struct pmemaccess_args));
> +    pargs->errp = errp;
> +    // create a copy of path that we can safely use
> +    pargs->path = malloc(strlen(path) + 1);
> +    memcpy(pargs->path, path, strlen(path) + 1);

g_strdup() is your friend.


> +++ b/qapi-schema.json
> @@ -1427,6 +1427,34 @@
>    'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>  
>  ##
> +# @pmemaccess:
> +#
> +# Open A UNIX Socket access to physical memory

s/A UNIX Socket/a Unix socket/

Same wording suggestion as earlier; might be better as:

Open a Unix socket used as a side-channel for efficiently accessing
physical memory.

> +#
> +# @path: the name of the UNIX socket pipe
> +#
> +# Returns: Nothing on success
> +#
> +# Since: 2.4.0.1

2.5, not 2.4.0.1.

> +#
> +# Notes: Derived from previously existing patches.

Dead sentence that doesn't add anything to the current specification.

> When command
> +# succeeds connect to the open socket. Write a binary structure to 
> +# the socket as:
> +# 
> +# struct request {
> +#     uint8_t type;   // 0 quit, 1 read, 2 write, ... rest reserved
> +#     uint64_t address;   // address to read from OR write to
> +#     uint64_t length;    // number of bytes to read OR write
> +# };
> +# 
> +# If it is a read operation, Qemu will return lenght+1 bytes. Read lenght+1

s/lenght/length/ twice

> +# bytes. the last byte will be a 1 for success, or a 0 for failure.
> +#  
> +##
> +{ 'command': 'pmemaccess',
> +  'data': {'path': 'str'} }
> +
> +##
>  # @cont:
>  #
>  # Resume guest VCPU execution.
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index d2ba800..26e4a51 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -472,6 +472,29 @@ Example:
>  EQMP
>  
>      {
> +        .name       = "pmemaccess",
> +        .args_type  = "path:s",
> +        .mhandler.cmd_new = qmp_marshal_input_pmemaccess,
> +    },
> +
> +SQMP
> +pmemaccess
> +----------
> +
> +Open A UNIX Socket access to physical memory
> +
> +Arguments:
> +
> +- "path": mount point path (json-string)
> +
> +Example:
> +
> +-> { "execute": "pmemaccess",
> +             "arguments": { "path": "/tmp/guestname" } }
> +<- { "return": {} }
> +
> +EQMP
> +    {
>          .name       = "inject-nmi",
>          .args_type  = "",
>          .mhandler.cmd_new = qmp_marshal_inject_nmi,
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-19 14:37       ` Valerio Aimale
@ 2015-10-21 10:54         ` Markus Armbruster
  2015-10-21 15:50           ` Valerio Aimale
                             ` (2 more replies)
  0 siblings, 3 replies; 43+ messages in thread
From: Markus Armbruster @ 2015-10-21 10:54 UTC (permalink / raw)
  To: Valerio Aimale; +Cc: lcapitulino, qemu-devel, ehabkost

Valerio Aimale <valerio@aimale.com> writes:

> On 10/19/15 1:52 AM, Markus Armbruster wrote:
>> Valerio Aimale <valerio@aimale.com> writes:
>>
>>> On 10/16/15 2:15 AM, Markus Armbruster wrote:
>>>> valerio@aimale.com writes:
>>>>
>>>>> All-
>>>>>
>>>>> I've produced a patch for the current QEMU HEAD, for libvmi to
>>>>> introspect QEMU/KVM VMs.
>>>>>
>>>>> Libvmi has patches for the old qeum-kvm fork, inside its source tree:
>>>>> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>>>>>
>>>>> This patch adds a hmp and a qmp command, "pmemaccess". When the
>>>>> commands is invoked with a string arguments (a filename), it will open
>>>>> a UNIX socket and spawn a listening thread.
>>>>>
>>>>> The client writes binary commands to the socket, in the form of a c
>>>>> structure:
>>>>>
>>>>> struct request {
>>>>>        uint8_t type;   // 0 quit, 1 read, 2 write, ... rest reserved
>>>>>        uint64_t address;   // address to read from OR write to
>>>>>        uint64_t length;    // number of bytes to read OR write
>>>>> };
>>>>>
>>>>> The client receives as a response, either (length+1) bytes, if it is a
>>>>> read operation, or 1 byte ifit is a write operation.
>>>>>
>>>>> The last bytes of a read operation response indicates success (1
>>>>> success, 0 failure). The single byte returned for a write operation
>>>>> indicates same (1 success, 0 failure).
>>>> So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of
>>>> garbage followed by the "it failed" byte?
>>> Markus, that appear to be the case. However, I did not write the
>>> communication protocol between libvmi and qemu. I'm assuming that the
>>> person that wrote the protocol, did not want to bother with over
>>> complicating things.
>>>
>>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>>
>>> I'm thinking he assumed reads would be small in size and the price of
>>> reading garbage was less than the price of writing a more complicated
>>> protocol. I can see his point, confronted with the same problem, I
>>> might have done the same.
>> All right, the interface is designed for *small* memory blocks then.
>>
>> Makes me wonder why he needs a separate binary protocol on a separate
>> socket.  Small blocks could be done just fine in QMP.
>
> The problem is speed. if one's analyzing the memory space of a running
> process (physical and paged), libvmi will make a large number of small
> and mid-sized reads. If one uses xp, or pmemsave, the overhead is
> quite significant. xp has overhead due to encoding, and pmemsave has
> overhead due to file open/write (server), file open/read/close/unlink
> (client).
>
> Others have gone through the problem before me. It appears that
> pmemsave and xp are significantly slower than reading memory using a
> socket via pmemaccess.

That they're slower isn't surprising, but I'd expect the cost of
encoding a small block to be insiginificant compared to the cost of the
network roundtrips.

As block size increases, the space overhead of encoding will eventually
bite.  But for that usage, the binary protocol appears ill-suited,
unless the client can pretty reliably avoid read failure.  I haven't
examined its failure modes, yet.

> The following data is not mine, but it shows the time, in
> milliseconds, required to resolve the content of a paged memory
> address via socket (pmemaccess) , pmemsave and xp
>
> http://cl.ly/image/322a3s0h1V05
>
> Again, I did not produce those data points, they come from an old
> libvmi thread.

90ms is a very long time.  What exactly was measured?

> I think it might be conceivable that there could be a QMP command that
> returns the content of an arbitrarily size memory region as a base64
> or a base85 json string. It would still have both time- (due to
> encoding/decoding) and space- (base64 has 33% and ase85 would be 7%)
> overhead, + json encoding/decoding overhead. It might still be the
> case that socket would outperform such a command as well,
> speed-vise. I don't think it would be any faster than xp.

A special-purpose binary protocol over a dedicated socket will always do
less than a QMP solution (ignoring foolishness like transmitting crap on
read error the client is then expected to throw away).  The question is
whether the difference in work translates to a worthwhile difference in
performance.

The larger question is actually whether we have an existing interface
that can serve the libvmi's needs.  We've discussed monitor commands
like xp, pmemsave, pmemread.  There's another existing interface: the
GDB stub.  Have you considered it?

> There's also a similar patch, floating around the internet, the uses
> shared memory, instead of sockets, as inter-process communication
> between libvmi and QEMU. I've never used that.

By the time you built a working IPC mechanism on top of shared memory,
you're often no better off than with AF_LOCAL sockets.

Crazy idea: can we allocate guest memory in a way that support sharing
it with another process?  Eduardo, can -mem-path do such wild things?

>>>>> The socket API was written by the libvmi author and it works the with
>>>>> current libvmi version. The libvmi client-side implementation is at:
>>>>>
>>>>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>>>>
>>>>> As many use kvm VM's for introspection, malware and security analysis,
>>>>> it might be worth thinking about making the pmemaccess a permanent
>>>>> hmp/qmp command, as opposed to having to produce a patch at each QEMU
>>>>> point release.
>>>> Related existing commands: memsave, pmemsave, dump-guest-memory.
>>>>
>>>> Can you explain why these won't do for your use case?
>>> For people who do security analysis there are two use cases, static
>>> and dynamic analysis. With memsave, pmemsave and dum-guest-memory one
>>> can do static analysis. I.e. snapshotting a VM and see what was
>>> happening at that point in time.
>>> Dynamic analysis require to be able to 'introspect' a VM while it's running.
>>>
>>> If you take a snapshot of two people exchanging a glass of water, and
>>> you happen to take it at the very moment both persons have their hands
>>> on the glass, it's hard to tell who passed the glass to whom. If you
>>> have a movie of the same scene, it's obvious who's the giver and who's
>>> the receiver. Same use case.
>> I understand the need for introspecting a running guest.  What exactly
>> makes the existing commands unsuitable for that?
> Speed. See discussion above.
>>
>>> More to the point, there's a host of C and python frameworks to
>>> dynamically analyze VMs: volatility, rekal, "drakvuf", etc. They all
>>> build on top of libvmi. I did not want to reinvent the wheel.
>> Fair enough.
>>
>> Front page http://libvmi.com/ claims "Works with Xen, KVM, Qemu, and Raw
>> memory files."  What exactly is missing for KVM?
> When they say they support kvm, what they really mean they support the
> (retired, I understand) qemu-kvm fork via a patch that is provided in
> the libvmi source tree. I think the most recent qem-kvm supported is
> 1.6.0
>
> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>
> I wanted to bring support to the head revision of QEMU, to bring
> libvmi level with more modern QEMU.
>
> Maybe the solution is simply to put this patch in the libvmi source
> tree,  which I've already asked to do via pull request, leaving QEMU
> alone.
> However, the patch has to be updated at every QEMU point release. I
> wanted to avoid that, if at all possible.
>
>>
>>> Mind you, 99.9% of people that do dynamic VM analysis use xen. They
>>> contend that xen has better introspection support. In my case, I did
>>> not want to bother with dedicating a full server to be a xen domain
>>> 0. I just wanted to do a quick test by standing up a QEMU/kvm VM, in
>>> an otherwise purposed server.
>> I'm not at all against better introspection support in QEMU.  I'm just
>> trying to understand the problem you're trying to solve with your
>> patches.
> What all users of libvmi would love to have is super high speed access
> to VM physical memory as part of the QEMU source tree, and not
> supported via a patch. Implemented as the QEMU owners see fit, as long
> as it is blazing fast and easy accessed via client library or
> inter-process communication.

The use case makes sense to me, we just need to figure out how we want
to serve it in QEMU.

> My gut feeling is that it has to bypass QMP protocol/encoding/file
> access/json to be fast, but, it is just a gut feeling - worth nothing.

My gut feeling is that QMP should do fine in overhead compared to other
solutions involving socket I/O as long as the data sizes are *small*.
Latency might be an issue, though: QMP commands are processed from the
main loop.  A dedicated server thread can be more responsive, but
letting it write to shared resources could be "interesting".

>>>>> Also, the pmemsave commands QAPI should be changed to be usable with
>>>>> 64bit VM's
>>>>>
>>>>> in qapi-schema.json
>>>>>
>>>>> from
>>>>>
>>>>> ---
>>>>> { 'command': 'pmemsave',
>>>>>     'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>>> ---
>>>>>
>>>>> to
>>>>>
>>>>> ---
>>>>> { 'command': 'pmemsave',
>>>>>     'data': {'val': 'int64', 'size': 'int64', 'filename': 'str'} }
>>>>> ---
>>>> In the QAPI schema, 'int' is actually an alias for 'int64'.  Yes, that's
>>>> confusing.
>>> I think it's confusing for the HMP parser too. If you have a VM with
>>> 8Gb of RAM and want to snapshot the whole physical memory, via HMP
>>> over telnet this is what happens:
>>>
>>> $ telnet localhost 1234
>>> Trying 127.0.0.1...
>>> Connected to localhost.
>>> Escape character is '^]'.
>>> QEMU 2.4.0.1 monitor - type 'help' for more information
>>> (qemu) help pmemsave
>>> pmemsave addr size file -- save to disk physical memory dump starting
>>> at 'addr' of size 'size'
>>> (qemu) pmemsave 0 8589934591 "/tmp/memorydump"
>>> 'pmemsave' has failed: integer is for 32-bit values
>>> Try "help pmemsave" for more information
>>> (qemu) quit
>> Your change to pmemsave's definition in qapi-schema.json is effectively a
>> no-op.
>>
>> Your example shows *HMP* command pmemsave.  The definition of an HMP
>> command is *independent* of the QMP command.  The implementation *uses*
>> the QMP command.
>>
>> QMP pmemsave is defined in qapi-schema.json as
>>
>>      { 'command': 'pmemsave',
>>        'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>
>> Its implementation is in cpus.c:
>>
>>      void qmp_pmemsave(int64_t addr, int64_t size, const char *filename,
>>                        Error **errp)
>>
>> Note the int64_t size.
>>
>> HMP pmemsave is defined in hmp-commands.hx as
>>
>>      {
>>          .name       = "pmemsave",
>>          .args_type  = "val:l,size:i,filename:s",
>>          .params     = "addr size file",
>>          .help       = "save to disk physical memory dump starting at 'addr' of size 'size'",
>>          .mhandler.cmd = hmp_pmemsave,
>>      },
>>
>> Its implementation is in hmp.c:
>>
>>      void hmp_pmemsave(Monitor *mon, const QDict *qdict)
>>      {
>>          uint32_t size = qdict_get_int(qdict, "size");
>>          const char *filename = qdict_get_str(qdict, "filename");
>>          uint64_t addr = qdict_get_int(qdict, "val");
>>          Error *err = NULL;
>>
>>          qmp_pmemsave(addr, size, filename, &err);
>>          hmp_handle_error(mon, &err);
>>      }
>>
>> Note uint32_t size.
>>
>> Arguably, the QMP size argument should use 'size' (an alias for
>> 'uint64'), and the HMP args_type should use 'size:o'.
> Understand all that. Indeed, I've re-implemented 'pmemaccess' the same
> way pmemsave is implemented. There is a single function, and two
> points of entrance, one for HMP and one for QMP. I think pmemacess
> mimics pmemsave closely.
>
> However, if one wants to simply dump a memory region, via HMP for
> human easy of use/debug/testing purposes, one cannot dump memory
> regions that resides higher than 2^32-1

Can you give an example?

[...]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [PATCH] QEMU patch for libvmi to introspect QEMU/kvm virtual machines. Usually this patch is distributed with libvmi, but, it might be more useful to have it in the QEMU source permanently.
  2015-10-19 21:33   ` Eric Blake
@ 2015-10-21 15:11     ` Valerio Aimale
  0 siblings, 0 replies; 43+ messages in thread
From: Valerio Aimale @ 2015-10-21 15:11 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: armbru, ehabkost, lcapitulino

Eric,

thanks for your comments. I'm going to take the liberty to top posts 
some notes.

On grammar awkwardness, indentation, documentation, and coding style. I 
agree with you. Mea culpa. I take full responsibility. I was too eager 
to submit the patch. I'll be less eager in the future.
If and when we decide that this patch belongs in the QEMU source tree, I 
will clean up grammar, documentation and code. However, as per 
discussion with Markus, that is still up in the air. So I'll hold of on 
those for now.

Below discussions of two issues only, endianness and fprintf.

Valerio

On 10/19/15 3:33 PM, Eric Blake wrote:
> On 10/15/2015 05:44 PM, valerio@aimale.com wrote:
>> From: Valerio Aimale <valerio@aimale.com>
> Long subject line, and no message body.  Remember, you want the subject
> line to be a one-line short summary of 'what', then the commit body
> message for 'why', as in:
>
> qmp: add command for libvmi memory introspection
>
> In the past, libvmi was relying on an out-of-tree patch to qemu that
> provides a new QMP command pmemaccess.  It is now time to make this
> command part of qemu.
>
> pmemaccess is used to create a side-channel communication path that can
> more effectively be used to query lots of small memory chunks without
> the overhead of one QMP command per chunk. ...
>
>> ---
> You are missing a Signed-off-by: tag.  Without that, we cannot take your
> patch.  But at least we can still review it:
>
>>   Makefile.target  |   2 +-
>>   hmp-commands.hx  |  14 ++++
>>   hmp.c            |   9 +++
>>   hmp.h            |   1 +
>>   memory-access.c  | 206 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   memory-access.h  |  21 ++++++
>>   qapi-schema.json |  28 ++++++++
>>   qmp-commands.hx  |  23 +++++++
>>   8 files changed, 303 insertions(+), 1 deletion(-)
>>   create mode 100644 memory-access.c
>>   create mode 100644 memory-access.h
>>
>> diff --git a/Makefile.target b/Makefile.target
>> index 962d004..940ab51 100644
>> --- a/Makefile.target
>> +++ b/Makefile.target
>> @@ -131,7 +131,7 @@ endif #CONFIG_BSD_USER
>>   #########################################################
>>   # System emulator target
>>   ifdef CONFIG_SOFTMMU
>> -obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o numa.o
>> +obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o numa.o memory-access.o
> This line is now over 80 columns; please wrap.
>
>>   obj-y += qtest.o bootdevice.o
> In fact, you could have just appended it into this line instead.
>
>> +++ b/hmp-commands.hx
>> @@ -807,6 +807,20 @@ save to disk physical memory dump starting at @var{addr} of size @var{size}.
>>   ETEXI
>>   
>>       {
>> +        .name       = "pmemaccess",
>> +        .args_type  = "path:s",
>> +        .params     = "file",
>> +        .help       = "open A UNIX Socket access to physical memory at 'path'",
> s/A/a/
>
> Awkward grammar; better might be:
>
> open a Unix socket at 'path' for use in accessing physical memory
>
> Please also document where the user can find the protocol that will be
> used across the side-channel socket thus opened.
>
>
>> +++ b/memory-access.c
>> @@ -0,0 +1,206 @@
>> +/*
>> + * Access guest physical memory via a domain socket.
>> + *
>> + * Copyright (C) 2011 Sandia National Laboratories
>> + * Original Author: Bryan D. Payne (bdpayne@acm.org)
>> + *
>> + * Refurbished for modern QEMU by Valerio Aimale (valerio@aimale.com), in 2015
>> + */
> I would expect at least something under docs/ in addition to this file
> (the protocol spoken over the socket should be well-documented, and not
> just by reading the source code).  Compare with docs/qmp-spec.txt.
>
>> +struct request{
>> +    uint8_t type;      // 0 quit, 1 read, 2 write, ... rest reserved
>> +    uint64_t address;  // address to read from OR write to
>> +    uint64_t length;   // number of bytes to read OR write
> Any particular endianness constraints to worry about?
That is a very interesting and insightful comment, that required some 
thinking. As I see it right now, the issue of endianness can be 
partitioned in 3 separate problems:

1) Endinanness concordance between libvmi and QEM host. As this patch 
uses a UNIX socket for inter-process communication, it implicitely 
assumes that libvmi and the QEMU host will run on the same machine, thus 
will have the same architecture, no need to endianness correction. If 
the patch were to use an inet socket, then hton and ntoh conversion 
would be required. It could be easily arranged by using this very useful 
header https://gist.github.com/panzi/6856583 , that provides 
architecture and platform independent implementation of htobe64() and 
be64toh() that are required to convert the two 64-bit members of the 
struct request. Of course there is the very interesting and intriguing 
scenario of somebody tunneling a UNIX socket from one host to another 
via inet socket with socat. However, as libvmi owns socket creation, 
that would be, not impossible , but not even that easy.

2) Endinanness concordance between QEM host and QEMU guest. As 
pmemaccess just calls cpu_physical_memory_map() and 
cpu_physical_memory_unmap(), no need for worrying about concordance 
here. Memory regions are treated as byte streams.

3) Endinanness concordance between libvmi and QEMU guest: this is 
responsibility of libvmi and should be implemented in libvmi. My 
understanding is that, as of now, libvmi supports introspection of 
Windows, Linux and Mac OSX running on x86 and x86_64 architectures. I 
believe that covers 99.(9) % of the needs of the 
security/analysis/introspection community. Soon, when arm64/CortexA53 
boards will dominate the server market, as some hypothesize, likely the 
need for libvmi endianness concordance will arise, for those 
introspecting from a x86/x86_64 to an ARM64 guest . However, that's some 
time ahead. I'm still waiting on my order of an ARM64 board. There is 
also the corner case of Xbox One introspection. I've never heard of an 
Xbox VM running on QEMU, but that's an interesting hypothesis.
>
>> +};
>> +
>> +static uint64_t
>> +connection_read_memory (uint64_t user_paddr, void *buf, uint64_t user_len)
>> +{
>> +    hwaddr paddr = (hwaddr) user_paddr;
>> +    hwaddr len = (hwaddr) user_len;
>> +    void *guestmem = cpu_physical_memory_map(paddr, &len, 0);
>> +    if (!guestmem){
> Space before {, throughout the patch.
>
>> +static void
>> +send_success_ack (int connection_fd)
> No space before ( in function usage.
>
>> +{
>> +    uint8_t success = 1;
> Magic number; I'd expect an enum or #defines somewhere.
>
>> +    int nbytes = write(connection_fd, &success, 1);
>> +    if (1 != nbytes){
>> +        fprintf(stderr, "Qemu pmemaccess: failed to send success ack\n");
> Is fprintf() really the best approach for error reporting?
No, it is not. However, to my partial defense I will say that I have 
reworked the code in the patch to use the Error **errp passed by the 
monitor. The original old patches used fprintf. You can see there are 
several call to error_setg(pargs->errp, ...) that I used to replace 
fprintf's. I missed these two spots. My fault.
>
>
>> +static void
>> +connection_handler (int connection_fd)
>> +{
>> +    int nbytes;
>> +    struct request req;
>> +
>> +    while (1){
>> +        // client request should match the struct request format
> We prefer /* */ comments over //.
>
>> +        nbytes = read(connection_fd, &req, sizeof(struct request));
>> +        if (nbytes != sizeof(struct request)){
>> +            // error
>> +            continue;
> Silently ignoring errors?
>
>> +        }
>> +        else if (req.type == 0){
>> +            // request to quit, goodbye
>> +            break;
>> +        }
>> +        else if (req.type == 1){
>> +            // request to read
>> +            char *buf = malloc(req.length + 1);
> Mixing g_malloc/g_free with malloc/free makes it a bit harder to reason
> about things; it might be better to use only glib allocation.
>
>> +            nbytes = connection_read_memory(req.address, buf, req.length);
>> +            if (nbytes != req.length){
>> +                // read failure, return failure message
>> +                buf[req.length] = 0; // set last byte to 0 for failure
>> +                nbytes = write(connection_fd, buf, 1);
>> +            }
>> +            else{
>> +                // read success, return bytes
>> +                buf[req.length] = 1; // set last byte to 1 for success
>> +                nbytes = write(connection_fd, buf, nbytes + 1);
>> +            }
>> +            free(buf);
>> +        }
>> +        else if (req.type == 2){
>> +            // request to write
>> +            void *write_buf = malloc(req.length);
>> +            nbytes = read(connection_fd, &write_buf, req.length);
> No error checking that the allocation succeeded? How do you protect from
> bogus requests?
>
>> +            if (nbytes != req.length){
>> +                // failed reading the message to write
>> +                send_fail_ack(connection_fd);
>> +            }
>> +            else{
> } on the same line as else
>
>> +                // do the write
>> +                nbytes = connection_write_memory(req.address, write_buf, req.length);
>> +                if (nbytes == req.length){
>> +                    send_success_ack(connection_fd);
>> +                }
>> +                else{
>> +                    send_fail_ack(connection_fd);
>> +                }
>> +            }
>> +            free(write_buf);
>> +        }
>> +        else{
>> +            // unknown command
>> +            fprintf(stderr, "Qemu pmemaccess: ignoring unknown command (%d)\n", req.type);
>> +            char *buf = malloc(1);
>> +            buf[0] = 0;
>> +            nbytes = write(connection_fd, buf, 1);
>> +            free(buf);
>> +        }
>> +    }
>> +
>> +    close(connection_fd);
>> +}
>> +
>> +static void *
>> +memory_access_thread (void *p)
> The most common style in qemu puts return type on the same line as the
> function name.
>
>> +{
>> +    struct sockaddr_un address;
>> +    int socket_fd, connection_fd;
>> +    socklen_t address_length;
>> +    struct pmemaccess_args *pargs = (struct pmemaccess_args *)p;
> This cast is not necessary in C.
>
>> +
>> +    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
>> +    if (socket_fd < 0){
>> +	error_setg(pargs->errp, "Qemu pmemaccess: socket failed");
> TAB damage.  Also, mentioning 'Qemu' in an error message is probably
> redundant.  That, and we prefer 'qemu' over 'Qemu'.
>
>> +        goto error_exit;
>> +    }
>> +    unlink(pargs->path);
> No check for errors?
>
>> +    address.sun_family = AF_UNIX;
>> +    address_length = sizeof(address.sun_family) + sprintf(address.sun_path, "%s", (char *) pargs->path);
> Long line.  sprintf() is dangerous if you are not positive that
> pargs->path fits.  Why do you need the cast?
>
>> +
>> +    if (bind(socket_fd, (struct sockaddr *) &address, address_length) != 0){
>> +	error_setg(pargs->errp, "Qemu pmemaccess: bind failed");
> More TAB damage.  Please read
> http://wiki.qemu.org/Contribute/SubmitAPatch for more contribution
> hints, and be sure to run ./scripts/checkpatch.pl.
>
>
>> +void
>> +qmp_pmemaccess (const char *path, Error **errp)
>> +{
>> +    pthread_t thread;
>> +    sigset_t set, oldset;
>> +    struct pmemaccess_args *pargs;
>> +
>> +    // create the args struct
>> +    pargs = (struct pmemaccess_args *) malloc(sizeof(struct pmemaccess_args));
>> +    pargs->errp = errp;
>> +    // create a copy of path that we can safely use
>> +    pargs->path = malloc(strlen(path) + 1);
>> +    memcpy(pargs->path, path, strlen(path) + 1);
> g_strdup() is your friend.
>
>
>> +++ b/qapi-schema.json
>> @@ -1427,6 +1427,34 @@
>>     'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>   
>>   ##
>> +# @pmemaccess:
>> +#
>> +# Open A UNIX Socket access to physical memory
> s/A UNIX Socket/a Unix socket/
>
> Same wording suggestion as earlier; might be better as:
>
> Open a Unix socket used as a side-channel for efficiently accessing
> physical memory.
>
>> +#
>> +# @path: the name of the UNIX socket pipe
>> +#
>> +# Returns: Nothing on success
>> +#
>> +# Since: 2.4.0.1
> 2.5, not 2.4.0.1.
>
>> +#
>> +# Notes: Derived from previously existing patches.
> Dead sentence that doesn't add anything to the current specification.
>
>> When command
>> +# succeeds connect to the open socket. Write a binary structure to
>> +# the socket as:
>> +#
>> +# struct request {
>> +#     uint8_t type;   // 0 quit, 1 read, 2 write, ... rest reserved
>> +#     uint64_t address;   // address to read from OR write to
>> +#     uint64_t length;    // number of bytes to read OR write
>> +# };
>> +#
>> +# If it is a read operation, Qemu will return lenght+1 bytes. Read lenght+1
> s/lenght/length/ twice
>
>> +# bytes. the last byte will be a 1 for success, or a 0 for failure.
>> +#
>> +##
>> +{ 'command': 'pmemaccess',
>> +  'data': {'path': 'str'} }
>> +
>> +##
>>   # @cont:
>>   #
>>   # Resume guest VCPU execution.
>> diff --git a/qmp-commands.hx b/qmp-commands.hx
>> index d2ba800..26e4a51 100644
>> --- a/qmp-commands.hx
>> +++ b/qmp-commands.hx
>> @@ -472,6 +472,29 @@ Example:
>>   EQMP
>>   
>>       {
>> +        .name       = "pmemaccess",
>> +        .args_type  = "path:s",
>> +        .mhandler.cmd_new = qmp_marshal_input_pmemaccess,
>> +    },
>> +
>> +SQMP
>> +pmemaccess
>> +----------
>> +
>> +Open A UNIX Socket access to physical memory
>> +
>> +Arguments:
>> +
>> +- "path": mount point path (json-string)
>> +
>> +Example:
>> +
>> +-> { "execute": "pmemaccess",
>> +             "arguments": { "path": "/tmp/guestname" } }
>> +<- { "return": {} }
>> +
>> +EQMP
>> +    {
>>           .name       = "inject-nmi",
>>           .args_type  = "",
>>           .mhandler.cmd_new = qmp_marshal_inject_nmi,
>>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-21 10:54         ` Markus Armbruster
@ 2015-10-21 15:50           ` Valerio Aimale
  2015-10-22 11:50             ` Markus Armbruster
  2015-10-22 18:43           ` Valerio Aimale
  2015-10-22 19:12           ` Eduardo Habkost
  2 siblings, 1 reply; 43+ messages in thread
From: Valerio Aimale @ 2015-10-21 15:50 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: lcapitulino, qemu-devel, ehabkost

On 10/21/15 4:54 AM, Markus Armbruster wrote:
> Valerio Aimale <valerio@aimale.com> writes:
>
>> On 10/19/15 1:52 AM, Markus Armbruster wrote:
>>> Valerio Aimale <valerio@aimale.com> writes:
>>>
>>>> On 10/16/15 2:15 AM, Markus Armbruster wrote:
>>>>> valerio@aimale.com writes:
>>>>>
>>>>>> All-
>>>>>>
>>>>>> I've produced a patch for the current QEMU HEAD, for libvmi to
>>>>>> introspect QEMU/KVM VMs.
>>>>>>
>>>>>> Libvmi has patches for the old qeum-kvm fork, inside its source tree:
>>>>>> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>>>>>>
>>>>>> This patch adds a hmp and a qmp command, "pmemaccess". When the
>>>>>> commands is invoked with a string arguments (a filename), it will open
>>>>>> a UNIX socket and spawn a listening thread.
>>>>>>
>>>>>> The client writes binary commands to the socket, in the form of a c
>>>>>> structure:
>>>>>>
>>>>>> struct request {
>>>>>>         uint8_t type;   // 0 quit, 1 read, 2 write, ... rest reserved
>>>>>>         uint64_t address;   // address to read from OR write to
>>>>>>         uint64_t length;    // number of bytes to read OR write
>>>>>> };
>>>>>>
>>>>>> The client receives as a response, either (length+1) bytes, if it is a
>>>>>> read operation, or 1 byte ifit is a write operation.
>>>>>>
>>>>>> The last bytes of a read operation response indicates success (1
>>>>>> success, 0 failure). The single byte returned for a write operation
>>>>>> indicates same (1 success, 0 failure).
>>>>> So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of
>>>>> garbage followed by the "it failed" byte?
>>>> Markus, that appear to be the case. However, I did not write the
>>>> communication protocol between libvmi and qemu. I'm assuming that the
>>>> person that wrote the protocol, did not want to bother with over
>>>> complicating things.
>>>>
>>>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>>>
>>>> I'm thinking he assumed reads would be small in size and the price of
>>>> reading garbage was less than the price of writing a more complicated
>>>> protocol. I can see his point, confronted with the same problem, I
>>>> might have done the same.
>>> All right, the interface is designed for *small* memory blocks then.
>>>
>>> Makes me wonder why he needs a separate binary protocol on a separate
>>> socket.  Small blocks could be done just fine in QMP.
>> The problem is speed. if one's analyzing the memory space of a running
>> process (physical and paged), libvmi will make a large number of small
>> and mid-sized reads. If one uses xp, or pmemsave, the overhead is
>> quite significant. xp has overhead due to encoding, and pmemsave has
>> overhead due to file open/write (server), file open/read/close/unlink
>> (client).
>>
>> Others have gone through the problem before me. It appears that
>> pmemsave and xp are significantly slower than reading memory using a
>> socket via pmemaccess.
> That they're slower isn't surprising, but I'd expect the cost of
> encoding a small block to be insiginificant compared to the cost of the
> network roundtrips.
>
> As block size increases, the space overhead of encoding will eventually
> bite.  But for that usage, the binary protocol appears ill-suited,
> unless the client can pretty reliably avoid read failure.  I haven't
> examined its failure modes, yet.
>
>> The following data is not mine, but it shows the time, in
>> milliseconds, required to resolve the content of a paged memory
>> address via socket (pmemaccess) , pmemsave and xp
>>
>> http://cl.ly/image/322a3s0h1V05
>>
>> Again, I did not produce those data points, they come from an old
>> libvmi thread.
> 90ms is a very long time.  What exactly was measured?
That is a fair question to ask. Unfortunately, I extracted  that data 
plot from an old thread in some libvmi mailing list. I do not have the 
data and code that produced it. Sifting through the thread, I can see 
the code
was never published. I will take it upon myself to produce code that 
compares timing - in a fair fashion - of libvmi doing an atomic 
operation and a larger-scale operation (like listing running processes)  
via gdb, pmemaccess/socket, pmemsave, xp, and hopefully, a version of xp 
that returns byte streams of memory regions base64 or base85 encoded in 
json strings. I'll publish results and code.

However, given workload and life happening, it will be some time before 
I complete that task.
>
>> I think it might be conceivable that there could be a QMP command that
>> returns the content of an arbitrarily size memory region as a base64
>> or a base85 json string. It would still have both time- (due to
>> encoding/decoding) and space- (base64 has 33% and ase85 would be 7%)
>> overhead, + json encoding/decoding overhead. It might still be the
>> case that socket would outperform such a command as well,
>> speed-vise. I don't think it would be any faster than xp.
> A special-purpose binary protocol over a dedicated socket will always do
> less than a QMP solution (ignoring foolishness like transmitting crap on
> read error the client is then expected to throw away).  The question is
> whether the difference in work translates to a worthwhile difference in
> performance.
>
> The larger question is actually whether we have an existing interface
> that can serve the libvmi's needs.  We've discussed monitor commands
> like xp, pmemsave, pmemread.  There's another existing interface: the
> GDB stub.  Have you considered it?
>
>> There's also a similar patch, floating around the internet, the uses
>> shared memory, instead of sockets, as inter-process communication
>> between libvmi and QEMU. I've never used that.
> By the time you built a working IPC mechanism on top of shared memory,
> you're often no better off than with AF_LOCAL sockets.
>
> Crazy idea: can we allocate guest memory in a way that support sharing
> it with another process?  Eduardo, can -mem-path do such wild things?
>
>>>>>> The socket API was written by the libvmi author and it works the with
>>>>>> current libvmi version. The libvmi client-side implementation is at:
>>>>>>
>>>>>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>>>>>
>>>>>> As many use kvm VM's for introspection, malware and security analysis,
>>>>>> it might be worth thinking about making the pmemaccess a permanent
>>>>>> hmp/qmp command, as opposed to having to produce a patch at each QEMU
>>>>>> point release.
>>>>> Related existing commands: memsave, pmemsave, dump-guest-memory.
>>>>>
>>>>> Can you explain why these won't do for your use case?
>>>> For people who do security analysis there are two use cases, static
>>>> and dynamic analysis. With memsave, pmemsave and dum-guest-memory one
>>>> can do static analysis. I.e. snapshotting a VM and see what was
>>>> happening at that point in time.
>>>> Dynamic analysis require to be able to 'introspect' a VM while it's running.
>>>>
>>>> If you take a snapshot of two people exchanging a glass of water, and
>>>> you happen to take it at the very moment both persons have their hands
>>>> on the glass, it's hard to tell who passed the glass to whom. If you
>>>> have a movie of the same scene, it's obvious who's the giver and who's
>>>> the receiver. Same use case.
>>> I understand the need for introspecting a running guest.  What exactly
>>> makes the existing commands unsuitable for that?
>> Speed. See discussion above.
>>>> More to the point, there's a host of C and python frameworks to
>>>> dynamically analyze VMs: volatility, rekal, "drakvuf", etc. They all
>>>> build on top of libvmi. I did not want to reinvent the wheel.
>>> Fair enough.
>>>
>>> Front page http://libvmi.com/ claims "Works with Xen, KVM, Qemu, and Raw
>>> memory files."  What exactly is missing for KVM?
>> When they say they support kvm, what they really mean they support the
>> (retired, I understand) qemu-kvm fork via a patch that is provided in
>> the libvmi source tree. I think the most recent qem-kvm supported is
>> 1.6.0
>>
>> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>>
>> I wanted to bring support to the head revision of QEMU, to bring
>> libvmi level with more modern QEMU.
>>
>> Maybe the solution is simply to put this patch in the libvmi source
>> tree,  which I've already asked to do via pull request, leaving QEMU
>> alone.
>> However, the patch has to be updated at every QEMU point release. I
>> wanted to avoid that, if at all possible.
>>
>>>> Mind you, 99.9% of people that do dynamic VM analysis use xen. They
>>>> contend that xen has better introspection support. In my case, I did
>>>> not want to bother with dedicating a full server to be a xen domain
>>>> 0. I just wanted to do a quick test by standing up a QEMU/kvm VM, in
>>>> an otherwise purposed server.
>>> I'm not at all against better introspection support in QEMU.  I'm just
>>> trying to understand the problem you're trying to solve with your
>>> patches.
>> What all users of libvmi would love to have is super high speed access
>> to VM physical memory as part of the QEMU source tree, and not
>> supported via a patch. Implemented as the QEMU owners see fit, as long
>> as it is blazing fast and easy accessed via client library or
>> inter-process communication.
> The use case makes sense to me, we just need to figure out how we want
> to serve it in QEMU.
>
>> My gut feeling is that it has to bypass QMP protocol/encoding/file
>> access/json to be fast, but, it is just a gut feeling - worth nothing.
> My gut feeling is that QMP should do fine in overhead compared to other
> solutions involving socket I/O as long as the data sizes are *small*.
> Latency might be an issue, though: QMP commands are processed from the
> main loop.  A dedicated server thread can be more responsive, but
> letting it write to shared resources could be "interesting".

See above, I'll produce some code and results so we can all think about 
the issue.
>
>>>>>> Also, the pmemsave commands QAPI should be changed to be usable with
>>>>>> 64bit VM's
>>>>>>
>>>>>> in qapi-schema.json
>>>>>>
>>>>>> from
>>>>>>
>>>>>> ---
>>>>>> { 'command': 'pmemsave',
>>>>>>      'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>>>> ---
>>>>>>
>>>>>> to
>>>>>>
>>>>>> ---
>>>>>> { 'command': 'pmemsave',
>>>>>>      'data': {'val': 'int64', 'size': 'int64', 'filename': 'str'} }
>>>>>> ---
>>>>> In the QAPI schema, 'int' is actually an alias for 'int64'.  Yes, that's
>>>>> confusing.
>>>> I think it's confusing for the HMP parser too. If you have a VM with
>>>> 8Gb of RAM and want to snapshot the whole physical memory, via HMP
>>>> over telnet this is what happens:
>>>>
>>>> $ telnet localhost 1234
>>>> Trying 127.0.0.1...
>>>> Connected to localhost.
>>>> Escape character is '^]'.
>>>> QEMU 2.4.0.1 monitor - type 'help' for more information
>>>> (qemu) help pmemsave
>>>> pmemsave addr size file -- save to disk physical memory dump starting
>>>> at 'addr' of size 'size'
>>>> (qemu) pmemsave 0 8589934591 "/tmp/memorydump"
>>>> 'pmemsave' has failed: integer is for 32-bit values
>>>> Try "help pmemsave" for more information
>>>> (qemu) quit
>>> Your change to pmemsave's definition in qapi-schema.json is effectively a
>>> no-op.
>>>
>>> Your example shows *HMP* command pmemsave.  The definition of an HMP
>>> command is *independent* of the QMP command.  The implementation *uses*
>>> the QMP command.
>>>
>>> QMP pmemsave is defined in qapi-schema.json as
>>>
>>>       { 'command': 'pmemsave',
>>>         'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>
>>> Its implementation is in cpus.c:
>>>
>>>       void qmp_pmemsave(int64_t addr, int64_t size, const char *filename,
>>>                         Error **errp)
>>>
>>> Note the int64_t size.
>>>
>>> HMP pmemsave is defined in hmp-commands.hx as
>>>
>>>       {
>>>           .name       = "pmemsave",
>>>           .args_type  = "val:l,size:i,filename:s",
>>>           .params     = "addr size file",
>>>           .help       = "save to disk physical memory dump starting at 'addr' of size 'size'",
>>>           .mhandler.cmd = hmp_pmemsave,
>>>       },
>>>
>>> Its implementation is in hmp.c:
>>>
>>>       void hmp_pmemsave(Monitor *mon, const QDict *qdict)
>>>       {
>>>           uint32_t size = qdict_get_int(qdict, "size");
>>>           const char *filename = qdict_get_str(qdict, "filename");
>>>           uint64_t addr = qdict_get_int(qdict, "val");
>>>           Error *err = NULL;
>>>
>>>           qmp_pmemsave(addr, size, filename, &err);
>>>           hmp_handle_error(mon, &err);
>>>       }
>>>
>>> Note uint32_t size.
>>>
>>> Arguably, the QMP size argument should use 'size' (an alias for
>>> 'uint64'), and the HMP args_type should use 'size:o'.
>> Understand all that. Indeed, I've re-implemented 'pmemaccess' the same
>> way pmemsave is implemented. There is a single function, and two
>> points of entrance, one for HMP and one for QMP. I think pmemacess
>> mimics pmemsave closely.
>>
>> However, if one wants to simply dump a memory region, via HMP for
>> human easy of use/debug/testing purposes, one cannot dump memory
>> regions that resides higher than 2^32-1
> Can you give an example?
Yes. I was trying to dump the full extent of physical memory of a VM 
that has 8GB memory space (ballooned). I simply did this:

$ telnet localhost 1234
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
QEMU 2.4.0.1 monitor - type 'help' for more information
(qemu) pmemsave 0 8589934591 "/tmp/memsaved"
'pmemsave' has failed: integer is for 32-bit values

Maybe I misunderstood how pmemsave works. Maybe I should have used 
dump-guest-memory

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-21 15:50           ` Valerio Aimale
@ 2015-10-22 11:50             ` Markus Armbruster
  2015-10-22 18:11               ` Valerio Aimale
  0 siblings, 1 reply; 43+ messages in thread
From: Markus Armbruster @ 2015-10-22 11:50 UTC (permalink / raw)
  To: Valerio Aimale; +Cc: qemu-devel, ehabkost, lcapitulino

Valerio Aimale <valerio@aimale.com> writes:

> On 10/21/15 4:54 AM, Markus Armbruster wrote:
>> Valerio Aimale <valerio@aimale.com> writes:
>>
>>> On 10/19/15 1:52 AM, Markus Armbruster wrote:
>>>> Valerio Aimale <valerio@aimale.com> writes:
>>>>
>>>>> On 10/16/15 2:15 AM, Markus Armbruster wrote:
>>>>>> valerio@aimale.com writes:
>>>>>>
>>>>>>> All-
>>>>>>>
>>>>>>> I've produced a patch for the current QEMU HEAD, for libvmi to
>>>>>>> introspect QEMU/KVM VMs.
>>>>>>>
>>>>>>> Libvmi has patches for the old qeum-kvm fork, inside its source tree:
>>>>>>> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>>>>>>>
>>>>>>> This patch adds a hmp and a qmp command, "pmemaccess". When the
>>>>>>> commands is invoked with a string arguments (a filename), it will open
>>>>>>> a UNIX socket and spawn a listening thread.
>>>>>>>
>>>>>>> The client writes binary commands to the socket, in the form of a c
>>>>>>> structure:
>>>>>>>
>>>>>>> struct request {
>>>>>>>         uint8_t type;   // 0 quit, 1 read, 2 write, ... rest reserved
>>>>>>>         uint64_t address;   // address to read from OR write to
>>>>>>>         uint64_t length;    // number of bytes to read OR write
>>>>>>> };
>>>>>>>
>>>>>>> The client receives as a response, either (length+1) bytes, if it is a
>>>>>>> read operation, or 1 byte ifit is a write operation.
>>>>>>>
>>>>>>> The last bytes of a read operation response indicates success (1
>>>>>>> success, 0 failure). The single byte returned for a write operation
>>>>>>> indicates same (1 success, 0 failure).
>>>>>> So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of
>>>>>> garbage followed by the "it failed" byte?
>>>>> Markus, that appear to be the case. However, I did not write the
>>>>> communication protocol between libvmi and qemu. I'm assuming that the
>>>>> person that wrote the protocol, did not want to bother with over
>>>>> complicating things.
>>>>>
>>>>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>>>>
>>>>> I'm thinking he assumed reads would be small in size and the price of
>>>>> reading garbage was less than the price of writing a more complicated
>>>>> protocol. I can see his point, confronted with the same problem, I
>>>>> might have done the same.
>>>> All right, the interface is designed for *small* memory blocks then.
>>>>
>>>> Makes me wonder why he needs a separate binary protocol on a separate
>>>> socket.  Small blocks could be done just fine in QMP.
>>> The problem is speed. if one's analyzing the memory space of a running
>>> process (physical and paged), libvmi will make a large number of small
>>> and mid-sized reads. If one uses xp, or pmemsave, the overhead is
>>> quite significant. xp has overhead due to encoding, and pmemsave has
>>> overhead due to file open/write (server), file open/read/close/unlink
>>> (client).
>>>
>>> Others have gone through the problem before me. It appears that
>>> pmemsave and xp are significantly slower than reading memory using a
>>> socket via pmemaccess.
>> That they're slower isn't surprising, but I'd expect the cost of
>> encoding a small block to be insiginificant compared to the cost of the
>> network roundtrips.
>>
>> As block size increases, the space overhead of encoding will eventually
>> bite.  But for that usage, the binary protocol appears ill-suited,
>> unless the client can pretty reliably avoid read failure.  I haven't
>> examined its failure modes, yet.
>>
>>> The following data is not mine, but it shows the time, in
>>> milliseconds, required to resolve the content of a paged memory
>>> address via socket (pmemaccess) , pmemsave and xp
>>>
>>> http://cl.ly/image/322a3s0h1V05
>>>
>>> Again, I did not produce those data points, they come from an old
>>> libvmi thread.
>> 90ms is a very long time.  What exactly was measured?
> That is a fair question to ask. Unfortunately, I extracted  that data
> plot from an old thread in some libvmi mailing list. I do not have the
> data and code that produced it. Sifting through the thread, I can see
> the code
> was never published. I will take it upon myself to produce code that
> compares timing - in a fair fashion - of libvmi doing an atomic
> operation and a larger-scale operation (like listing running
> processes)  via gdb, pmemaccess/socket, pmemsave, xp, and hopefully, a
> version of xp that returns byte streams of memory regions base64 or
> base85 encoded in json strings. I'll publish results and code.
>
> However, given workload and life happening, it will be some time
> before I complete that task.

No problem.  I'd like to have your use case addressed, but there's no
need for haste.

[...]
>>>>>>> Also, the pmemsave commands QAPI should be changed to be usable with
>>>>>>> 64bit VM's
>>>>>>>
>>>>>>> in qapi-schema.json
>>>>>>>
>>>>>>> from
>>>>>>>
>>>>>>> ---
>>>>>>> { 'command': 'pmemsave',
>>>>>>>      'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>>>>> ---
>>>>>>>
>>>>>>> to
>>>>>>>
>>>>>>> ---
>>>>>>> { 'command': 'pmemsave',
>>>>>>>      'data': {'val': 'int64', 'size': 'int64', 'filename': 'str'} }
>>>>>>> ---
>>>>>> In the QAPI schema, 'int' is actually an alias for 'int64'.  Yes, that's
>>>>>> confusing.
>>>>> I think it's confusing for the HMP parser too. If you have a VM with
>>>>> 8Gb of RAM and want to snapshot the whole physical memory, via HMP
>>>>> over telnet this is what happens:
>>>>>
>>>>> $ telnet localhost 1234
>>>>> Trying 127.0.0.1...
>>>>> Connected to localhost.
>>>>> Escape character is '^]'.
>>>>> QEMU 2.4.0.1 monitor - type 'help' for more information
>>>>> (qemu) help pmemsave
>>>>> pmemsave addr size file -- save to disk physical memory dump starting
>>>>> at 'addr' of size 'size'
>>>>> (qemu) pmemsave 0 8589934591 "/tmp/memorydump"
>>>>> 'pmemsave' has failed: integer is for 32-bit values
>>>>> Try "help pmemsave" for more information
>>>>> (qemu) quit
>>>> Your change to pmemsave's definition in qapi-schema.json is effectively a
>>>> no-op.
>>>>
>>>> Your example shows *HMP* command pmemsave.  The definition of an HMP
>>>> command is *independent* of the QMP command.  The implementation *uses*
>>>> the QMP command.
>>>>
>>>> QMP pmemsave is defined in qapi-schema.json as
>>>>
>>>>       { 'command': 'pmemsave',
>>>>         'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>>
>>>> Its implementation is in cpus.c:
>>>>
>>>>       void qmp_pmemsave(int64_t addr, int64_t size, const char *filename,
>>>>                         Error **errp)
>>>>
>>>> Note the int64_t size.
>>>>
>>>> HMP pmemsave is defined in hmp-commands.hx as
>>>>
>>>>       {
>>>>           .name       = "pmemsave",
>>>>           .args_type  = "val:l,size:i,filename:s",
>>>>           .params     = "addr size file",
>>>>           .help       = "save to disk physical memory dump starting at 'addr' of size 'size'",
>>>>           .mhandler.cmd = hmp_pmemsave,
>>>>       },
>>>>
>>>> Its implementation is in hmp.c:
>>>>
>>>>       void hmp_pmemsave(Monitor *mon, const QDict *qdict)
>>>>       {
>>>>           uint32_t size = qdict_get_int(qdict, "size");
>>>>           const char *filename = qdict_get_str(qdict, "filename");
>>>>           uint64_t addr = qdict_get_int(qdict, "val");
>>>>           Error *err = NULL;
>>>>
>>>>           qmp_pmemsave(addr, size, filename, &err);
>>>>           hmp_handle_error(mon, &err);
>>>>       }
>>>>
>>>> Note uint32_t size.
>>>>
>>>> Arguably, the QMP size argument should use 'size' (an alias for
>>>> 'uint64'), and the HMP args_type should use 'size:o'.
>>> Understand all that. Indeed, I've re-implemented 'pmemaccess' the same
>>> way pmemsave is implemented. There is a single function, and two
>>> points of entrance, one for HMP and one for QMP. I think pmemacess
>>> mimics pmemsave closely.
>>>
>>> However, if one wants to simply dump a memory region, via HMP for
>>> human easy of use/debug/testing purposes, one cannot dump memory
>>> regions that resides higher than 2^32-1
>> Can you give an example?
> Yes. I was trying to dump the full extent of physical memory of a VM
> that has 8GB memory space (ballooned). I simply did this:
>
> $ telnet localhost 1234
> Trying 127.0.0.1...
> Connected to localhost.
> Escape character is '^]'.
> QEMU 2.4.0.1 monitor - type 'help' for more information
> (qemu) pmemsave 0 8589934591 "/tmp/memsaved"
> 'pmemsave' has failed: integer is for 32-bit values
>
> Maybe I misunderstood how pmemsave works. Maybe I should have used
> dump-guest-memory

This is am unnecessary limitation caused by 'size:i' instead of
'size:o'.  Fixable.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-22 11:50             ` Markus Armbruster
@ 2015-10-22 18:11               ` Valerio Aimale
  2015-10-23  6:31                 ` Markus Armbruster
  0 siblings, 1 reply; 43+ messages in thread
From: Valerio Aimale @ 2015-10-22 18:11 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, ehabkost, lcapitulino

On 10/22/15 5:50 AM, Markus Armbruster wrote:
> Valerio Aimale <valerio@aimale.com> writes:
>
>> On 10/21/15 4:54 AM, Markus Armbruster wrote:
>>> Valerio Aimale <valerio@aimale.com> writes:
>>>
>>>> On 10/19/15 1:52 AM, Markus Armbruster wrote:
>>>>> Valerio Aimale <valerio@aimale.com> writes:
>>>>>
>>>>>> On 10/16/15 2:15 AM, Markus Armbruster wrote:
>>>>>>> valerio@aimale.com writes:
>>>>>>>
>>>>>>>> All-
>>>>>>>>
>>>>>>>> I've produced a patch for the current QEMU HEAD, for libvmi to
>>>>>>>> introspect QEMU/KVM VMs.
>>>>>>>>
>>>>>>>> Libvmi has patches for the old qeum-kvm fork, inside its source tree:
>>>>>>>> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>>>>>>>>
>>>>>>>> This patch adds a hmp and a qmp command, "pmemaccess". When the
>>>>>>>> commands is invoked with a string arguments (a filename), it will open
>>>>>>>> a UNIX socket and spawn a listening thread.
>>>>>>>>
>>>>>>>> The client writes binary commands to the socket, in the form of a c
>>>>>>>> structure:
>>>>>>>>
>>>>>>>> struct request {
>>>>>>>>          uint8_t type;   // 0 quit, 1 read, 2 write, ... rest reserved
>>>>>>>>          uint64_t address;   // address to read from OR write to
>>>>>>>>          uint64_t length;    // number of bytes to read OR write
>>>>>>>> };
>>>>>>>>
>>>>>>>> The client receives as a response, either (length+1) bytes, if it is a
>>>>>>>> read operation, or 1 byte ifit is a write operation.
>>>>>>>>
>>>>>>>> The last bytes of a read operation response indicates success (1
>>>>>>>> success, 0 failure). The single byte returned for a write operation
>>>>>>>> indicates same (1 success, 0 failure).
>>>>>>> So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of
>>>>>>> garbage followed by the "it failed" byte?
>>>>>> Markus, that appear to be the case. However, I did not write the
>>>>>> communication protocol between libvmi and qemu. I'm assuming that the
>>>>>> person that wrote the protocol, did not want to bother with over
>>>>>> complicating things.
>>>>>>
>>>>>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>>>>>
>>>>>> I'm thinking he assumed reads would be small in size and the price of
>>>>>> reading garbage was less than the price of writing a more complicated
>>>>>> protocol. I can see his point, confronted with the same problem, I
>>>>>> might have done the same.
>>>>> All right, the interface is designed for *small* memory blocks then.
>>>>>
>>>>> Makes me wonder why he needs a separate binary protocol on a separate
>>>>> socket.  Small blocks could be done just fine in QMP.
>>>> The problem is speed. if one's analyzing the memory space of a running
>>>> process (physical and paged), libvmi will make a large number of small
>>>> and mid-sized reads. If one uses xp, or pmemsave, the overhead is
>>>> quite significant. xp has overhead due to encoding, and pmemsave has
>>>> overhead due to file open/write (server), file open/read/close/unlink
>>>> (client).
>>>>
>>>> Others have gone through the problem before me. It appears that
>>>> pmemsave and xp are significantly slower than reading memory using a
>>>> socket via pmemaccess.
>>> That they're slower isn't surprising, but I'd expect the cost of
>>> encoding a small block to be insiginificant compared to the cost of the
>>> network roundtrips.
>>>
>>> As block size increases, the space overhead of encoding will eventually
>>> bite.  But for that usage, the binary protocol appears ill-suited,
>>> unless the client can pretty reliably avoid read failure.  I haven't
>>> examined its failure modes, yet.
>>>
>>>> The following data is not mine, but it shows the time, in
>>>> milliseconds, required to resolve the content of a paged memory
>>>> address via socket (pmemaccess) , pmemsave and xp
>>>>
>>>> http://cl.ly/image/322a3s0h1V05
>>>>
>>>> Again, I did not produce those data points, they come from an old
>>>> libvmi thread.
>>> 90ms is a very long time.  What exactly was measured?
>> That is a fair question to ask. Unfortunately, I extracted  that data
>> plot from an old thread in some libvmi mailing list. I do not have the
>> data and code that produced it. Sifting through the thread, I can see
>> the code
>> was never published. I will take it upon myself to produce code that
>> compares timing - in a fair fashion - of libvmi doing an atomic
>> operation and a larger-scale operation (like listing running
>> processes)  via gdb, pmemaccess/socket, pmemsave, xp, and hopefully, a
>> version of xp that returns byte streams of memory regions base64 or
>> base85 encoded in json strings. I'll publish results and code.
>>
>> However, given workload and life happening, it will be some time
>> before I complete that task.
> No problem.  I'd like to have your use case addressed, but there's no
> need for haste.

Thanks, Markus. Appreciate your help.
>
> [...]
>>>>>>>> Also, the pmemsave commands QAPI should be changed to be usable with
>>>>>>>> 64bit VM's
>>>>>>>>
>>>>>>>> in qapi-schema.json
>>>>>>>>
>>>>>>>> from
>>>>>>>>
>>>>>>>> ---
>>>>>>>> { 'command': 'pmemsave',
>>>>>>>>       'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>>>>>> ---
>>>>>>>>
>>>>>>>> to
>>>>>>>>
>>>>>>>> ---
>>>>>>>> { 'command': 'pmemsave',
>>>>>>>>       'data': {'val': 'int64', 'size': 'int64', 'filename': 'str'} }
>>>>>>>> ---
>>>>>>> In the QAPI schema, 'int' is actually an alias for 'int64'.  Yes, that's
>>>>>>> confusing.
>>>>>> I think it's confusing for the HMP parser too. If you have a VM with
>>>>>> 8Gb of RAM and want to snapshot the whole physical memory, via HMP
>>>>>> over telnet this is what happens:
>>>>>>
>>>>>> $ telnet localhost 1234
>>>>>> Trying 127.0.0.1...
>>>>>> Connected to localhost.
>>>>>> Escape character is '^]'.
>>>>>> QEMU 2.4.0.1 monitor - type 'help' for more information
>>>>>> (qemu) help pmemsave
>>>>>> pmemsave addr size file -- save to disk physical memory dump starting
>>>>>> at 'addr' of size 'size'
>>>>>> (qemu) pmemsave 0 8589934591 "/tmp/memorydump"
>>>>>> 'pmemsave' has failed: integer is for 32-bit values
>>>>>> Try "help pmemsave" for more information
>>>>>> (qemu) quit
>>>>> Your change to pmemsave's definition in qapi-schema.json is effectively a
>>>>> no-op.
>>>>>
>>>>> Your example shows *HMP* command pmemsave.  The definition of an HMP
>>>>> command is *independent* of the QMP command.  The implementation *uses*
>>>>> the QMP command.
>>>>>
>>>>> QMP pmemsave is defined in qapi-schema.json as
>>>>>
>>>>>        { 'command': 'pmemsave',
>>>>>          'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>>>
>>>>> Its implementation is in cpus.c:
>>>>>
>>>>>        void qmp_pmemsave(int64_t addr, int64_t size, const char *filename,
>>>>>                          Error **errp)
>>>>>
>>>>> Note the int64_t size.
>>>>>
>>>>> HMP pmemsave is defined in hmp-commands.hx as
>>>>>
>>>>>        {
>>>>>            .name       = "pmemsave",
>>>>>            .args_type  = "val:l,size:i,filename:s",
>>>>>            .params     = "addr size file",
>>>>>            .help       = "save to disk physical memory dump starting at 'addr' of size 'size'",
>>>>>            .mhandler.cmd = hmp_pmemsave,
>>>>>        },
>>>>>
>>>>> Its implementation is in hmp.c:
>>>>>
>>>>>        void hmp_pmemsave(Monitor *mon, const QDict *qdict)
>>>>>        {
>>>>>            uint32_t size = qdict_get_int(qdict, "size");
>>>>>            const char *filename = qdict_get_str(qdict, "filename");
>>>>>            uint64_t addr = qdict_get_int(qdict, "val");
>>>>>            Error *err = NULL;
>>>>>
>>>>>            qmp_pmemsave(addr, size, filename, &err);
>>>>>            hmp_handle_error(mon, &err);
>>>>>        }
>>>>>
>>>>> Note uint32_t size.
>>>>>
>>>>> Arguably, the QMP size argument should use 'size' (an alias for
>>>>> 'uint64'), and the HMP args_type should use 'size:o'.
>>>> Understand all that. Indeed, I've re-implemented 'pmemaccess' the same
>>>> way pmemsave is implemented. There is a single function, and two
>>>> points of entrance, one for HMP and one for QMP. I think pmemacess
>>>> mimics pmemsave closely.
>>>>
>>>> However, if one wants to simply dump a memory region, via HMP for
>>>> human easy of use/debug/testing purposes, one cannot dump memory
>>>> regions that resides higher than 2^32-1
>>> Can you give an example?
>> Yes. I was trying to dump the full extent of physical memory of a VM
>> that has 8GB memory space (ballooned). I simply did this:
>>
>> $ telnet localhost 1234
>> Trying 127.0.0.1...
>> Connected to localhost.
>> Escape character is '^]'.
>> QEMU 2.4.0.1 monitor - type 'help' for more information
>> (qemu) pmemsave 0 8589934591 "/tmp/memsaved"
>> 'pmemsave' has failed: integer is for 32-bit values
>>
>> Maybe I misunderstood how pmemsave works. Maybe I should have used
>> dump-guest-memory
> This is am unnecessary limitation caused by 'size:i' instead of
> 'size:o'.  Fixable.
I think I tried changing size:i to size:l, but, I was still receiving 
the error.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-21 10:54         ` Markus Armbruster
  2015-10-21 15:50           ` Valerio Aimale
@ 2015-10-22 18:43           ` Valerio Aimale
  2015-10-22 18:54             ` Eric Blake
  2015-10-22 19:12           ` Eduardo Habkost
  2 siblings, 1 reply; 43+ messages in thread
From: Valerio Aimale @ 2015-10-22 18:43 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: lcapitulino, qemu-devel, ehabkost

On 10/21/15 4:54 AM, Markus Armbruster wrote:
> Valerio Aimale <valerio@aimale.com> writes:
>
>> On 10/19/15 1:52 AM, Markus Armbruster wrote:
>>> Valerio Aimale <valerio@aimale.com> writes:
>>>
>>>> On 10/16/15 2:15 AM, Markus Armbruster wrote:
>>>>> valerio@aimale.com writes:
>>>>>
>>>>>> All-
>>>>>>
>>>>>> I've produced a patch for the current QEMU HEAD, for libvmi to
>>>>>> introspect QEMU/KVM VMs.
>>>>>>
>>>>>> Libvmi has patches for the old qeum-kvm fork, inside its source tree:
>>>>>> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>>>>>>
>>>>>> This patch adds a hmp and a qmp command, "pmemaccess". When the
>>>>>> commands is invoked with a string arguments (a filename), it will open
>>>>>> a UNIX socket and spawn a listening thread.
>>>>>>
>>>>>> The client writes binary commands to the socket, in the form of a c
>>>>>> structure:
>>>>>>
>>>>>> struct request {
>>>>>>         uint8_t type;   // 0 quit, 1 read, 2 write, ... rest reserved
>>>>>>         uint64_t address;   // address to read from OR write to
>>>>>>         uint64_t length;    // number of bytes to read OR write
>>>>>> };
>>>>>>
>>>>>> The client receives as a response, either (length+1) bytes, if it is a
>>>>>> read operation, or 1 byte ifit is a write operation.
>>>>>>
>>>>>> The last bytes of a read operation response indicates success (1
>>>>>> success, 0 failure). The single byte returned for a write operation
>>>>>> indicates same (1 success, 0 failure).
>>>>> So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of
>>>>> garbage followed by the "it failed" byte?
>>>> Markus, that appear to be the case. However, I did not write the
>>>> communication protocol between libvmi and qemu. I'm assuming that the
>>>> person that wrote the protocol, did not want to bother with over
>>>> complicating things.
>>>>
>>>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>>>
>>>> I'm thinking he assumed reads would be small in size and the price of
>>>> reading garbage was less than the price of writing a more complicated
>>>> protocol. I can see his point, confronted with the same problem, I
>>>> might have done the same.
>>> All right, the interface is designed for *small* memory blocks then.
>>>
>>> Makes me wonder why he needs a separate binary protocol on a separate
>>> socket.  Small blocks could be done just fine in QMP.
>> The problem is speed. if one's analyzing the memory space of a running
>> process (physical and paged), libvmi will make a large number of small
>> and mid-sized reads. If one uses xp, or pmemsave, the overhead is
>> quite significant. xp has overhead due to encoding, and pmemsave has
>> overhead due to file open/write (server), file open/read/close/unlink
>> (client).
>>
>> Others have gone through the problem before me. It appears that
>> pmemsave and xp are significantly slower than reading memory using a
>> socket via pmemaccess.
> That they're slower isn't surprising, but I'd expect the cost of
> encoding a small block to be insiginificant compared to the cost of the
> network roundtrips.
>
> As block size increases, the space overhead of encoding will eventually
> bite.  But for that usage, the binary protocol appears ill-suited,
> unless the client can pretty reliably avoid read failure.  I haven't
> examined its failure modes, yet.
>
>> The following data is not mine, but it shows the time, in
>> milliseconds, required to resolve the content of a paged memory
>> address via socket (pmemaccess) , pmemsave and xp
>>
>> http://cl.ly/image/322a3s0h1V05
>>
>> Again, I did not produce those data points, they come from an old
>> libvmi thread.
> 90ms is a very long time.  What exactly was measured?
>
>> I think it might be conceivable that there could be a QMP command that
>> returns the content of an arbitrarily size memory region as a base64
>> or a base85 json string. It would still have both time- (due to
>> encoding/decoding) and space- (base64 has 33% and ase85 would be 7%)
>> overhead, + json encoding/decoding overhead. It might still be the
>> case that socket would outperform such a command as well,
>> speed-vise. I don't think it would be any faster than xp.
> A special-purpose binary protocol over a dedicated socket will always do
> less than a QMP solution (ignoring foolishness like transmitting crap on
> read error the client is then expected to throw away).  The question is
> whether the difference in work translates to a worthwhile difference in
> performance.
>
> The larger question is actually whether we have an existing interface
> that can serve the libvmi's needs.  We've discussed monitor commands
> like xp, pmemsave, pmemread.  There's another existing interface: the
> GDB stub.  Have you considered it?
>
>> There's also a similar patch, floating around the internet, the uses
>> shared memory, instead of sockets, as inter-process communication
>> between libvmi and QEMU. I've never used that.
> By the time you built a working IPC mechanism on top of shared memory,
> you're often no better off than with AF_LOCAL sockets.
>
> Crazy idea: can we allocate guest memory in a way that support sharing
> it with another process?  Eduardo, can -mem-path do such wild things?
Markus, your suggestion led to a lightbulb going off in my head.

What if there was a qmp command, say 'pmemmap' then when invoked, 
performs the following:

qmp_pmemmap( [...]) {

     char *template = "/tmp/QEM_mmap_XXXXXXX";
     int mmap_fd;
     uint8_t *local_memspace = malloc( (size_t) 8589934592 /* assuming 
VM with 8GB RAM */);

     cpu_physical_memory_rw( (hwaddr) 0,  local_memspace , (hwaddr) 
8589934592 /* assuming VM with 8GB RAM */, 0 /* no write for now will 
discuss write later */);

    mmap_fd = mkstemp("/tmp/QEUM_mmap_XXXXXXX");

    mmap((void *) local_memspace, (size_t) 8589934592, PROT_READ | 
PROT_WRITE,  MAP_SHARED | MAP_ANON,  mmap_fd, (off_t) 0);

   /* etc */

}

pmemmap would return the following json

{
     'success' : 'true',
     'map_filename' : '/tmp/QEM_mmap_1234567'
}

the qmp client/caller, would then allocate a region of memory equal to 
the size of the file; mmap() the file '/tmp/QEM_mmap_1234567' into the 
region. It would then have (read, maybe write?) access to the full 
extent of the guest memory without making any other qmp call. I think it 
would be fast, and with low memory usage, as mmap() is pretty efficient.

Of course, there would be a 'pmemunmap' qmp commands that would perform 
the cleanup

/* etc. */
munmap()
cpu_physical_memory_unmap();
  /* etc. */

Would that work? Is mapping the full extent of the guest RAM too much to 
ask of cpu_physical_memory_rw()?
>
>>>>>> The socket API was written by the libvmi author and it works the with
>>>>>> current libvmi version. The libvmi client-side implementation is at:
>>>>>>
>>>>>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>>>>>
>>>>>> As many use kvm VM's for introspection, malware and security analysis,
>>>>>> it might be worth thinking about making the pmemaccess a permanent
>>>>>> hmp/qmp command, as opposed to having to produce a patch at each QEMU
>>>>>> point release.
>>>>> Related existing commands: memsave, pmemsave, dump-guest-memory.
>>>>>
>>>>> Can you explain why these won't do for your use case?
>>>> For people who do security analysis there are two use cases, static
>>>> and dynamic analysis. With memsave, pmemsave and dum-guest-memory one
>>>> can do static analysis. I.e. snapshotting a VM and see what was
>>>> happening at that point in time.
>>>> Dynamic analysis require to be able to 'introspect' a VM while it's running.
>>>>
>>>> If you take a snapshot of two people exchanging a glass of water, and
>>>> you happen to take it at the very moment both persons have their hands
>>>> on the glass, it's hard to tell who passed the glass to whom. If you
>>>> have a movie of the same scene, it's obvious who's the giver and who's
>>>> the receiver. Same use case.
>>> I understand the need for introspecting a running guest.  What exactly
>>> makes the existing commands unsuitable for that?
>> Speed. See discussion above.
>>>> More to the point, there's a host of C and python frameworks to
>>>> dynamically analyze VMs: volatility, rekal, "drakvuf", etc. They all
>>>> build on top of libvmi. I did not want to reinvent the wheel.
>>> Fair enough.
>>>
>>> Front page http://libvmi.com/ claims "Works with Xen, KVM, Qemu, and Raw
>>> memory files."  What exactly is missing for KVM?
>> When they say they support kvm, what they really mean they support the
>> (retired, I understand) qemu-kvm fork via a patch that is provided in
>> the libvmi source tree. I think the most recent qem-kvm supported is
>> 1.6.0
>>
>> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>>
>> I wanted to bring support to the head revision of QEMU, to bring
>> libvmi level with more modern QEMU.
>>
>> Maybe the solution is simply to put this patch in the libvmi source
>> tree,  which I've already asked to do via pull request, leaving QEMU
>> alone.
>> However, the patch has to be updated at every QEMU point release. I
>> wanted to avoid that, if at all possible.
>>
>>>> Mind you, 99.9% of people that do dynamic VM analysis use xen. They
>>>> contend that xen has better introspection support. In my case, I did
>>>> not want to bother with dedicating a full server to be a xen domain
>>>> 0. I just wanted to do a quick test by standing up a QEMU/kvm VM, in
>>>> an otherwise purposed server.
>>> I'm not at all against better introspection support in QEMU.  I'm just
>>> trying to understand the problem you're trying to solve with your
>>> patches.
>> What all users of libvmi would love to have is super high speed access
>> to VM physical memory as part of the QEMU source tree, and not
>> supported via a patch. Implemented as the QEMU owners see fit, as long
>> as it is blazing fast and easy accessed via client library or
>> inter-process communication.
> The use case makes sense to me, we just need to figure out how we want
> to serve it in QEMU.
>
>> My gut feeling is that it has to bypass QMP protocol/encoding/file
>> access/json to be fast, but, it is just a gut feeling - worth nothing.
> My gut feeling is that QMP should do fine in overhead compared to other
> solutions involving socket I/O as long as the data sizes are *small*.
> Latency might be an issue, though: QMP commands are processed from the
> main loop.  A dedicated server thread can be more responsive, but
> letting it write to shared resources could be "interesting".
>
>>>>>> Also, the pmemsave commands QAPI should be changed to be usable with
>>>>>> 64bit VM's
>>>>>>
>>>>>> in qapi-schema.json
>>>>>>
>>>>>> from
>>>>>>
>>>>>> ---
>>>>>> { 'command': 'pmemsave',
>>>>>>      'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>>>> ---
>>>>>>
>>>>>> to
>>>>>>
>>>>>> ---
>>>>>> { 'command': 'pmemsave',
>>>>>>      'data': {'val': 'int64', 'size': 'int64', 'filename': 'str'} }
>>>>>> ---
>>>>> In the QAPI schema, 'int' is actually an alias for 'int64'.  Yes, that's
>>>>> confusing.
>>>> I think it's confusing for the HMP parser too. If you have a VM with
>>>> 8Gb of RAM and want to snapshot the whole physical memory, via HMP
>>>> over telnet this is what happens:
>>>>
>>>> $ telnet localhost 1234
>>>> Trying 127.0.0.1...
>>>> Connected to localhost.
>>>> Escape character is '^]'.
>>>> QEMU 2.4.0.1 monitor - type 'help' for more information
>>>> (qemu) help pmemsave
>>>> pmemsave addr size file -- save to disk physical memory dump starting
>>>> at 'addr' of size 'size'
>>>> (qemu) pmemsave 0 8589934591 "/tmp/memorydump"
>>>> 'pmemsave' has failed: integer is for 32-bit values
>>>> Try "help pmemsave" for more information
>>>> (qemu) quit
>>> Your change to pmemsave's definition in qapi-schema.json is effectively a
>>> no-op.
>>>
>>> Your example shows *HMP* command pmemsave.  The definition of an HMP
>>> command is *independent* of the QMP command.  The implementation *uses*
>>> the QMP command.
>>>
>>> QMP pmemsave is defined in qapi-schema.json as
>>>
>>>       { 'command': 'pmemsave',
>>>         'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>
>>> Its implementation is in cpus.c:
>>>
>>>       void qmp_pmemsave(int64_t addr, int64_t size, const char *filename,
>>>                         Error **errp)
>>>
>>> Note the int64_t size.
>>>
>>> HMP pmemsave is defined in hmp-commands.hx as
>>>
>>>       {
>>>           .name       = "pmemsave",
>>>           .args_type  = "val:l,size:i,filename:s",
>>>           .params     = "addr size file",
>>>           .help       = "save to disk physical memory dump starting at 'addr' of size 'size'",
>>>           .mhandler.cmd = hmp_pmemsave,
>>>       },
>>>
>>> Its implementation is in hmp.c:
>>>
>>>       void hmp_pmemsave(Monitor *mon, const QDict *qdict)
>>>       {
>>>           uint32_t size = qdict_get_int(qdict, "size");
>>>           const char *filename = qdict_get_str(qdict, "filename");
>>>           uint64_t addr = qdict_get_int(qdict, "val");
>>>           Error *err = NULL;
>>>
>>>           qmp_pmemsave(addr, size, filename, &err);
>>>           hmp_handle_error(mon, &err);
>>>       }
>>>
>>> Note uint32_t size.
>>>
>>> Arguably, the QMP size argument should use 'size' (an alias for
>>> 'uint64'), and the HMP args_type should use 'size:o'.
>> Understand all that. Indeed, I've re-implemented 'pmemaccess' the same
>> way pmemsave is implemented. There is a single function, and two
>> points of entrance, one for HMP and one for QMP. I think pmemacess
>> mimics pmemsave closely.
>>
>> However, if one wants to simply dump a memory region, via HMP for
>> human easy of use/debug/testing purposes, one cannot dump memory
>> regions that resides higher than 2^32-1
> Can you give an example?
>
> [...]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-22 18:43           ` Valerio Aimale
@ 2015-10-22 18:54             ` Eric Blake
  0 siblings, 0 replies; 43+ messages in thread
From: Eric Blake @ 2015-10-22 18:54 UTC (permalink / raw)
  To: Valerio Aimale, Markus Armbruster; +Cc: qemu-devel, ehabkost, lcapitulino

[-- Attachment #1: Type: text/plain, Size: 702 bytes --]

On 10/22/2015 12:43 PM, Valerio Aimale wrote:

> 
> What if there was a qmp command, say 'pmemmap' then when invoked,
> performs the following:
> 
> qmp_pmemmap( [...]) {
> 
>     char *template = "/tmp/QEM_mmap_XXXXXXX";

Why not let the caller pass in the file name, rather than opening it
ourselves? But the idea of coordinating a file that both caller and qemu
mmap to the same guest memory view might indeed have merit.

[by the way, it's okay to trim messages to the relevant portions, rather
than making readers scroll through lots of quoted content to find the meat]

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-21 10:54         ` Markus Armbruster
  2015-10-21 15:50           ` Valerio Aimale
  2015-10-22 18:43           ` Valerio Aimale
@ 2015-10-22 19:12           ` Eduardo Habkost
  2015-10-22 19:57             ` Valerio Aimale
  2015-10-23  6:35             ` Markus Armbruster
  2 siblings, 2 replies; 43+ messages in thread
From: Eduardo Habkost @ 2015-10-22 19:12 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, Valerio Aimale, lcapitulino

On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote:
> Valerio Aimale <valerio@aimale.com> writes:
[...]
> > There's also a similar patch, floating around the internet, the uses
> > shared memory, instead of sockets, as inter-process communication
> > between libvmi and QEMU. I've never used that.
> 
> By the time you built a working IPC mechanism on top of shared memory,
> you're often no better off than with AF_LOCAL sockets.
> 
> Crazy idea: can we allocate guest memory in a way that support sharing
> it with another process?  Eduardo, can -mem-path do such wild things?

It can't today, but just because it creates a temporary file inside
mem-path and unlinks it immediately after opening a file descriptor. We
could make memory-backend-file also accept a full filename as argument,
or add a mechanism to let QEMU send the open file descriptor to a QMP
client.

-- 
Eduardo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-22 19:12           ` Eduardo Habkost
@ 2015-10-22 19:57             ` Valerio Aimale
  2015-10-22 20:03               ` Eric Blake
  2015-10-22 21:47               ` Eduardo Habkost
  2015-10-23  6:35             ` Markus Armbruster
  1 sibling, 2 replies; 43+ messages in thread
From: Valerio Aimale @ 2015-10-22 19:57 UTC (permalink / raw)
  To: Eduardo Habkost, Markus Armbruster; +Cc: qemu-devel, lcapitulino

On 10/22/15 1:12 PM, Eduardo Habkost wrote:
> On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote:
>> Valerio Aimale <valerio@aimale.com> writes:
> [...]
>>> There's also a similar patch, floating around the internet, the uses
>>> shared memory, instead of sockets, as inter-process communication
>>> between libvmi and QEMU. I've never used that.
>> By the time you built a working IPC mechanism on top of shared memory,
>> you're often no better off than with AF_LOCAL sockets.
>>
>> Crazy idea: can we allocate guest memory in a way that support sharing
>> it with another process?  Eduardo, can -mem-path do such wild things?
> It can't today, but just because it creates a temporary file inside
> mem-path and unlinks it immediately after opening a file descriptor. We
> could make memory-backend-file also accept a full filename as argument,
> or add a mechanism to let QEMU send the open file descriptor to a QMP
> client.
>
Eduardo, would my "artisanal" idea of creating an mmap'ed image of the 
guest memory footprint work, augmented by Eric's suggestion of having 
the qmp client pass the filename?

qmp_pmemmap( [...]) {

     char *template = "/tmp/QEM_mmap_XXXXXXX";
     int mmap_fd;
     uint8_t *local_memspace = malloc( (size_t) 8589934592 /* assuming 
VM with 8GB RAM */);

     cpu_physical_memory_rw( (hwaddr) 0,  local_memspace , (hwaddr) 
8589934592 /* assuming VM with 8GB RAM */, 0 /* no write for now will 
discuss write later */);

    mmap_fd = mkstemp("/tmp/QEUM_mmap_XXXXXXX");

    mmap((void *) local_memspace, (size_t) 8589934592, PROT_READ | 
PROT_WRITE,  MAP_SHARED | MAP_ANON,  mmap_fd, (off_t) 0);

   /* etc */

}

pmemmap would return the following json

{
     'success' : 'true',
     'map_filename' : '/tmp/QEM_mmap_1234567'
}

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-22 19:57             ` Valerio Aimale
@ 2015-10-22 20:03               ` Eric Blake
  2015-10-22 20:45                 ` Valerio Aimale
  2015-10-22 21:47               ` Eduardo Habkost
  1 sibling, 1 reply; 43+ messages in thread
From: Eric Blake @ 2015-10-22 20:03 UTC (permalink / raw)
  To: Valerio Aimale, Eduardo Habkost, Markus Armbruster
  Cc: qemu-devel, lcapitulino

[-- Attachment #1: Type: text/plain, Size: 639 bytes --]

On 10/22/2015 01:57 PM, Valerio Aimale wrote:

> 
> pmemmap would return the following json
> 
> {
>     'success' : 'true',
>     'map_filename' : '/tmp/QEM_mmap_1234567'
> }

In general, it is better if the client controls the filename, and not
qemu.  This is because things like libvirt like to run qemu in a
highly-constrained environment, where the caller can pass in a file
descriptor that qemu cannot itself open().  So returning a filename is
pointless if the filename was already provided by the caller.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-22 20:03               ` Eric Blake
@ 2015-10-22 20:45                 ` Valerio Aimale
  0 siblings, 0 replies; 43+ messages in thread
From: Valerio Aimale @ 2015-10-22 20:45 UTC (permalink / raw)
  To: Eric Blake, Eduardo Habkost, Markus Armbruster; +Cc: qemu-devel, lcapitulino

On 10/22/15 2:03 PM, Eric Blake wrote:
> On 10/22/2015 01:57 PM, Valerio Aimale wrote:
>
>> pmemmap would return the following json
>>
>> {
>>      'success' : 'true',
>>      'map_filename' : '/tmp/QEM_mmap_1234567'
>> }
> In general, it is better if the client controls the filename, and not
> qemu.  This is because things like libvirt like to run qemu in a
> highly-constrained environment, where the caller can pass in a file
> descriptor that qemu cannot itself open().  So returning a filename is
> pointless if the filename was already provided by the caller.
>
Eric, I absolutely and positively agree with you. I was just 
brainstorming. Consider my pseudo-C code as the mailing list analog of 
somebody scribbling on a white board and trying explain an idea.

I agree with you the file name should come from the qmp client.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-22 19:57             ` Valerio Aimale
  2015-10-22 20:03               ` Eric Blake
@ 2015-10-22 21:47               ` Eduardo Habkost
  2015-10-22 21:51                 ` Valerio Aimale
  1 sibling, 1 reply; 43+ messages in thread
From: Eduardo Habkost @ 2015-10-22 21:47 UTC (permalink / raw)
  To: Valerio Aimale; +Cc: lcapitulino, Markus Armbruster, qemu-devel

On Thu, Oct 22, 2015 at 01:57:13PM -0600, Valerio Aimale wrote:
> On 10/22/15 1:12 PM, Eduardo Habkost wrote:
> >On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote:
> >>Valerio Aimale <valerio@aimale.com> writes:
> >[...]
> >>>There's also a similar patch, floating around the internet, the uses
> >>>shared memory, instead of sockets, as inter-process communication
> >>>between libvmi and QEMU. I've never used that.
> >>By the time you built a working IPC mechanism on top of shared memory,
> >>you're often no better off than with AF_LOCAL sockets.
> >>
> >>Crazy idea: can we allocate guest memory in a way that support sharing
> >>it with another process?  Eduardo, can -mem-path do such wild things?
> >It can't today, but just because it creates a temporary file inside
> >mem-path and unlinks it immediately after opening a file descriptor. We
> >could make memory-backend-file also accept a full filename as argument,
> >or add a mechanism to let QEMU send the open file descriptor to a QMP
> >client.
> >
> Eduardo, would my "artisanal" idea of creating an mmap'ed image of the guest
> memory footprint work, augmented by Eric's suggestion of having the qmp
> client pass the filename?

The code below doesn't make sense to me.

> 
> qmp_pmemmap( [...]) {
> 
>     char *template = "/tmp/QEM_mmap_XXXXXXX";
>     int mmap_fd;
>     uint8_t *local_memspace = malloc( (size_t) 8589934592 /* assuming VM
> with 8GB RAM */);
> 
>     cpu_physical_memory_rw( (hwaddr) 0,  local_memspace , (hwaddr)
> 8589934592 /* assuming VM with 8GB RAM */, 0 /* no write for now will
> discuss write later */);

Are you suggesting copying all the VM RAM contents to a whole new area?
Why?

> 
>    mmap_fd = mkstemp("/tmp/QEUM_mmap_XXXXXXX");
> 
>    mmap((void *) local_memspace, (size_t) 8589934592, PROT_READ |
> PROT_WRITE,  MAP_SHARED | MAP_ANON,  mmap_fd, (off_t) 0);

I don't understand what you are trying to do here.

> 
>   /* etc */
> 
> }
> 
> pmemmap would return the following json
> 
> {
>     'success' : 'true',
>     'map_filename' : '/tmp/QEM_mmap_1234567'
> }
> 
> 

-- 
Eduardo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-22 21:47               ` Eduardo Habkost
@ 2015-10-22 21:51                 ` Valerio Aimale
  2015-10-23  8:25                   ` Daniel P. Berrange
  2015-10-23 18:55                   ` Eduardo Habkost
  0 siblings, 2 replies; 43+ messages in thread
From: Valerio Aimale @ 2015-10-22 21:51 UTC (permalink / raw)
  To: Eduardo Habkost; +Cc: lcapitulino, Markus Armbruster, qemu-devel

On 10/22/15 3:47 PM, Eduardo Habkost wrote:
> On Thu, Oct 22, 2015 at 01:57:13PM -0600, Valerio Aimale wrote:
>> On 10/22/15 1:12 PM, Eduardo Habkost wrote:
>>> On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote:
>>>> Valerio Aimale <valerio@aimale.com> writes:
>>> [...]
>>>>> There's also a similar patch, floating around the internet, the uses
>>>>> shared memory, instead of sockets, as inter-process communication
>>>>> between libvmi and QEMU. I've never used that.
>>>> By the time you built a working IPC mechanism on top of shared memory,
>>>> you're often no better off than with AF_LOCAL sockets.
>>>>
>>>> Crazy idea: can we allocate guest memory in a way that support sharing
>>>> it with another process?  Eduardo, can -mem-path do such wild things?
>>> It can't today, but just because it creates a temporary file inside
>>> mem-path and unlinks it immediately after opening a file descriptor. We
>>> could make memory-backend-file also accept a full filename as argument,
>>> or add a mechanism to let QEMU send the open file descriptor to a QMP
>>> client.
>>>
>> Eduardo, would my "artisanal" idea of creating an mmap'ed image of the guest
>> memory footprint work, augmented by Eric's suggestion of having the qmp
>> client pass the filename?
> The code below doesn't make sense to me.

Ok. What I am trying to do is to create a mmapped() memory area of the 
guest physical memory that can be shared between QEMU and an external 
process, such that the external process can read arbitrary locations of 
the qemu guest physical memory.
In short, I'm using mmap MAP_SHARED to share the guest memory area with 
a process that is external to QEMU

does it make better sense now?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-22 18:11               ` Valerio Aimale
@ 2015-10-23  6:31                 ` Markus Armbruster
  0 siblings, 0 replies; 43+ messages in thread
From: Markus Armbruster @ 2015-10-23  6:31 UTC (permalink / raw)
  To: Valerio Aimale; +Cc: lcapitulino, qemu-devel, ehabkost

Valerio Aimale <valerio@aimale.com> writes:

> On 10/22/15 5:50 AM, Markus Armbruster wrote:
>> Valerio Aimale <valerio@aimale.com> writes:
>>
>>> On 10/21/15 4:54 AM, Markus Armbruster wrote:
[...]
>>>> Can you give an example?
>>> Yes. I was trying to dump the full extent of physical memory of a VM
>>> that has 8GB memory space (ballooned). I simply did this:
>>>
>>> $ telnet localhost 1234
>>> Trying 127.0.0.1...
>>> Connected to localhost.
>>> Escape character is '^]'.
>>> QEMU 2.4.0.1 monitor - type 'help' for more information
>>> (qemu) pmemsave 0 8589934591 "/tmp/memsaved"
>>> 'pmemsave' has failed: integer is for 32-bit values
>>>
>>> Maybe I misunderstood how pmemsave works. Maybe I should have used
>>> dump-guest-memory
>> This is am unnecessary limitation caused by 'size:i' instead of
>> 'size:o'.  Fixable.
> I think I tried changing size:i to size:l, but, I was still receiving
> the error.

I should have a look.  HMP is relatively low priority for me; feel free
to remind me in a week or two.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-22 19:12           ` Eduardo Habkost
  2015-10-22 19:57             ` Valerio Aimale
@ 2015-10-23  6:35             ` Markus Armbruster
  2015-10-23  8:18               ` Daniel P. Berrange
                                 ` (2 more replies)
  1 sibling, 3 replies; 43+ messages in thread
From: Markus Armbruster @ 2015-10-23  6:35 UTC (permalink / raw)
  To: Eduardo Habkost; +Cc: qemu-devel, Valerio Aimale, lcapitulino

Eduardo Habkost <ehabkost@redhat.com> writes:

> On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote:
>> Valerio Aimale <valerio@aimale.com> writes:
> [...]
>> > There's also a similar patch, floating around the internet, the uses
>> > shared memory, instead of sockets, as inter-process communication
>> > between libvmi and QEMU. I've never used that.
>> 
>> By the time you built a working IPC mechanism on top of shared memory,
>> you're often no better off than with AF_LOCAL sockets.
>> 
>> Crazy idea: can we allocate guest memory in a way that support sharing
>> it with another process?  Eduardo, can -mem-path do such wild things?
>
> It can't today, but just because it creates a temporary file inside
> mem-path and unlinks it immediately after opening a file descriptor. We
> could make memory-backend-file also accept a full filename as argument,
> or add a mechanism to let QEMU send the open file descriptor to a QMP
> client.

Valerio, would an command line option to share guest memory suffice, or
does it have to be a monitor command?  If the latter, why?

Eduardo, I'm not sure writing to guest memory behind TCG's back will
work.  Do you know?

Will it work with KVM?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-23  6:35             ` Markus Armbruster
@ 2015-10-23  8:18               ` Daniel P. Berrange
  2015-10-23 14:48                 ` Valerio Aimale
  2015-10-23 14:44               ` Valerio Aimale
  2015-10-23 19:24               ` Eduardo Habkost
  2 siblings, 1 reply; 43+ messages in thread
From: Daniel P. Berrange @ 2015-10-23  8:18 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: lcapitulino, Eduardo Habkost, Valerio Aimale, qemu-devel

On Fri, Oct 23, 2015 at 08:35:15AM +0200, Markus Armbruster wrote:
> Eduardo Habkost <ehabkost@redhat.com> writes:
> 
> > On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote:
> >> Valerio Aimale <valerio@aimale.com> writes:
> > [...]
> >> > There's also a similar patch, floating around the internet, the uses
> >> > shared memory, instead of sockets, as inter-process communication
> >> > between libvmi and QEMU. I've never used that.
> >> 
> >> By the time you built a working IPC mechanism on top of shared memory,
> >> you're often no better off than with AF_LOCAL sockets.
> >> 
> >> Crazy idea: can we allocate guest memory in a way that support sharing
> >> it with another process?  Eduardo, can -mem-path do such wild things?
> >
> > It can't today, but just because it creates a temporary file inside
> > mem-path and unlinks it immediately after opening a file descriptor. We
> > could make memory-backend-file also accept a full filename as argument,
> > or add a mechanism to let QEMU send the open file descriptor to a QMP
> > client.
> 
> Valerio, would an command line option to share guest memory suffice, or
> does it have to be a monitor command?  If the latter, why?

IIUC, libvmi wants to be able to connect to arbitrary pre-existing
running KVM instances on the host. As such I think it cannot assume
anything about the way they have been started, so requiring they
be booted with a special command line arg looks impractical for this
scenario.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-22 21:51                 ` Valerio Aimale
@ 2015-10-23  8:25                   ` Daniel P. Berrange
  2015-10-23 19:00                     ` Eduardo Habkost
  2015-10-23 18:55                   ` Eduardo Habkost
  1 sibling, 1 reply; 43+ messages in thread
From: Daniel P. Berrange @ 2015-10-23  8:25 UTC (permalink / raw)
  To: Valerio Aimale
  Cc: qemu-devel, Markus Armbruster, Eduardo Habkost, lcapitulino

On Thu, Oct 22, 2015 at 03:51:28PM -0600, Valerio Aimale wrote:
> On 10/22/15 3:47 PM, Eduardo Habkost wrote:
> >On Thu, Oct 22, 2015 at 01:57:13PM -0600, Valerio Aimale wrote:
> >>On 10/22/15 1:12 PM, Eduardo Habkost wrote:
> >>>On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote:
> >>>>Valerio Aimale <valerio@aimale.com> writes:
> >>>[...]
> >>>>>There's also a similar patch, floating around the internet, the uses
> >>>>>shared memory, instead of sockets, as inter-process communication
> >>>>>between libvmi and QEMU. I've never used that.
> >>>>By the time you built a working IPC mechanism on top of shared memory,
> >>>>you're often no better off than with AF_LOCAL sockets.
> >>>>
> >>>>Crazy idea: can we allocate guest memory in a way that support sharing
> >>>>it with another process?  Eduardo, can -mem-path do such wild things?
> >>>It can't today, but just because it creates a temporary file inside
> >>>mem-path and unlinks it immediately after opening a file descriptor. We
> >>>could make memory-backend-file also accept a full filename as argument,
> >>>or add a mechanism to let QEMU send the open file descriptor to a QMP
> >>>client.
> >>>
> >>Eduardo, would my "artisanal" idea of creating an mmap'ed image of the guest
> >>memory footprint work, augmented by Eric's suggestion of having the qmp
> >>client pass the filename?
> >The code below doesn't make sense to me.
> 
> Ok. What I am trying to do is to create a mmapped() memory area of the guest
> physical memory that can be shared between QEMU and an external process,
> such that the external process can read arbitrary locations of the qemu
> guest physical memory.
> In short, I'm using mmap MAP_SHARED to share the guest memory area with a
> process that is external to QEMU

I wonder if it is possible for you get access to the guest memory via
sysfs instead, by simply accessing /proc/$PID/mem  avoiding the need for
any special help from QEMU. /proc/$PID/maps can be used to find the
offset of the guest memory region too, but looking for the map region
that has the size that matches guest RAM size.


Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-23  6:35             ` Markus Armbruster
  2015-10-23  8:18               ` Daniel P. Berrange
@ 2015-10-23 14:44               ` Valerio Aimale
  2015-10-23 14:56                 ` Eric Blake
  2015-10-23 19:24               ` Eduardo Habkost
  2 siblings, 1 reply; 43+ messages in thread
From: Valerio Aimale @ 2015-10-23 14:44 UTC (permalink / raw)
  To: Markus Armbruster, Eduardo Habkost; +Cc: qemu-devel, lcapitulino



On 10/23/15 12:35 AM, Markus Armbruster wrote:
> Eduardo Habkost <ehabkost@redhat.com> writes:
>
>> On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote:
>>> Valerio Aimale <valerio@aimale.com> writes:
>> [...]
>>>> There's also a similar patch, floating around the internet, the uses
>>>> shared memory, instead of sockets, as inter-process communication
>>>> between libvmi and QEMU. I've never used that.
>>> By the time you built a working IPC mechanism on top of shared memory,
>>> you're often no better off than with AF_LOCAL sockets.
>>>
>>> Crazy idea: can we allocate guest memory in a way that support sharing
>>> it with another process?  Eduardo, can -mem-path do such wild things?
>> It can't today, but just because it creates a temporary file inside
>> mem-path and unlinks it immediately after opening a file descriptor. We
>> could make memory-backend-file also accept a full filename as argument,
>> or add a mechanism to let QEMU send the open file descriptor to a QMP
>> client.
> Valerio, would an command line option to share guest memory suffice, or
> does it have to be a monitor command?  If the latter, why?
As Daniel points out, later in the thread, libvmi knows about QEMU VMs 
only via libvirt/virsh.

Thus, a command line option to share guest memory would work only if 
there was an info qmp command, info-guestmaps that would return 
something like

{
     'enabled' : 'true',
     'filename' : '/path/to/the/guest/memory/map'
}

so that libvmi can find the map.

Libvmi dependence on virsh is so strict, that libvmi does not even know 
if the QEMU VM has an open qmp unix socket or inet socket, to send 
commands through. Thus, libvmi sends qmp commands (to query registers, 
as an example) via

virsh qemu-monitor-command Windows10B '{"execute": 
"human-monitor-command", "arguments": {"command-line": "info registers"}}'

It would be nice if there was a qmp command to query if there are 
qmp/hmp sockets open, info-sockets that would return something like

{
     'unix' : {
                     'enabled': 'true',
                     'address': '/path/to/socket/'
             },

    'inet'  : {
                     'enabled': 'true',
                     'address': '1.2.3.4:5678'
     }
}

so that libvmi can find the rendezvous unix/inet address and sends 
commands through that. As of now, each qmp commands requires a popen() 
that forks virsh, which compounds to the slowness.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-23  8:18               ` Daniel P. Berrange
@ 2015-10-23 14:48                 ` Valerio Aimale
  0 siblings, 0 replies; 43+ messages in thread
From: Valerio Aimale @ 2015-10-23 14:48 UTC (permalink / raw)
  To: Daniel P. Berrange, Markus Armbruster
  Cc: lcapitulino, Eduardo Habkost, qemu-devel



On 10/23/15 2:18 AM, Daniel P. Berrange wrote:
> [...]
> It can't today, but just because it creates a temporary file inside
> mem-path and unlinks it immediately after opening a file descriptor. We
> could make memory-backend-file also accept a full filename as argument,
> or add a mechanism to let QEMU send the open file descriptor to a QMP
> client.
>> Valerio, would an command line option to share guest memory suffice, or
>> does it have to be a monitor command?  If the latter, why?
> IIUC, libvmi wants to be able to connect to arbitrary pre-existing
> running KVM instances on the host. As such I think it cannot assume
> anything about the way they have been started, so requiring they
> be booted with a special command line arg looks impractical for this
> scenario.
>
> Regards,
> Daniel
Daniel, you are correct. libvmi knows of QEMU/KVM VMs only via 
libvirt/virsh. The scenario would work if libvmi was able to query, via 
qmp command, the path of the shared memory map.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-23 14:44               ` Valerio Aimale
@ 2015-10-23 14:56                 ` Eric Blake
  2015-10-23 15:03                   ` Valerio Aimale
  0 siblings, 1 reply; 43+ messages in thread
From: Eric Blake @ 2015-10-23 14:56 UTC (permalink / raw)
  To: Valerio Aimale, Markus Armbruster, Eduardo Habkost
  Cc: qemu-devel, lcapitulino

[-- Attachment #1: Type: text/plain, Size: 1419 bytes --]

On 10/23/2015 08:44 AM, Valerio Aimale wrote:

> 
> Libvmi dependence on virsh is so strict, that libvmi does not even know
> if the QEMU VM has an open qmp unix socket or inet socket, to send
> commands through. Thus, libvmi sends qmp commands (to query registers,
> as an example) via
> 
> virsh qemu-monitor-command Windows10B '{"execute":
> "human-monitor-command", "arguments": {"command-line": "info registers"}}'

This is an unsupported back door of libvirt; you should really also
consider enhancing libvirt to add a formal API to expose this
information so that you don't have to resort to the monitor back door.
But that's a topic for the libvirt list.


> 
> so that libvmi can find the rendezvous unix/inet address and sends
> commands through that. As of now, each qmp commands requires a popen()
> that forks virsh, which compounds to the slowness.

No, don't blame virsh for your slowness.  Write your own C program that
links against libvirt.so, and which holds and reuses a persistent
connection, using the same libvirt APIs as would be used by virsh.  All
the overhead of spawning a shell to spawn virsh to open a fresh
connection for each command will go away.  Any solution that uses
popen() to virsh is _screaming_ to be rewritten to use libvirt.so natively.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-23 14:56                 ` Eric Blake
@ 2015-10-23 15:03                   ` Valerio Aimale
  0 siblings, 0 replies; 43+ messages in thread
From: Valerio Aimale @ 2015-10-23 15:03 UTC (permalink / raw)
  To: Eric Blake, Markus Armbruster, Eduardo Habkost; +Cc: qemu-devel, lcapitulino



On 10/23/15 8:56 AM, Eric Blake wrote:
> On 10/23/2015 08:44 AM, Valerio Aimale wrote:
>
>> Libvmi dependence on virsh is so strict, that libvmi does not even know
>> if the QEMU VM has an open qmp unix socket or inet socket, to send
>> commands through. Thus, libvmi sends qmp commands (to query registers,
>> as an example) via
>>
>> virsh qemu-monitor-command Windows10B '{"execute":
>> "human-monitor-command", "arguments": {"command-line": "info registers"}}'
> This is an unsupported back door of libvirt; you should really also
> consider enhancing libvirt to add a formal API to expose this
> information so that you don't have to resort to the monitor back door.
> But that's a topic for the libvirt list.
>
>> so that libvmi can find the rendezvous unix/inet address and sends
>> commands through that. As of now, each qmp commands requires a popen()
>> that forks virsh, which compounds to the slowness.
> No, don't blame virsh for your slowness.  Write your own C program that
> links against libvirt.so, and which holds and reuses a persistent
> connection, using the same libvirt APIs as would be used by virsh.  All
> the overhead of spawning a shell to spawn virsh to open a fresh
> connection for each command will go away.  Any solution that uses
> popen() to virsh is _screaming_ to be rewritten to use libvirt.so natively.
>
Eric, good points. Libvmi was written quite some time ago; that libvirt 
api might not even have been there. When we agree on an access method to 
the guest mem, I will rewrite those legacy parts. I'm not the author of 
libvmi. I'm just trying to make it better, when accessing QEMU/KVM VMs

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-22 21:51                 ` Valerio Aimale
  2015-10-23  8:25                   ` Daniel P. Berrange
@ 2015-10-23 18:55                   ` Eduardo Habkost
  2015-10-23 19:08                     ` Valerio Aimale
  1 sibling, 1 reply; 43+ messages in thread
From: Eduardo Habkost @ 2015-10-23 18:55 UTC (permalink / raw)
  To: Valerio Aimale; +Cc: lcapitulino, Markus Armbruster, qemu-devel

On Thu, Oct 22, 2015 at 03:51:28PM -0600, Valerio Aimale wrote:
> On 10/22/15 3:47 PM, Eduardo Habkost wrote:
> >On Thu, Oct 22, 2015 at 01:57:13PM -0600, Valerio Aimale wrote:
> >>On 10/22/15 1:12 PM, Eduardo Habkost wrote:
> >>>On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote:
> >>>>Valerio Aimale <valerio@aimale.com> writes:
> >>>[...]
> >>>>>There's also a similar patch, floating around the internet, the uses
> >>>>>shared memory, instead of sockets, as inter-process communication
> >>>>>between libvmi and QEMU. I've never used that.
> >>>>By the time you built a working IPC mechanism on top of shared memory,
> >>>>you're often no better off than with AF_LOCAL sockets.
> >>>>
> >>>>Crazy idea: can we allocate guest memory in a way that support sharing
> >>>>it with another process?  Eduardo, can -mem-path do such wild things?
> >>>It can't today, but just because it creates a temporary file inside
> >>>mem-path and unlinks it immediately after opening a file descriptor. We
> >>>could make memory-backend-file also accept a full filename as argument,
> >>>or add a mechanism to let QEMU send the open file descriptor to a QMP
> >>>client.
> >>>
> >>Eduardo, would my "artisanal" idea of creating an mmap'ed image of the guest
> >>memory footprint work, augmented by Eric's suggestion of having the qmp
> >>client pass the filename?
> >The code below doesn't make sense to me.
> 
> Ok. What I am trying to do is to create a mmapped() memory area of the guest
> physical memory that can be shared between QEMU and an external process,
> such that the external process can read arbitrary locations of the qemu
> guest physical memory.
> In short, I'm using mmap MAP_SHARED to share the guest memory area with a
> process that is external to QEMU
> 
> does it make better sense now?

I think you are confused about what mmap() does. It will create a new
mapping into the process address space, containing the data from an
existing file, not the other way around.

-- 
Eduardo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-23  8:25                   ` Daniel P. Berrange
@ 2015-10-23 19:00                     ` Eduardo Habkost
  0 siblings, 0 replies; 43+ messages in thread
From: Eduardo Habkost @ 2015-10-23 19:00 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: qemu-devel, Markus Armbruster, Valerio Aimale, lcapitulino

On Fri, Oct 23, 2015 at 09:25:03AM +0100, Daniel P. Berrange wrote:
> On Thu, Oct 22, 2015 at 03:51:28PM -0600, Valerio Aimale wrote:
> > On 10/22/15 3:47 PM, Eduardo Habkost wrote:
> > >On Thu, Oct 22, 2015 at 01:57:13PM -0600, Valerio Aimale wrote:
> > >>On 10/22/15 1:12 PM, Eduardo Habkost wrote:
> > >>>On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote:
> > >>>>Valerio Aimale <valerio@aimale.com> writes:
> > >>>[...]
> > >>>>>There's also a similar patch, floating around the internet, the uses
> > >>>>>shared memory, instead of sockets, as inter-process communication
> > >>>>>between libvmi and QEMU. I've never used that.
> > >>>>By the time you built a working IPC mechanism on top of shared memory,
> > >>>>you're often no better off than with AF_LOCAL sockets.
> > >>>>
> > >>>>Crazy idea: can we allocate guest memory in a way that support sharing
> > >>>>it with another process?  Eduardo, can -mem-path do such wild things?
> > >>>It can't today, but just because it creates a temporary file inside
> > >>>mem-path and unlinks it immediately after opening a file descriptor. We
> > >>>could make memory-backend-file also accept a full filename as argument,
> > >>>or add a mechanism to let QEMU send the open file descriptor to a QMP
> > >>>client.
> > >>>
> > >>Eduardo, would my "artisanal" idea of creating an mmap'ed image of the guest
> > >>memory footprint work, augmented by Eric's suggestion of having the qmp
> > >>client pass the filename?
> > >The code below doesn't make sense to me.
> > 
> > Ok. What I am trying to do is to create a mmapped() memory area of the guest
> > physical memory that can be shared between QEMU and an external process,
> > such that the external process can read arbitrary locations of the qemu
> > guest physical memory.
> > In short, I'm using mmap MAP_SHARED to share the guest memory area with a
> > process that is external to QEMU
> 
> I wonder if it is possible for you get access to the guest memory via
> sysfs instead, by simply accessing /proc/$PID/mem  avoiding the need for
> any special help from QEMU. /proc/$PID/maps can be used to find the
> offset of the guest memory region too, but looking for the map region
> that has the size that matches guest RAM size.

If libvmi needs to work with existing VMs (without reconfiguring and
restarting them), this may be the only way we could let it access guest
memory directly.

But I wouldn't trust the method of simply looking for the map region
that has the right size. This would be making many assumptions about how
exactly QEMU allocates guest RAM internally. Also, it would require
libvmi to ptrace-attach to QEMU first.

-- 
Eduardo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-23 18:55                   ` Eduardo Habkost
@ 2015-10-23 19:08                     ` Valerio Aimale
  2015-10-26  9:09                       ` Markus Armbruster
  0 siblings, 1 reply; 43+ messages in thread
From: Valerio Aimale @ 2015-10-23 19:08 UTC (permalink / raw)
  To: Eduardo Habkost; +Cc: lcapitulino, Markus Armbruster, qemu-devel

On 10/23/15 12:55 PM, Eduardo Habkost wrote:
> On Thu, Oct 22, 2015 at 03:51:28PM -0600, Valerio Aimale wrote:
>> On 10/22/15 3:47 PM, Eduardo Habkost wrote:
>>> On Thu, Oct 22, 2015 at 01:57:13PM -0600, Valerio Aimale wrote:
>>>> On 10/22/15 1:12 PM, Eduardo Habkost wrote:
>>>>> On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote:
>>>>>> Valerio Aimale <valerio@aimale.com> writes:
>>>>> [...]
>>>>>>> There's also a similar patch, floating around the internet, the uses
>>>>>>> shared memory, instead of sockets, as inter-process communication
>>>>>>> between libvmi and QEMU. I've never used that.
>>>>>> By the time you built a working IPC mechanism on top of shared memory,
>>>>>> you're often no better off than with AF_LOCAL sockets.
>>>>>>
>>>>>> Crazy idea: can we allocate guest memory in a way that support sharing
>>>>>> it with another process?  Eduardo, can -mem-path do such wild things?
>>>>> It can't today, but just because it creates a temporary file inside
>>>>> mem-path and unlinks it immediately after opening a file descriptor. We
>>>>> could make memory-backend-file also accept a full filename as argument,
>>>>> or add a mechanism to let QEMU send the open file descriptor to a QMP
>>>>> client.
>>>>>
>>>> Eduardo, would my "artisanal" idea of creating an mmap'ed image of the guest
>>>> memory footprint work, augmented by Eric's suggestion of having the qmp
>>>> client pass the filename?
>>> The code below doesn't make sense to me.
>> Ok. What I am trying to do is to create a mmapped() memory area of the guest
>> physical memory that can be shared between QEMU and an external process,
>> such that the external process can read arbitrary locations of the qemu
>> guest physical memory.
>> In short, I'm using mmap MAP_SHARED to share the guest memory area with a
>> process that is external to QEMU
>>
>> does it make better sense now?
> I think you are confused about what mmap() does. It will create a new
> mapping into the process address space, containing the data from an
> existing file, not the other way around.
>
Eduardo, I think it would be a common rule of politeness not to pass any 
judgement on a person that you don't know, but for some texts in a 
mailing list. I think I understand how mmap() works, and very well.

Participating is this discussion has been a struggle for me. For the 
good of the libvmi users, I have been trying to ignore the judgements, 
the comments and so on. But, alas, I throw my hands up in the air, and I 
surrender.

I think libvmi can live, as it has for the past years, by patching the 
QEMU source tree on as needed basis, and keeping the patch in the libvmi 
source tree, without disturbing any further the QEMU community.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-23  6:35             ` Markus Armbruster
  2015-10-23  8:18               ` Daniel P. Berrange
  2015-10-23 14:44               ` Valerio Aimale
@ 2015-10-23 19:24               ` Eduardo Habkost
  2015-10-23 20:02                 ` Richard Henderson
  2015-11-02 12:55                 ` Paolo Bonzini
  2 siblings, 2 replies; 43+ messages in thread
From: Eduardo Habkost @ 2015-10-23 19:24 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, Richard Henderson, qemu-devel, Valerio Aimale,
	lcapitulino

On Fri, Oct 23, 2015 at 08:35:15AM +0200, Markus Armbruster wrote:
> Eduardo Habkost <ehabkost@redhat.com> writes:
> 
> > On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote:
> >> Valerio Aimale <valerio@aimale.com> writes:
> > [...]
> >> > There's also a similar patch, floating around the internet, the uses
> >> > shared memory, instead of sockets, as inter-process communication
> >> > between libvmi and QEMU. I've never used that.
> >> 
> >> By the time you built a working IPC mechanism on top of shared memory,
> >> you're often no better off than with AF_LOCAL sockets.
> >> 
> >> Crazy idea: can we allocate guest memory in a way that support sharing
> >> it with another process?  Eduardo, can -mem-path do such wild things?
> >
> > It can't today, but just because it creates a temporary file inside
> > mem-path and unlinks it immediately after opening a file descriptor. We
> > could make memory-backend-file also accept a full filename as argument,
> > or add a mechanism to let QEMU send the open file descriptor to a QMP
> > client.
> 
> Valerio, would an command line option to share guest memory suffice, or
> does it have to be a monitor command?  If the latter, why?
> 
> Eduardo, I'm not sure writing to guest memory behind TCG's back will
> work.  Do you know?

I don't know. I guess it may possibly surprise TCG depending on how some
operations are implemented, but it sounds unlikely. CCing Richard.

> 
> Will it work with KVM?

I always assumed it would work, but CCing Paolo to see if there are any
pitfalls.

-- 
Eduardo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-23 19:24               ` Eduardo Habkost
@ 2015-10-23 20:02                 ` Richard Henderson
  2015-11-02 12:55                 ` Paolo Bonzini
  1 sibling, 0 replies; 43+ messages in thread
From: Richard Henderson @ 2015-10-23 20:02 UTC (permalink / raw)
  To: Eduardo Habkost, Markus Armbruster
  Cc: Paolo Bonzini, qemu-devel, Valerio Aimale, lcapitulino

On 10/23/2015 09:24 AM, Eduardo Habkost wrote:
> On Fri, Oct 23, 2015 at 08:35:15AM +0200, Markus Armbruster wrote:
>> Eduardo Habkost <ehabkost@redhat.com> writes:
>>
>>> On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote:
>>>> Valerio Aimale <valerio@aimale.com> writes:
>>> [...]
>>>>> There's also a similar patch, floating around the internet, the uses
>>>>> shared memory, instead of sockets, as inter-process communication
>>>>> between libvmi and QEMU. I've never used that.
>>>>
>>>> By the time you built a working IPC mechanism on top of shared memory,
>>>> you're often no better off than with AF_LOCAL sockets.
>>>>
>>>> Crazy idea: can we allocate guest memory in a way that support sharing
>>>> it with another process?  Eduardo, can -mem-path do such wild things?
>>>
>>> It can't today, but just because it creates a temporary file inside
>>> mem-path and unlinks it immediately after opening a file descriptor. We
>>> could make memory-backend-file also accept a full filename as argument,
>>> or add a mechanism to let QEMU send the open file descriptor to a QMP
>>> client.
>>
>> Valerio, would an command line option to share guest memory suffice, or
>> does it have to be a monitor command?  If the latter, why?
>>
>> Eduardo, I'm not sure writing to guest memory behind TCG's back will
>> work.  Do you know?
>
> I don't know. I guess it may possibly surprise TCG depending on how some
> operations are implemented, but it sounds unlikely. CCing Richard.

Writing to guest memory will work, in that the guest will see the changes.  The 
harder part is synchronization.  Here you'll face all the same problems that 
are currently being addressed in the multi-threaded tcg patch sets.


r~

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-23 19:08                     ` Valerio Aimale
@ 2015-10-26  9:09                       ` Markus Armbruster
  2015-10-26 17:37                         ` Valerio Aimale
  0 siblings, 1 reply; 43+ messages in thread
From: Markus Armbruster @ 2015-10-26  9:09 UTC (permalink / raw)
  To: Valerio Aimale; +Cc: qemu-devel, Eduardo Habkost, lcapitulino

Valerio Aimale <valerio@aimale.com> writes:

> On 10/23/15 12:55 PM, Eduardo Habkost wrote:
>> On Thu, Oct 22, 2015 at 03:51:28PM -0600, Valerio Aimale wrote:
>>> On 10/22/15 3:47 PM, Eduardo Habkost wrote:
>>>> On Thu, Oct 22, 2015 at 01:57:13PM -0600, Valerio Aimale wrote:
>>>>> On 10/22/15 1:12 PM, Eduardo Habkost wrote:
>>>>>> On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote:
>>>>>>> Valerio Aimale <valerio@aimale.com> writes:
>>>>>> [...]
>>>>>>>> There's also a similar patch, floating around the internet, the uses
>>>>>>>> shared memory, instead of sockets, as inter-process communication
>>>>>>>> between libvmi and QEMU. I've never used that.
>>>>>>> By the time you built a working IPC mechanism on top of shared memory,
>>>>>>> you're often no better off than with AF_LOCAL sockets.
>>>>>>>
>>>>>>> Crazy idea: can we allocate guest memory in a way that support sharing
>>>>>>> it with another process?  Eduardo, can -mem-path do such wild things?
>>>>>> It can't today, but just because it creates a temporary file inside
>>>>>> mem-path and unlinks it immediately after opening a file descriptor. We
>>>>>> could make memory-backend-file also accept a full filename as argument,
>>>>>> or add a mechanism to let QEMU send the open file descriptor to a QMP
>>>>>> client.
>>>>>>
>>>>> Eduardo, would my "artisanal" idea of creating an mmap'ed image
>>>>> of the guest
>>>>> memory footprint work, augmented by Eric's suggestion of having the qmp
>>>>> client pass the filename?
>>>> The code below doesn't make sense to me.
>>> Ok. What I am trying to do is to create a mmapped() memory area of the guest
>>> physical memory that can be shared between QEMU and an external process,
>>> such that the external process can read arbitrary locations of the qemu
>>> guest physical memory.
>>> In short, I'm using mmap MAP_SHARED to share the guest memory area with a
>>> process that is external to QEMU
>>>
>>> does it make better sense now?
>> I think you are confused about what mmap() does. It will create a new
>> mapping into the process address space, containing the data from an
>> existing file, not the other way around.
>>
> Eduardo, I think it would be a common rule of politeness not to pass
> any judgement on a person that you don't know, but for some texts in a
> mailing list. I think I understand how mmap() works, and very well.
>
> Participating is this discussion has been a struggle for me. For the
> good of the libvmi users, I have been trying to ignore the judgements,
> the comments and so on. But, alas, I throw my hands up in the air, and
> I surrender.

I'm sorry we exceeded your tolerance for frustration.  This mailing list
can be tough.  We try to be welcoming (believe it or not), but we too
often fail (okay, that part is easily believable).

To be honest, I had difficulties understanding your explanation, and
ended up guessing.  I figure Eduardo did the same, and guessed
incorrectly.  There but for the grace of God go I.

> I think libvmi can live, as it has for the past years, by patching the
> QEMU source tree on as needed basis, and keeping the patch in the
> libvmi source tree, without disturbing any further the QEMU community.

I'm sure libvmi can continue to require a patched QEMU, but I'm equally
sure getting its needs satisfied out of the box would be better for all.
To get that done, we need to understand the problem, and map the
solution space.

So let me try to summarize the thread, and what I've learned from it so
far.  Valerio, if you could correct misunderstandings, I'd be much
obliged.

LibVMI is a C library with Python bindings that makes it easy to monitor
the low-level details of a running virtual machine by viewing its
memory, trapping on hardware events, and accessing the vCPU registers.
This is called virtual machine introspection.
[Direct quote from http://libvmi.com/]

For that purpose, LibVMI needs (among other things) sufficiently fast
means to read and write guest memory.

Its existing solution is a non-upstream patch that adds a new, simple
protocol for reading and writing physical guest memory over TCP, and
monitor commands to start this service.

This thread is in reply to Valerio's attempt to upstream this patch.
Good move.

The usual questions for feature requests apply:

1. Is this a use case we want to serve?

   Unreserved yes.  Supporting virtual machine introspection with LibVMI
   makes sense.

2. Can it be served by existing guest introspection interfaces?

   Not quite clear, yet.

   LibVMI interacts with QEMU/KVM virtual machines via libvirt.  We want
   to be able to start and stop introspecting running virtual machines
   managed by libvirt.  Rules out solutions that require QEMU to be
   started with special command line options.  We really want QMP
   commands.  HMP commands would work, but HMP is not a stable
   interface.  It's fine for prototyping, of course.

   Interfaces discussed include the following monitor commands:

   * x, xp

     There's overhead for encoding / decoding.  LibVMI developers
     apparently have found it too slow.  They measured 90ms, which very
     slow indeed.  Turns out LibVMI goes through virsh every time!  I
     suspect the overhead starting virsh dwarves everything else several
     times over.  Therefore, we don't have an idea on this method's real
     overhead, yet.

     Transferring large blocks through the monitor connection can be
     problematic.  The monitor is control plane, bulk data should go
     through a data plane.  Not sure whether this is an issue for
     LibVMI's usage.

     x and xp are only in HMP.

     They cover only reading.

   * memsave, pmemsave

     Similar to x, xp, except here we use the file system as data plane.
     Additionally, we trade encoding / decoding overhead for temporary
     file handling overhead.

     Cover only reading.

   * dump-guest-memory

     More powerful (and more complex) than memsave, pmemsave, but geared
     towards a different use case: taking a crash dump of a running VM
     for static analysis.

     Covers only reading.

     Dumps in ELF or compressed kdump format.

     Supports dump to file descriptor, which could perhaps be used to
     avoid temporary files.

   * gdbserver

     This starts a server for the GDB Remote Serial Protocol on a TCP
     port.  This is for debugging the *guest*, not QEMU.  Can do much
     more than read and write memory.

     README.rst in the LibVMI source tree suggests

     - LibVMI can already use this interface, but it's apparently slower
       than using its non-upstream patch.  How much?

     - It requires the user to configure his VMs to be started with QEMU
       command line option -s, which means you can't introspect a
       running VM you didn't prepare for it from the start.  Easy enough
       to improve: use the monitor command!

     gdbserver is only in HMP, probably because it's viewed as a
     debugging tool for human use.  LibVMI would be a programmatic user,
     so adding gdbserver to QMP could be justified.

3. Assuming existing interfaces won't do, how could a new one look like?

   * Special purpose TCP protocol, QMP command to start/stop the service

     This is the existing non-upstream solution.

     We can quibble over the protocol, e.g. its weird handling of read
     errors, but that's detail

   * Share guest memory with LibVMI somehow

     Still in the "crazy idea" stage.  Fish the memory out of
     /proc/$PID/mem?  Create a file QEMU and LibVMI can mmap?

     Writing has same synchronization problems as with multi-threaded
     TCG.  These are being addressed, but I have now idea how the
     solutions will translate.

I'm afraid this got a bit long.  Thanks for reading this far.

I hope we can work together and find a solution that satisfies LibVMI.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-26  9:09                       ` Markus Armbruster
@ 2015-10-26 17:37                         ` Valerio Aimale
  2015-10-26 17:52                           ` Eduardo Habkost
  0 siblings, 1 reply; 43+ messages in thread
From: Valerio Aimale @ 2015-10-26 17:37 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, Eduardo Habkost, lcapitulino

On 10/26/15 3:09 AM, Markus Armbruster wrote:
> [...]
>
>> Eduardo, I think it would be a common rule of politeness not to pass
>> any judgement on a person that you don't know, but for some texts in a
>> mailing list. I think I understand how mmap() works, and very well.
>>
>> Participating is this discussion has been a struggle for me. For the
>> good of the libvmi users, I have been trying to ignore the judgements,
>> the comments and so on. But, alas, I throw my hands up in the air, and
>> I surrender.
> I'm sorry we exceeded your tolerance for frustration.  This mailing list
> can be tough.  We try to be welcoming (believe it or not), but we too
> often fail (okay, that part is easily believable).
>
> To be honest, I had difficulties understanding your explanation, and
> ended up guessing.  I figure Eduardo did the same, and guessed
> incorrectly.  There but for the grace of God go I.
Well, I did scribble my C sample excerpt too fast. Participating in 
mailing list is not part of my job description - I was short on time, I 
admit to that. However, there is a big difference in saying "I do no 
understand your explanation, please try again" and saying "you're 
confused about mmap()"

I was trying to advocate the use of a shared mmap'ed region. The sharing 
would be two-ways (RW for both) between the QEMU virtualizer and the 
libvmi process. I envision that there could be a QEMU command line 
argument, such as "--mmap-guest-memory <filename>" Understand that Eric 
feels strongly the libvmi client should own the file name - I have not 
forgotten that. When that command line argument is given, as part of the 
guest initialization, QEMU creates a file of size equal to the size of 
the guest memory containing all zeros, mmaps that file to the guest 
memory with  PROT_READ|PROT_WRITE and MAP_FILE|MAP_SHARED, then starts 
the guest. And, if at all possible, makes the filename querable via qmp 
and/or hmp, so that the filename of the mmap would not need to be 
maintained in two different places, leading to maintenance nightmares.

Shared mmaped regions can be used as inter-process communication, here's 
a quick and dirty example:

p1.c
---
#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/file.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <errno.h>
#include <signal.h>

void handle_signal(int signal);

/* sorry, for ease of development I need these to be global */
int fh;
char *p;

void handle_signal(int signal) {
     const char *signal_name;
     sigset_t pending;

     switch (signal) {
         case SIGHUP:
             signal_name = "SIGHUP";
             fprintf(stdout, "Process 1 -- Map now contains: %s\n", p);
             munmap(p, sysconf(_SC_PAGE_SIZE) );
             close(fh);
             exit(0);
             break;
         default:
             fprintf(stderr, "Caught wrong signal: %d\n", signal);
             return;
     }
}


void main(int argc, char **argv) {
         struct sigaction sa;

         sa.sa_handler = &handle_signal;
         sa.sa_flags = SA_RESTART;
         sigfillset(&sa.sa_mask);
         if (sigaction(SIGHUP, &sa, NULL) == -1) {
                 perror("Error: cannot handle SIGHUP");
                 exit(1);
         }

         if ( (fh = open("shared.map", O_RDWR | O_CREAT, S_IRWXU)) ) {
                 p = mmap(NULL, sysconf(_SC_PAGE_SIZE), 
PROT_READ|PROT_WRITE, MAP_FILE|MAP_SHARED, fh, (off_t) 0);

                 if (p == MAP_FAILED) { printf("poop, didn't map: 
%s\n",strerror(errno)); close(fh); exit(1);}

                 p[0] = 0xcc;

                 fprintf(stdout, "Process 1 -- Writing to map:   All 
your bases are belong to us.\n");
                 sprintf( (char*) p, "All your bases are belong to us.");

                 while(1);

         }

}
---

p2.c
---
#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/file.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <errno.h>
#include <signal.h>

void main(int argc, char **argv) {
         int fh;
         void *p;
         int pid;

         pid = atoi(argv[1]);

         sleep(1);

         if ( (fh = open("shared.map", O_RDWR, S_IRWXU)) ) {
                 p = mmap(NULL, sysconf(_SC_PAGE_SIZE), 
PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FILE,
                   fh, (off_t) 0);

                 printf("Process 2 -- Map now contains: %s\n", (char*)p);
                 printf("Process 2 -- Writing to map:   All your bases 
*NOW* belong to us.\n");
                 fflush(stdout);
                 sprintf( (char*) p, "All your bases *NOW* belong to us.");

                 kill(pid, SIGHUP);

                 sleep(3);

                 munmap(p, sysconf(_SC_PAGE_SIZE));
                 close(fh);
         }

}
---

if I run both, in bash, as:

rm -f shared.map ; \
gcc -o p1 p1.c ; \
gcc -o p2 p2.c ; \
for i in ` seq 1 `getconf PAGESIZE``; do echo -e -n "\0" > shared.map ; 
done ; \
./p1 & ./p2 $!

I get the following output:

$ rm -f shared.map ; \
 > gcc -o p1 p1.c ; \
 > gcc -o p2 p2.c ; \
 > for i in ` seq 1 `getconf PAGESIZE``; do echo -e -n "\0" > shared.map 
; done ; \
 > ./p1 & ./p2 $!
[1] 8223
Process 1 -- Writing to map:   All your bases are belong to us.
Process 2 -- Map now contains: All your bases are belong to us.
Process 2 -- Writing to map:   All your bases *NOW* belong to us.
Process 1 -- Map now contains: All your bases *NOW* belong to us.
[1]+  Done                    ./p1

To me, the example above shows the used of a mmaped region shared 
between two processes.

C code above was written too fast again! Some headers are redundant, I 
do not check for all error conditions, I don't do all the cleanup - it's 
kind of quick and dirty. I hope you can forgive me.
(it works on mac and linux unmodified, though quick and dirty)

I am fascinated by Daniel's suggestion of accessing the guest memory via 
the QMU process existing map, found through /dev/pid/mmaps. It is my 
understanding that Daniel's suggestion would require ptrace()'ing the 
QEMU process, and potentially stopping the QEMU process to access the 
guest mmap'ed memory.
>
>
> This thread is in reply to Valerio's attempt to upstream this patch.
> Good move.
>
> The usual questions for feature requests apply:
>
> 1. Is this a use case we want to serve?
>
>     Unreserved yes.  Supporting virtual machine introspection with LibVMI
>     makes sense.
>
> [...]

Markus that's a great description of  needs and potential approaches. It 
requires a bit of thinking. I reserve the right to provide comments 
later on.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-26 17:37                         ` Valerio Aimale
@ 2015-10-26 17:52                           ` Eduardo Habkost
  2015-10-27 14:17                             ` Valerio Aimale
  0 siblings, 1 reply; 43+ messages in thread
From: Eduardo Habkost @ 2015-10-26 17:52 UTC (permalink / raw)
  To: Valerio Aimale; +Cc: qemu-devel, Markus Armbruster, lcapitulino

On Mon, Oct 26, 2015 at 11:37:04AM -0600, Valerio Aimale wrote:
> On 10/26/15 3:09 AM, Markus Armbruster wrote:
> >[...]
> >
> >>Eduardo, I think it would be a common rule of politeness not to pass
> >>any judgement on a person that you don't know, but for some texts in a
> >>mailing list. I think I understand how mmap() works, and very well.
> >>
> >>Participating is this discussion has been a struggle for me. For the
> >>good of the libvmi users, I have been trying to ignore the judgements,
> >>the comments and so on. But, alas, I throw my hands up in the air, and
> >>I surrender.
> >I'm sorry we exceeded your tolerance for frustration.  This mailing list
> >can be tough.  We try to be welcoming (believe it or not), but we too
> >often fail (okay, that part is easily believable).
> >
> >To be honest, I had difficulties understanding your explanation, and
> >ended up guessing.  I figure Eduardo did the same, and guessed
> >incorrectly.  There but for the grace of God go I.
> Well, I did scribble my C sample excerpt too fast. Participating in mailing
> list is not part of my job description - I was short on time, I admit to
> that. However, there is a big difference in saying "I do no understand your
> explanation, please try again" and saying "you're confused about mmap()"
> 
> I was trying to advocate the use of a shared mmap'ed region. The sharing
> would be two-ways (RW for both) between the QEMU virtualizer and the libvmi
> process. I envision that there could be a QEMU command line argument, such
> as "--mmap-guest-memory <filename>" Understand that Eric feels strongly the
> libvmi client should own the file name - I have not forgotten that. When
> that command line argument is given, as part of the guest initialization,
> QEMU creates a file of size equal to the size of the guest memory containing
> all zeros, mmaps that file to the guest memory with  PROT_READ|PROT_WRITE
> and MAP_FILE|MAP_SHARED, then starts the guest.

This is basically what memory-backend-file (and the legacy -mem-path
option) already does today, but it unlinks the file just after opening
it. We can change it to accept a full filename and/or an option to make
it not unlink the file after opening it.

I don't remember if memory-backend-file is usable without -numa, but we
could make it possible somehow.


> And, if at all possible,
> makes the filename querable via qmp and/or hmp, so that the filename of the
> mmap would not need to be maintained in two different places, leading to
> maintenance nightmares.

You can probably query the memory-backend-file QOM properties using QMP,
already. Then the filename could be exposed as a QOM property.

-- 
Eduardo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-26 17:52                           ` Eduardo Habkost
@ 2015-10-27 14:17                             ` Valerio Aimale
  2015-10-27 15:00                               ` Markus Armbruster
  0 siblings, 1 reply; 43+ messages in thread
From: Valerio Aimale @ 2015-10-27 14:17 UTC (permalink / raw)
  To: Eduardo Habkost; +Cc: qemu-devel, Markus Armbruster, lcapitulino

On 10/26/15 11:52 AM, Eduardo Habkost wrote:
>
>
> I was trying to advocate the use of a shared mmap'ed region. The sharing
> would be two-ways (RW for both) between the QEMU virtualizer and the libvmi
> process. I envision that there could be a QEMU command line argument, such
> as "--mmap-guest-memory <filename>" Understand that Eric feels strongly the
> libvmi client should own the file name - I have not forgotten that. When
> that command line argument is given, as part of the guest initialization,
> QEMU creates a file of size equal to the size of the guest memory containing
> all zeros, mmaps that file to the guest memory with  PROT_READ|PROT_WRITE
> and MAP_FILE|MAP_SHARED, then starts the guest.
> This is basically what memory-backend-file (and the legacy -mem-path
> option) already does today, but it unlinks the file just after opening
> it. We can change it to accept a full filename and/or an option to make
> it not unlink the file after opening it.
>
> I don't remember if memory-backend-file is usable without -numa, but we
> could make it possible somehow.
Eduardo, I did try this approach. It takes 2 line changes in exec.c: 
comment the unlink out, and making sure MAP_SHARED is used when 
-mem-path and -mem-prealloc are given. It works beautifully, and libvmi 
accesses are fast. However, the VM is slowed down to a crawl, obviously, 
because each RAM access by the VM triggers a page fault on the mmapped 
file. I don't think having a crawling VM is desirable, so this approach 
goes out the door.

I think we're back at estimating the speed of other approaches as 
discussed previously:

- via UNIX socket as per existing patch
- via xp parsing the human readable xp output
- via an xp-like command the returns memory content baseXX-encoded into 
a json string
- via shared memory as per existing code and patch

Any other?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-27 14:17                             ` Valerio Aimale
@ 2015-10-27 15:00                               ` Markus Armbruster
  2015-10-27 15:18                                 ` Valerio Aimale
  0 siblings, 1 reply; 43+ messages in thread
From: Markus Armbruster @ 2015-10-27 15:00 UTC (permalink / raw)
  To: Valerio Aimale; +Cc: lcapitulino, Eduardo Habkost, qemu-devel

Valerio Aimale <valerio@aimale.com> writes:

> On 10/26/15 11:52 AM, Eduardo Habkost wrote:
>>
>>
>> I was trying to advocate the use of a shared mmap'ed region. The sharing
>> would be two-ways (RW for both) between the QEMU virtualizer and the libvmi
>> process. I envision that there could be a QEMU command line argument, such
>> as "--mmap-guest-memory <filename>" Understand that Eric feels strongly the
>> libvmi client should own the file name - I have not forgotten that. When
>> that command line argument is given, as part of the guest initialization,
>> QEMU creates a file of size equal to the size of the guest memory containing
>> all zeros, mmaps that file to the guest memory with  PROT_READ|PROT_WRITE
>> and MAP_FILE|MAP_SHARED, then starts the guest.
>> This is basically what memory-backend-file (and the legacy -mem-path
>> option) already does today, but it unlinks the file just after opening
>> it. We can change it to accept a full filename and/or an option to make
>> it not unlink the file after opening it.
>>
>> I don't remember if memory-backend-file is usable without -numa, but we
>> could make it possible somehow.
> Eduardo, I did try this approach. It takes 2 line changes in exec.c:
> comment the unlink out, and making sure MAP_SHARED is used when
> -mem-path and -mem-prealloc are given. It works beautifully, and
> libvmi accesses are fast. However, the VM is slowed down to a crawl,
> obviously, because each RAM access by the VM triggers a page fault on
> the mmapped file. I don't think having a crawling VM is desirable, so
> this approach goes out the door.

Uh, I don't understand why "each RAM access by the VM triggers a page
fault".  Can you show us the patch you used?

> I think we're back at estimating the speed of other approaches as
> discussed previously:
>
> - via UNIX socket as per existing patch
> - via xp parsing the human readable xp output
> - via an xp-like command the returns memory content baseXX-encoded
> into a json string
> - via shared memory as per existing code and patch
>
> Any other?

Yes, the existing alternative LibVMI method via gdbserver should be
included in the comparison.

Naturally, any approach that does the actual work via QMP will be dog
slow as long as LibVMI launches virsh for each QMP command.  Fixable, as
Eric pointed out: use the libvirt API and link against libvirt.so
instead.  No idea how much work that'll be.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-27 15:00                               ` Markus Armbruster
@ 2015-10-27 15:18                                 ` Valerio Aimale
  2015-10-27 15:31                                   ` Valerio Aimale
  2015-10-27 16:11                                   ` Markus Armbruster
  0 siblings, 2 replies; 43+ messages in thread
From: Valerio Aimale @ 2015-10-27 15:18 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: lcapitulino, Eduardo Habkost, qemu-devel

On 10/27/15 9:00 AM, Markus Armbruster wrote:
> Valerio Aimale <valerio@aimale.com> writes:
>
>> On 10/26/15 11:52 AM, Eduardo Habkost wrote:
>>>
>>> I was trying to advocate the use of a shared mmap'ed region. The sharing
>>> would be two-ways (RW for both) between the QEMU virtualizer and the libvmi
>>> process. I envision that there could be a QEMU command line argument, such
>>> as "--mmap-guest-memory <filename>" Understand that Eric feels strongly the
>>> libvmi client should own the file name - I have not forgotten that. When
>>> that command line argument is given, as part of the guest initialization,
>>> QEMU creates a file of size equal to the size of the guest memory containing
>>> all zeros, mmaps that file to the guest memory with  PROT_READ|PROT_WRITE
>>> and MAP_FILE|MAP_SHARED, then starts the guest.
>>> This is basically what memory-backend-file (and the legacy -mem-path
>>> option) already does today, but it unlinks the file just after opening
>>> it. We can change it to accept a full filename and/or an option to make
>>> it not unlink the file after opening it.
>>>
>>> I don't remember if memory-backend-file is usable without -numa, but we
>>> could make it possible somehow.
>> Eduardo, I did try this approach. It takes 2 line changes in exec.c:
>> comment the unlink out, and making sure MAP_SHARED is used when
>> -mem-path and -mem-prealloc are given. It works beautifully, and
>> libvmi accesses are fast. However, the VM is slowed down to a crawl,
>> obviously, because each RAM access by the VM triggers a page fault on
>> the mmapped file. I don't think having a crawling VM is desirable, so
>> this approach goes out the door.
> Uh, I don't understand why "each RAM access by the VM triggers a page
> fault".  Can you show us the patch you used?
Sorry, too brief of an explanation. Every time the guest flips a byte in 
physical RAM, I think that triggers a page write to the mmaped file. My 
understanding is that, with MAP_SHARED, each write to RAM triggers a 
file write, hence the slowness. These are the simple changes I made, to 
test it - as a proof of concept.

in exec.c of the qemu-2.4.0.1 change

---
     fd = mkstemp(filename);
     if (fd < 0) {
         error_setg_errno(errp, errno,
                          "unable to create backing store for hugepages");
         g_free(filename);
         goto error;
     }
     unlink(filename);
     g_free(filename);

     memory = (memory+hpagesize-1) & ~(hpagesize-1);

     /*
      * ftruncate is not supported by hugetlbfs in older
      * hosts, so don't bother bailing out on errors.
      * If anything goes wrong with it under other filesystems,
      * mmap will fail.
      */
     if (ftruncate(fd, memory)) {
         perror("ftruncate");
     }

     area = mmap(0, memory, PROT_READ | PROT_WRITE,
                 (block->flags & RAM_SHARED ? MAP_SHARED : MAP_PRIVATE),
                 fd, 0);
---

to

---
     fd = mkstemp(filename);
     if (fd < 0) {
         error_setg_errno(errp, errno,
                          "unable to create backing store for hugepages");
         g_free(filename);
         goto error;
     }
     /* unlink(filename); */ /* Valerio's change to persist guest RAM 
mmaped file */
     g_free(filename);

     memory = (memory+hpagesize-1) & ~(hpagesize-1);

     /*
      * ftruncate is not supported by hugetlbfs in older
      * hosts, so don't bother bailing out on errors.
      * If anything goes wrong with it under other filesystems,
      * mmap will fail.
      */
     if (ftruncate(fd, memory)) {
         perror("ftruncate");
     }

     area = mmap(0, memory, PROT_READ | PROT_WRITE,
                 MAP_FILE | MAP_SHARED, /* Valerio's change to persist 
guest RAM mmaped file */
                 fd, 0);
---

then, recompile qemu.

Launch a VM as

/usr/local/bin/qemu-system-x86_64 -name Windows10 -S -machine 
pc-i440fx-2.4,accel=kvm,usb=off [...] -mem-prealloc -mem-path /tmp/maps

# I know -mem-path is deprecated, but I used for speeding up the proof 
of concept.

With the above command, I have a the following file

$ ls -l /tmp/maps/
-rw------- 1 libvirt-qemu kvm 2147483648 Oct 27 08:31 
qemu_back_mem.pc.ram.fP4sKH

which is a mmap of the Win VM physical RAM

$ hexdump -C /tmp/maps/qemu_back_mem.val.pc.ram.fP4sKH

00000000  53 ff 00 f0 53 ff 00 f0  c3 e2 00 f0 53 ff 00 f0 
|S...S.......S...|
[...]
00000760  24 02 c3 49 6e 76 61 6c  69 64 20 70 61 72 74 69 |$..Invalid 
parti|
00000770  74 69 6f 6e 20 74 61 62  6c 65 00 45 72 72 6f 72  |tion 
table.Error|
00000780  20 6c 6f 61 64 69 6e 67  20 6f 70 65 72 61 74 69  | loading 
operati|
00000790  6e 67 20 73 79 73 74 65  6d 00 4d 69 73 73 69 6e  |ng 
system.Missin|
000007a0  67 20 6f 70 65 72 61 74  69 6e 67 20 73 79 73 74  |g operating 
syst|
000007b0  65 6d 00 00 00 63 7b 9a  73 d8 99 ce 00 00 80 20 
|em...c{.s...... |
[...]

I did not try to mmap'ing to a file on a RAMdisk. Without physical disk 
I/O, the VM might run faster.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-27 15:18                                 ` Valerio Aimale
@ 2015-10-27 15:31                                   ` Valerio Aimale
  2015-10-27 16:11                                   ` Markus Armbruster
  1 sibling, 0 replies; 43+ messages in thread
From: Valerio Aimale @ 2015-10-27 15:31 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: lcapitulino, Eduardo Habkost, qemu-devel

On 10/27/15 9:18 AM, Valerio Aimale wrote:
>
>
> I did not try to mmap'ing to a file on a RAMdisk. Without physical 
> disk I/O, the VM might run faster.
>
>

I did try with the file on a ramdisk

$ sudo mount -o size=3G -t tmpfs none /ramdisk
$ /usr/local/bin/qemu-system-x86_64 -name Windows10 -S -machine 
pc-i440fx-2.4,accel=kvm,usb=off [...] -mem-prealloc -mem-path /ramdisk

with that, the speed of the VM is acceptable. It will not take your 
breath away, but it is reasonable.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-27 15:18                                 ` Valerio Aimale
  2015-10-27 15:31                                   ` Valerio Aimale
@ 2015-10-27 16:11                                   ` Markus Armbruster
  2015-10-27 16:27                                     ` Valerio Aimale
  1 sibling, 1 reply; 43+ messages in thread
From: Markus Armbruster @ 2015-10-27 16:11 UTC (permalink / raw)
  To: Valerio Aimale; +Cc: qemu-devel, Eduardo Habkost, lcapitulino

Valerio Aimale <valerio@aimale.com> writes:

> On 10/27/15 9:00 AM, Markus Armbruster wrote:
>> Valerio Aimale <valerio@aimale.com> writes:
>>
>>> On 10/26/15 11:52 AM, Eduardo Habkost wrote:
>>>>
>>>> I was trying to advocate the use of a shared mmap'ed region. The sharing
>>>> would be two-ways (RW for both) between the QEMU virtualizer and the libvmi
>>>> process. I envision that there could be a QEMU command line argument, such
>>>> as "--mmap-guest-memory <filename>" Understand that Eric feels strongly the
>>>> libvmi client should own the file name - I have not forgotten that. When
>>>> that command line argument is given, as part of the guest initialization,
>>>> QEMU creates a file of size equal to the size of the guest memory containing
>>>> all zeros, mmaps that file to the guest memory with  PROT_READ|PROT_WRITE
>>>> and MAP_FILE|MAP_SHARED, then starts the guest.
>>>> This is basically what memory-backend-file (and the legacy -mem-path
>>>> option) already does today, but it unlinks the file just after opening
>>>> it. We can change it to accept a full filename and/or an option to make
>>>> it not unlink the file after opening it.
>>>>
>>>> I don't remember if memory-backend-file is usable without -numa, but we
>>>> could make it possible somehow.
>>> Eduardo, I did try this approach. It takes 2 line changes in exec.c:
>>> comment the unlink out, and making sure MAP_SHARED is used when
>>> -mem-path and -mem-prealloc are given. It works beautifully, and
>>> libvmi accesses are fast. However, the VM is slowed down to a crawl,
>>> obviously, because each RAM access by the VM triggers a page fault on
>>> the mmapped file. I don't think having a crawling VM is desirable, so
>>> this approach goes out the door.
>> Uh, I don't understand why "each RAM access by the VM triggers a page
>> fault".  Can you show us the patch you used?
> Sorry, too brief of an explanation. Every time the guest flips a byte in 
> physical RAM, I think that triggers a page write to the mmaped file. My 
> understanding is that, with MAP_SHARED, each write to RAM triggers a 
> file write, hence the slowness. These are the simple changes I made, to 
> test it - as a proof of concept.

Ah, that actually makes sense.  Thanks!

[...]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-27 16:11                                   ` Markus Armbruster
@ 2015-10-27 16:27                                     ` Valerio Aimale
  0 siblings, 0 replies; 43+ messages in thread
From: Valerio Aimale @ 2015-10-27 16:27 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, Eduardo Habkost, lcapitulino

On 10/27/15 10:11 AM, Markus Armbruster wrote:
> [...]
>>>> Eduardo, I did try this approach. It takes 2 line changes in exec.c:
>>>> comment the unlink out, and making sure MAP_SHARED is used when
>>>> -mem-path and -mem-prealloc are given. It works beautifully, and
>>>> libvmi accesses are fast. However, the VM is slowed down to a crawl,
>>>> obviously, because each RAM access by the VM triggers a page fault on
>>>> the mmapped file. I don't think having a crawling VM is desirable, so
>>>> this approach goes out the door.
>>> Uh, I don't understand why "each RAM access by the VM triggers a page
>>> fault".  Can you show us the patch you used?
>> Sorry, too brief of an explanation. Every time the guest flips a byte in
>> physical RAM, I think that triggers a page write to the mmaped file. My
>> understanding is that, with MAP_SHARED, each write to RAM triggers a
>> file write, hence the slowness. These are the simple changes I made, to
>> test it - as a proof of concept.
> Ah, that actually makes sense.  Thanks!
>
> [...]
However, when the guest RAM mmap'ed file resides on a RAMdisk on the 
host, the guest OS responsiveness is more than acceptable. Perhaps this 
is a viable approach. It might requires a minimum of changes to the QEMU 
source and maybe 1 extra command line argument.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
  2015-10-23 19:24               ` Eduardo Habkost
  2015-10-23 20:02                 ` Richard Henderson
@ 2015-11-02 12:55                 ` Paolo Bonzini
  1 sibling, 0 replies; 43+ messages in thread
From: Paolo Bonzini @ 2015-11-02 12:55 UTC (permalink / raw)
  To: Eduardo Habkost, Markus Armbruster
  Cc: Richard Henderson, qemu-devel, Valerio Aimale, lcapitulino



On 23/10/2015 21:24, Eduardo Habkost wrote:
>> > 
>> > Valerio, would an command line option to share guest memory suffice, or
>> > does it have to be a monitor command?  If the latter, why?
>> > 
>> > Eduardo, I'm not sure writing to guest memory behind TCG's back will
>> > work.  Do you know?
> I don't know. I guess it may possibly surprise TCG depending on how some
> operations are implemented, but it sounds unlikely. CCing Richard.
> 
>> > 
>> > Will it work with KVM?
> I always assumed it would work, but CCing Paolo to see if there are any
> pitfalls.

Yes, both should work.

Paolo

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2015-11-02 12:55 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-10-15 23:44 [Qemu-devel] QEMU patch to allow VM introspection via libvmi valerio
2015-10-15 23:44 ` [Qemu-devel] [PATCH] QEMU patch for libvmi to introspect QEMU/kvm virtual machines. Usually this patch is distributed with libvmi, but, it might be more useful to have it in the QEMU source permanently valerio
2015-10-19 21:33   ` Eric Blake
2015-10-21 15:11     ` Valerio Aimale
2015-10-16  8:15 ` [Qemu-devel] QEMU patch to allow VM introspection via libvmi Markus Armbruster
2015-10-16 14:30   ` Valerio Aimale
2015-10-19  7:52     ` Markus Armbruster
2015-10-19 14:37       ` Valerio Aimale
2015-10-21 10:54         ` Markus Armbruster
2015-10-21 15:50           ` Valerio Aimale
2015-10-22 11:50             ` Markus Armbruster
2015-10-22 18:11               ` Valerio Aimale
2015-10-23  6:31                 ` Markus Armbruster
2015-10-22 18:43           ` Valerio Aimale
2015-10-22 18:54             ` Eric Blake
2015-10-22 19:12           ` Eduardo Habkost
2015-10-22 19:57             ` Valerio Aimale
2015-10-22 20:03               ` Eric Blake
2015-10-22 20:45                 ` Valerio Aimale
2015-10-22 21:47               ` Eduardo Habkost
2015-10-22 21:51                 ` Valerio Aimale
2015-10-23  8:25                   ` Daniel P. Berrange
2015-10-23 19:00                     ` Eduardo Habkost
2015-10-23 18:55                   ` Eduardo Habkost
2015-10-23 19:08                     ` Valerio Aimale
2015-10-26  9:09                       ` Markus Armbruster
2015-10-26 17:37                         ` Valerio Aimale
2015-10-26 17:52                           ` Eduardo Habkost
2015-10-27 14:17                             ` Valerio Aimale
2015-10-27 15:00                               ` Markus Armbruster
2015-10-27 15:18                                 ` Valerio Aimale
2015-10-27 15:31                                   ` Valerio Aimale
2015-10-27 16:11                                   ` Markus Armbruster
2015-10-27 16:27                                     ` Valerio Aimale
2015-10-23  6:35             ` Markus Armbruster
2015-10-23  8:18               ` Daniel P. Berrange
2015-10-23 14:48                 ` Valerio Aimale
2015-10-23 14:44               ` Valerio Aimale
2015-10-23 14:56                 ` Eric Blake
2015-10-23 15:03                   ` Valerio Aimale
2015-10-23 19:24               ` Eduardo Habkost
2015-10-23 20:02                 ` Richard Henderson
2015-11-02 12:55                 ` Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).