* [Qemu-devel] QEMU patch to allow VM introspection via libvmi
@ 2015-10-15 23:44 valerio
2015-10-15 23:44 ` [Qemu-devel] [PATCH] QEMU patch for libvmi to introspect QEMU/kvm virtual machines. Usually this patch is distributed with libvmi, but, it might be more useful to have it in the QEMU source permanently valerio
2015-10-16 8:15 ` [Qemu-devel] QEMU patch to allow VM introspection via libvmi Markus Armbruster
0 siblings, 2 replies; 43+ messages in thread
From: valerio @ 2015-10-15 23:44 UTC (permalink / raw)
To: qemu-devel; +Cc: armbru, ehabkost, lcapitulino
All-
I've produced a patch for the current QEMU HEAD, for libvmi to introspect QEMU/KVM VMs.
Libvmi has patches for the old qemu-kvm fork, inside its source tree: https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
This patch adds an hmp and a qmp command, "pmemaccess". When the command is invoked with a string argument (a filename), it will open a UNIX socket and spawn a listening thread.
The client writes binary commands to the socket, in the form of a C structure:
struct request {
    uint8_t type;      // 0 quit, 1 read, 2 write, ... rest reserved
    uint64_t address;  // address to read from OR write to
    uint64_t length;   // number of bytes to read OR write
};
The client receives as a response either (length+1) bytes, if it is a read operation, or 1 byte if it is a write operation.
The last byte of a read operation response indicates success (1 success, 0 failure). The single byte returned for a write operation indicates the same (1 success, 0 failure).
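For illustration, a minimal client might look like this sketch (untested; the socket path and the address/length values are just placeholders, and a robust client would loop on read() until all expected bytes have arrived):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/un.h>

    struct request {
        uint8_t type;      /* 0 quit, 1 read, 2 write */
        uint64_t address;  /* guest physical address */
        uint64_t length;   /* number of bytes to read or write */
    };

    int main(void)
    {
        struct sockaddr_un addr = { .sun_family = AF_UNIX };
        struct request req = { .type = 1, .address = 0x1000, .length = 4096 };
        uint8_t *buf = malloc(req.length + 1);
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);

        strncpy(addr.sun_path, "/tmp/guestname", sizeof(addr.sun_path) - 1);
        if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("connect");
            return 1;
        }
        /* send one read request; client and server must share this struct layout */
        write(fd, &req, sizeof(req));
        /* response: req.length data bytes plus one trailing status byte */
        read(fd, buf, req.length + 1);
        printf("read %s\n", buf[req.length] ? "succeeded" : "failed");

        req.type = 0;  /* type 0 asks the listening thread to quit */
        write(fd, &req, sizeof(req));
        close(fd);
        free(buf);
        return 0;
    }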
The socket API was written by the libvmi author and it works with the current libvmi version. The libvmi client-side implementation is at:
https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
As many use KVM VMs for introspection, malware and security analysis, it might be worth thinking about making pmemaccess a permanent hmp/qmp command, as opposed to having to produce a patch at each QEMU point release.
Also, the pmemsave command's QAPI should be changed to be usable with 64-bit VMs
in qapi-schema.json
from
---
{ 'command': 'pmemsave',
  'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
---
to
---
{ 'command': 'pmemsave',
  'data': {'val': 'int64', 'size': 'int64', 'filename': 'str'} }
---
hmp-commands.hx and qmp-commands.hx should be edited accordingly. I did not make the above pmemsave changes part of my patch.
Let me know if you have any questions,
Valerio
* [Qemu-devel] [PATCH] QEMU patch for libvmi to introspect QEMU/kvm virtual machines. Usually this patch is distributed with libvmi, but, it might be more useful to have it in the QEMU source permanently.
2015-10-15 23:44 [Qemu-devel] QEMU patch to allow VM introspection via libvmi valerio
@ 2015-10-15 23:44 ` valerio
2015-10-19 21:33 ` Eric Blake
2015-10-16 8:15 ` [Qemu-devel] QEMU patch to allow VM introspection via libvmi Markus Armbruster
1 sibling, 1 reply; 43+ messages in thread
From: valerio @ 2015-10-15 23:44 UTC (permalink / raw)
To: qemu-devel; +Cc: armbru, ehabkost, Valerio Aimale, lcapitulino
From: Valerio Aimale <valerio@aimale.com>
---
Makefile.target | 2 +-
hmp-commands.hx | 14 ++++
hmp.c | 9 +++
hmp.h | 1 +
memory-access.c | 206 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
memory-access.h | 21 ++++++
qapi-schema.json | 28 ++++++++
qmp-commands.hx | 23 +++++++
8 files changed, 303 insertions(+), 1 deletion(-)
create mode 100644 memory-access.c
create mode 100644 memory-access.h
diff --git a/Makefile.target b/Makefile.target
index 962d004..940ab51 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -131,7 +131,7 @@ endif #CONFIG_BSD_USER
#########################################################
# System emulator target
ifdef CONFIG_SOFTMMU
-obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o numa.o
+obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o numa.o memory-access.o
obj-y += qtest.o bootdevice.o
obj-y += hw/
obj-$(CONFIG_KVM) += kvm-all.o
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 3a4ae39..37d3bb5 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -807,6 +807,20 @@ save to disk physical memory dump starting at @var{addr} of size @var{size}.
 ETEXI

     {
+        .name       = "pmemaccess",
+        .args_type  = "path:s",
+        .params     = "file",
+        .help       = "open A UNIX Socket access to physical memory at 'path'",
+        .mhandler.cmd = hmp_pmemaccess,
+    },
+
+STEXI
+@item pmemaccess @var{file}
+@findex pmemaccess
+Mount guest physical memory image at 'path'
+ETEXI
+
+    {
         .name       = "boot_set",
         .args_type  = "bootdevice:s",
         .params     = "bootdevice",
diff --git a/hmp.c b/hmp.c
index 5048eee..444dbf6 100644
--- a/hmp.c
+++ b/hmp.c
@@ -916,6 +916,15 @@ void hmp_pmemsave(Monitor *mon, const QDict *qdict)
     hmp_handle_error(mon, &err);
 }

+void hmp_pmemaccess(Monitor *mon, const QDict *qdict)
+{
+    const char *path = qdict_get_str(qdict, "path");
+    Error *err = NULL;
+
+    qmp_pmemaccess(path, &err);
+    hmp_handle_error(mon, &err);
+}
+
 void hmp_ringbuf_write(Monitor *mon, const QDict *qdict)
 {
     const char *chardev = qdict_get_str(qdict, "device");
diff --git a/hmp.h b/hmp.h
index 81656c3..8225deb 100644
--- a/hmp.h
+++ b/hmp.h
@@ -47,6 +47,7 @@ void hmp_system_powerdown(Monitor *mon, const QDict *qdict);
void hmp_cpu(Monitor *mon, const QDict *qdict);
void hmp_memsave(Monitor *mon, const QDict *qdict);
void hmp_pmemsave(Monitor *mon, const QDict *qdict);
+void hmp_pmemaccess(Monitor *mon, const QDict *qdict);
void hmp_ringbuf_write(Monitor *mon, const QDict *qdict);
void hmp_ringbuf_read(Monitor *mon, const QDict *qdict);
void hmp_cont(Monitor *mon, const QDict *qdict);
diff --git a/memory-access.c b/memory-access.c
new file mode 100644
index 0000000..58099a8
--- /dev/null
+++ b/memory-access.c
@@ -0,0 +1,206 @@
+/*
+ * Access guest physical memory via a domain socket.
+ *
+ * Copyright (C) 2011 Sandia National Laboratories
+ * Original Author: Bryan D. Payne (bdpayne@acm.org)
+ *
+ * Refurbished for modern QEMU by Valerio Aimale (valerio@aimale.com), in 2015
+ */
+
+#include "memory-access.h"
+//#include "cpu-all.h"
+#include "qemu-common.h"
+#include "exec/cpu-common.h"
+#include "config.h"
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <pthread.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <unistd.h>
+#include <signal.h>
+#include <stdint.h>
+
+struct request{
+    uint8_t type;      // 0 quit, 1 read, 2 write, ... rest reserved
+    uint64_t address;  // address to read from OR write to
+    uint64_t length;   // number of bytes to read OR write
+};
+
+static uint64_t
+connection_read_memory (uint64_t user_paddr, void *buf, uint64_t user_len)
+{
+    hwaddr paddr = (hwaddr) user_paddr;
+    hwaddr len = (hwaddr) user_len;
+    void *guestmem = cpu_physical_memory_map(paddr, &len, 0);
+    if (!guestmem){
+        return 0;
+    }
+    memcpy(buf, guestmem, len);
+    cpu_physical_memory_unmap(guestmem, len, 0, len);
+
+    return len;
+}
+
+static uint64_t
+connection_write_memory (uint64_t user_paddr, void *buf, uint64_t user_len)
+{
+    hwaddr paddr = (hwaddr) user_paddr;
+    hwaddr len = (hwaddr) user_len;
+    void *guestmem = cpu_physical_memory_map(paddr, &len, 1);
+    if (!guestmem){
+        return 0;
+    }
+    memcpy(guestmem, buf, len);
+    cpu_physical_memory_unmap(guestmem, len, 0, len);
+
+    return len;
+}
+
+static void
+send_success_ack (int connection_fd)
+{
+    uint8_t success = 1;
+    int nbytes = write(connection_fd, &success, 1);
+    if (1 != nbytes){
+        fprintf(stderr, "Qemu pmemaccess: failed to send success ack\n");
+    }
+}
+
+static void
+send_fail_ack (int connection_fd)
+{
+    uint8_t fail = 0;
+    int nbytes = write(connection_fd, &fail, 1);
+    if (1 != nbytes){
+        fprintf(stderr, "Qemu pmemaccess: failed to send fail ack\n");
+    }
+}
+
+static void
+connection_handler (int connection_fd)
+{
+    int nbytes;
+    struct request req;
+
+    while (1){
+        // client request should match the struct request format
+        nbytes = read(connection_fd, &req, sizeof(struct request));
+        if (nbytes != sizeof(struct request)){
+            // error
+            continue;
+        }
+        else if (req.type == 0){
+            // request to quit, goodbye
+            break;
+        }
+        else if (req.type == 1){
+            // request to read
+            char *buf = malloc(req.length + 1);
+            nbytes = connection_read_memory(req.address, buf, req.length);
+            if (nbytes != req.length){
+                // read failure, return failure message
+                buf[req.length] = 0; // set last byte to 0 for failure
+                nbytes = write(connection_fd, buf, 1);
+            }
+            else{
+                // read success, return bytes
+                buf[req.length] = 1; // set last byte to 1 for success
+                nbytes = write(connection_fd, buf, nbytes + 1);
+            }
+            free(buf);
+        }
+        else if (req.type == 2){
+            // request to write
+            void *write_buf = malloc(req.length);
+            nbytes = read(connection_fd, &write_buf, req.length);
+            if (nbytes != req.length){
+                // failed reading the message to write
+                send_fail_ack(connection_fd);
+            }
+            else{
+                // do the write
+                nbytes = connection_write_memory(req.address, write_buf, req.length);
+                if (nbytes == req.length){
+                    send_success_ack(connection_fd);
+                }
+                else{
+                    send_fail_ack(connection_fd);
+                }
+            }
+            free(write_buf);
+        }
+        else{
+            // unknown command
+            fprintf(stderr, "Qemu pmemaccess: ignoring unknown command (%d)\n", req.type);
+            char *buf = malloc(1);
+            buf[0] = 0;
+            nbytes = write(connection_fd, buf, 1);
+            free(buf);
+        }
+    }
+
+    close(connection_fd);
+}
+
+static void *
+memory_access_thread (void *p)
+{
+    struct sockaddr_un address;
+    int socket_fd, connection_fd;
+    socklen_t address_length;
+    struct pmemaccess_args *pargs = (struct pmemaccess_args *)p;
+
+    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
+    if (socket_fd < 0){
+        error_setg(pargs->errp, "Qemu pmemaccess: socket failed");
+        goto error_exit;
+    }
+    unlink(pargs->path);
+    address.sun_family = AF_UNIX;
+    address_length = sizeof(address.sun_family) + sprintf(address.sun_path, "%s", (char *) pargs->path);
+
+    if (bind(socket_fd, (struct sockaddr *) &address, address_length) != 0){
+        error_setg(pargs->errp, "Qemu pmemaccess: bind failed");
+        goto error_exit;
+    }
+    if (listen(socket_fd, 0) != 0){
+        error_setg(pargs->errp, "Qemu pmemaccess: listen failed");
+        goto error_exit;
+    }
+
+    connection_fd = accept(socket_fd, (struct sockaddr *) &address, &address_length);
+    connection_handler(connection_fd);
+
+    close(socket_fd);
+    unlink(pargs->path);
+    free(pargs->path);
+    free(pargs);
+error_exit:
+    return NULL;
+}
+
+void
+qmp_pmemaccess (const char *path, Error **errp)
+{
+    pthread_t thread;
+    sigset_t set, oldset;
+    struct pmemaccess_args *pargs;
+
+    // create the args struct
+    pargs = (struct pmemaccess_args *) malloc(sizeof(struct pmemaccess_args));
+    pargs->errp = errp;
+    // create a copy of path that we can safely use
+    pargs->path = malloc(strlen(path) + 1);
+    memcpy(pargs->path, path, strlen(path) + 1);
+
+    // start the thread
+    sigfillset(&set);
+    pthread_sigmask(SIG_SETMASK, &set, &oldset);
+    pthread_create(&thread, NULL, memory_access_thread, pargs);
+    pthread_sigmask(SIG_SETMASK, &oldset, NULL);
+
+}
diff --git a/memory-access.h b/memory-access.h
new file mode 100644
index 0000000..ff3d2f9
--- /dev/null
+++ b/memory-access.h
@@ -0,0 +1,21 @@
+/*
+ * Mount guest physical memory using FUSE.
+ *
+ * Author: Valerio G. Aimale <valerio@aimale.com>
+ */
+
+#ifndef MEMORY_ACCESS_H
+#define MEMORY_ACCESS_H
+
+#include "qapi-types.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/error.h"
+
+void qmp_pmemaccess (const char *path, Error **errp);
+
+struct pmemaccess_args {
+    char *path;
+    Error **errp;
+};
+
+#endif /* MEMORY_ACCESS_H */
diff --git a/qapi-schema.json b/qapi-schema.json
index a386605..24d4070 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1427,6 +1427,34 @@
   'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }

 ##
+# @pmemaccess:
+#
+# Open A UNIX Socket access to physical memory
+#
+# @path: the name of the UNIX socket pipe
+#
+# Returns: Nothing on success
+#
+# Since: 2.4.0.1
+#
+# Notes: Derived from previously existing patches. When command
+# succeeds connect to the open socket. Write a binary structure to
+# the socket as:
+#
+# struct request {
+#     uint8_t type;      // 0 quit, 1 read, 2 write, ... rest reserved
+#     uint64_t address;  // address to read from OR write to
+#     uint64_t length;   // number of bytes to read OR write
+# };
+#
+# If it is a read operation, Qemu will return lenght+1 bytes. Read lenght+1
+# bytes. the last byte will be a 1 for success, or a 0 for failure.
+#
+##
+{ 'command': 'pmemaccess',
+  'data': {'path': 'str'} }
+
+##
 # @cont:
 #
 # Resume guest VCPU execution.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index d2ba800..26e4a51 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -472,6 +472,29 @@ Example:
 EQMP

     {
+        .name       = "pmemaccess",
+        .args_type  = "path:s",
+        .mhandler.cmd_new = qmp_marshal_input_pmemaccess,
+    },
+
+SQMP
+pmemaccess
+----------
+
+Open A UNIX Socket access to physical memory
+
+Arguments:
+
+- "path": mount point path (json-string)
+
+Example:
+
+-> { "execute": "pmemaccess",
+     "arguments": { "path": "/tmp/guestname" } }
+<- { "return": {} }
+
+EQMP
+    {
         .name       = "inject-nmi",
         .args_type  = "",
         .mhandler.cmd_new = qmp_marshal_inject_nmi,
--
2.1.4
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-15 23:44 [Qemu-devel] QEMU patch to allow VM introspection via libvmi valerio
2015-10-15 23:44 ` [Qemu-devel] [PATCH] QEMU patch for libvmi to introspect QEMU/kvm virtual machines. Usually this patch is distributed with libvmi, but, it might be more useful to have it in the QEMU source permanently valerio
@ 2015-10-16 8:15 ` Markus Armbruster
2015-10-16 14:30 ` Valerio Aimale
1 sibling, 1 reply; 43+ messages in thread
From: Markus Armbruster @ 2015-10-16 8:15 UTC (permalink / raw)
To: valerio; +Cc: lcapitulino, qemu-devel, ehabkost
valerio@aimale.com writes:
> All-
>
> I've produced a patch for the current QEMU HEAD, for libvmi to
> introspect QEMU/KVM VMs.
>
> Libvmi has patches for the old qemu-kvm fork, inside its source tree:
> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>
> This patch adds an hmp and a qmp command, "pmemaccess". When the
> command is invoked with a string argument (a filename), it will open
> a UNIX socket and spawn a listening thread.
>
> The client writes binary commands to the socket, in the form of a C
> structure:
>
> struct request {
>     uint8_t type;      // 0 quit, 1 read, 2 write, ... rest reserved
>     uint64_t address;  // address to read from OR write to
>     uint64_t length;   // number of bytes to read OR write
> };
>
> The client receives as a response either (length+1) bytes, if it is a
> read operation, or 1 byte if it is a write operation.
>
> The last byte of a read operation response indicates success (1
> success, 0 failure). The single byte returned for a write operation
> indicates the same (1 success, 0 failure).
So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of
garbage followed by the "it failed" byte?
> The socket API was written by the libvmi author and it works the with
> current libvmi version. The libvmi client-side implementation is at:
>
> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>
> As many use KVM VMs for introspection, malware and security analysis,
> it might be worth thinking about making pmemaccess a permanent
> hmp/qmp command, as opposed to having to produce a patch at each QEMU
> point release.
Related existing commands: memsave, pmemsave, dump-guest-memory.
Can you explain why these won't do for your use case?
> Also, the pmemsave command's QAPI should be changed to be usable with
> 64-bit VMs
>
> in qapi-schema.json
>
> from
>
> ---
> { 'command': 'pmemsave',
>   'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
> ---
>
> to
>
> ---
> { 'command': 'pmemsave',
>   'data': {'val': 'int64', 'size': 'int64', 'filename': 'str'} }
> ---
In the QAPI schema, 'int' is actually an alias for 'int64'. Yes, that's
confusing.
> hmp-commands.hx and qmp-commands.hx should be edited accordingly. I
> did not make the above pmemsave changes part of my patch.
>
> Let me know if you have any questions,
>
> Valerio
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-16 8:15 ` [Qemu-devel] QEMU patch to allow VM introspection via libvmi Markus Armbruster
@ 2015-10-16 14:30 ` Valerio Aimale
2015-10-19 7:52 ` Markus Armbruster
0 siblings, 1 reply; 43+ messages in thread
From: Valerio Aimale @ 2015-10-16 14:30 UTC (permalink / raw)
To: Markus Armbruster; +Cc: lcapitulino, qemu-devel, ehabkost
On 10/16/15 2:15 AM, Markus Armbruster wrote:
> valerio@aimale.com writes:
>
>> All-
>>
>> I've produced a patch for the current QEMU HEAD, for libvmi to
>> introspect QEMU/KVM VMs.
>>
>> Libvmi has patches for the old qemu-kvm fork, inside its source tree:
>> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>>
>> This patch adds an hmp and a qmp command, "pmemaccess". When the
>> command is invoked with a string argument (a filename), it will open
>> a UNIX socket and spawn a listening thread.
>>
>> The client writes binary commands to the socket, in the form of a C
>> structure:
>>
>> struct request {
>>     uint8_t type;      // 0 quit, 1 read, 2 write, ... rest reserved
>>     uint64_t address;  // address to read from OR write to
>>     uint64_t length;   // number of bytes to read OR write
>> };
>>
>> The client receives as a response either (length+1) bytes, if it is a
>> read operation, or 1 byte if it is a write operation.
>>
>> The last byte of a read operation response indicates success (1
>> success, 0 failure). The single byte returned for a write operation
>> indicates the same (1 success, 0 failure).
> So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of
> garbage followed by the "it failed" byte?
Markus, that appears to be the case. However, I did not write the
communication protocol between libvmi and qemu. I'm assuming that the
person who wrote the protocol did not want to bother with
overcomplicating things.
https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
I'm thinking he assumed reads would be small in size and the price of
reading garbage was less than the price of writing a more complicated
protocol. I can see his point; confronted with the same problem, I might
have done the same.
>> The socket API was written by the libvmi author and it works with the
>> current libvmi version. The libvmi client-side implementation is at:
>>
>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>
>> As many use KVM VMs for introspection, malware and security analysis,
>> it might be worth thinking about making pmemaccess a permanent
>> hmp/qmp command, as opposed to having to produce a patch at each QEMU
>> point release.
> Related existing commands: memsave, pmemsave, dump-guest-memory.
>
> Can you explain why these won't do for your use case?
For people who do security analysis there are two use cases, static and
dynamic analysis. With memsave, pmemsave and dump-guest-memory one can do
static analysis. I.e. snapshotting a VM and seeing what was happening at
that point in time.
Dynamic analysis requires being able to 'introspect' a VM while it's running.
If you take a snapshot of two people exchanging a glass of water, and
you happen to take it at the very moment both persons have their hands
on the glass, it's hard to tell who passed the glass to whom. If you
have a movie of the same scene, it's obvious who's the giver and who's
the receiver. Same use case.
More to the point, there's a host of C and python frameworks to
dynamically analyze VMs: volatility, rekal, "drakvuf", etc. They all
build on top of libvmi. I did not want to reinvent the wheel.
Mind you, 99.9% of people that do dynamic VM analysis use xen. They
contend that xen has better introspection support. In my case, I did not
want to bother with dedicating a full server to be a xen domain 0. I
just wanted to do a quick test by standing up a QEMU/kvm VM, in an
otherwise purposed server.
>
>> Also, the pmemsave command's QAPI should be changed to be usable with
>> 64-bit VMs
>>
>> in qapi-schema.json
>>
>> from
>>
>> ---
>> { 'command': 'pmemsave',
>>   'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>> ---
>>
>> to
>>
>> ---
>> { 'command': 'pmemsave',
>>   'data': {'val': 'int64', 'size': 'int64', 'filename': 'str'} }
>> ---
> In the QAPI schema, 'int' is actually an alias for 'int64'. Yes, that's
> confusing.
I think it's confusing for the HMP parser too. If you have a VM with 8GB
of RAM and want to snapshot the whole physical memory, via HMP over
telnet this is what happens:
$ telnet localhost 1234
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
QEMU 2.4.0.1 monitor - type 'help' for more information
(qemu) help pmemsave
pmemsave addr size file -- save to disk physical memory dump starting at
'addr' of size 'size'
(qemu) pmemsave 0 8589934591 "/tmp/memorydump"
'pmemsave' has failed: integer is for 32-bit values
Try "help pmemsave" for more information
(qemu) quit
With the changes I suggested, the command succeeds
$ telnet localhost 1234
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
QEMU 2.4.0.1 monitor - type 'help' for more information
(qemu) help pmemsave
pmemsave addr size file -- save to disk physical memory dump starting at
'addr' of size 'size'
(qemu) pmemsave 0 8589934591 "/tmp/memorydump"
(qemu) quit
However I just noticed that the dump is just about 4GB in size, so there
might be more changes needed to snapshot all physical memory of a 64-bit
VM. I did not investigate any further.
ls -l /tmp/memorydump
-rw-rw-r-- 1 libvirt-qemu kvm 4294967295 Oct 16 08:04 /tmp/memorydump
>> hmp-commands.hx and qmp-commands.hx should be edited accordingly. I
>> did not make the above pmemsave changes part of my patch.
>>
>> Let me know if you have any questions,
>>
>> Valerio
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-16 14:30 ` Valerio Aimale
@ 2015-10-19 7:52 ` Markus Armbruster
2015-10-19 14:37 ` Valerio Aimale
0 siblings, 1 reply; 43+ messages in thread
From: Markus Armbruster @ 2015-10-19 7:52 UTC (permalink / raw)
To: Valerio Aimale; +Cc: qemu-devel, ehabkost, lcapitulino
Valerio Aimale <valerio@aimale.com> writes:
> On 10/16/15 2:15 AM, Markus Armbruster wrote:
>> valerio@aimale.com writes:
>>
>>> All-
>>>
>>> I've produced a patch for the current QEMU HEAD, for libvmi to
>>> introspect QEMU/KVM VMs.
>>>
>>> Libvmi has patches for the old qemu-kvm fork, inside its source tree:
>>> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>>>
>>> This patch adds an hmp and a qmp command, "pmemaccess". When the
>>> command is invoked with a string argument (a filename), it will open
>>> a UNIX socket and spawn a listening thread.
>>>
>>> The client writes binary commands to the socket, in the form of a C
>>> structure:
>>>
>>> struct request {
>>>     uint8_t type;      // 0 quit, 1 read, 2 write, ... rest reserved
>>>     uint64_t address;  // address to read from OR write to
>>>     uint64_t length;   // number of bytes to read OR write
>>> };
>>>
>>> The client receives as a response either (length+1) bytes, if it is a
>>> read operation, or 1 byte if it is a write operation.
>>>
>>> The last byte of a read operation response indicates success (1
>>> success, 0 failure). The single byte returned for a write operation
>>> indicates the same (1 success, 0 failure).
>> So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of
>> garbage followed by the "it failed" byte?
> Markus, that appears to be the case. However, I did not write the
> communication protocol between libvmi and qemu. I'm assuming that the
> person who wrote the protocol did not want to bother with
> overcomplicating things.
>
> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>
> I'm thinking he assumed reads would be small in size and the price of
> reading garbage was less than the price of writing a more complicated
> protocol. I can see his point; confronted with the same problem, I
> might have done the same.
All right, the interface is designed for *small* memory blocks then.
Makes me wonder why he needs a separate binary protocol on a separate
socket. Small blocks could be done just fine in QMP.
>>> The socket API was written by the libvmi author and it works with the
>>> current libvmi version. The libvmi client-side implementation is at:
>>>
>>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>>
>>> As many use KVM VMs for introspection, malware and security analysis,
>>> it might be worth thinking about making pmemaccess a permanent
>>> hmp/qmp command, as opposed to having to produce a patch at each QEMU
>>> point release.
>> Related existing commands: memsave, pmemsave, dump-guest-memory.
>>
>> Can you explain why these won't do for your use case?
> For people who do security analysis there are two use cases, static
> and dynamic analysis. With memsave, pmemsave and dump-guest-memory one
> can do static analysis. I.e. snapshotting a VM and seeing what was
> happening at that point in time.
> Dynamic analysis requires being able to 'introspect' a VM while it's running.
>
> If you take a snapshot of two people exchanging a glass of water, and
> you happen to take it at the very moment both persons have their hands
> on the glass, it's hard to tell who passed the glass to whom. If you
> have a movie of the same scene, it's obvious who's the giver and who's
> the receiver. Same use case.
I understand the need for introspecting a running guest. What exactly
makes the existing commands unsuitable for that?
> More to the point, there's a host of C and python frameworks to
> dynamically analyze VMs: volatility, rekal, "drakvuf", etc. They all
> build on top of libvmi. I did not want to reinvent the wheel.
Fair enough.
Front page http://libvmi.com/ claims "Works with Xen, KVM, Qemu, and Raw
memory files." What exactly is missing for KVM?
> Mind you, 99.9% of people that do dynamic VM analysis use xen. They
> contend that xen has better introspection support. In my case, I did
> not want to bother with dedicating a full server to be a xen domain
> 0. I just wanted to do a quick test by standing up a QEMU/kvm VM, in
> an otherwise purposed server.
I'm not at all against better introspection support in QEMU. I'm just
trying to understand the problem you're trying to solve with your
patches.
>>> Also, the pmemsave command's QAPI should be changed to be usable with
>>> 64-bit VMs
>>>
>>> in qapi-schema.json
>>>
>>> from
>>>
>>> ---
>>> { 'command': 'pmemsave',
>>>   'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>> ---
>>>
>>> to
>>>
>>> ---
>>> { 'command': 'pmemsave',
>>>   'data': {'val': 'int64', 'size': 'int64', 'filename': 'str'} }
>>> ---
>> In the QAPI schema, 'int' is actually an alias for 'int64'. Yes, that's
>> confusing.
> I think it's confusing for the HMP parser too. If you have a VM with
> 8GB of RAM and want to snapshot the whole physical memory, via HMP
> over telnet this is what happens:
>
> $ telnet localhost 1234
> Trying 127.0.0.1...
> Connected to localhost.
> Escape character is '^]'.
> QEMU 2.4.0.1 monitor - type 'help' for more information
> (qemu) help pmemsave
> pmemsave addr size file -- save to disk physical memory dump starting
> at 'addr' of size 'size'
> (qemu) pmemsave 0 8589934591 "/tmp/memorydump"
> 'pmemsave' has failed: integer is for 32-bit values
> Try "help pmemsave" for more information
> (qemu) quit
Your change to pmemsave's definition in qapi-schema.json is effectively a
no-op.
Your example shows *HMP* command pmemsave. The definition of an HMP
command is *independent* of the QMP command. The implementation *uses*
the QMP command.
QMP pmemsave is defined in qapi-schema.json as
    { 'command': 'pmemsave',
      'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }

Its implementation is in cpus.c:

    void qmp_pmemsave(int64_t addr, int64_t size, const char *filename,
                      Error **errp)

Note the int64_t size.

HMP pmemsave is defined in hmp-commands.hx as

    {
        .name       = "pmemsave",
        .args_type  = "val:l,size:i,filename:s",
        .params     = "addr size file",
        .help       = "save to disk physical memory dump starting at 'addr' of size 'size'",
        .mhandler.cmd = hmp_pmemsave,
    },

Its implementation is in hmp.c:

    void hmp_pmemsave(Monitor *mon, const QDict *qdict)
    {
        uint32_t size = qdict_get_int(qdict, "size");
        const char *filename = qdict_get_str(qdict, "filename");
        uint64_t addr = qdict_get_int(qdict, "val");
        Error *err = NULL;

        qmp_pmemsave(addr, size, filename, &err);
        hmp_handle_error(mon, &err);
    }
Note uint32_t size.
Arguably, the QMP size argument should use 'size' (an alias for
'uint64'), and the HMP args_type should use 'size:o'.
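Spelled out, that suggestion would be something like (an untested sketch):

    { 'command': 'pmemsave',
      'data': {'val': 'int', 'size': 'size', 'filename': 'str'} }

in qapi-schema.json,

        .args_type  = "val:l,size:o,filename:s",

in hmp-commands.hx, and a 64-bit size in hmp_pmemsave():

        uint64_t size = qdict_get_int(qdict, "size");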
> With the changes I suggested, the command succeeds
>
> $ telnet localhost 1234
> Trying 127.0.0.1...
> Connected to localhost.
> Escape character is '^]'.
> QEMU 2.4.0.1 monitor - type 'help' for more information
> (qemu) help pmemsave
> pmemsave addr size file -- save to disk physical memory dump starting
> at 'addr' of size 'size'
> (qemu) pmemsave 0 8589934591 "/tmp/memorydump"
> (qemu) quit
>
> However I just noticed that the dump is just about 4GB in size, so
> there might be more changes needed to snapshot all physical memory of
> a 64-bit VM. I did not investigate any further.
>
> ls -l /tmp/memorydump
> -rw-rw-r-- 1 libvirt-qemu kvm 4294967295 Oct 16 08:04 /tmp/memorydump
>
>>> hmp-commands.hx and qmp-commands.hx should be edited accordingly. I
>>> did not make the above pmemsave changes part of my patch.
>>>
>>> Let me know if you have any questions,
>>>
>>> Valerio
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-19 7:52 ` Markus Armbruster
@ 2015-10-19 14:37 ` Valerio Aimale
2015-10-21 10:54 ` Markus Armbruster
0 siblings, 1 reply; 43+ messages in thread
From: Valerio Aimale @ 2015-10-19 14:37 UTC (permalink / raw)
To: Markus Armbruster; +Cc: qemu-devel, ehabkost, lcapitulino
On 10/19/15 1:52 AM, Markus Armbruster wrote:
> Valerio Aimale <valerio@aimale.com> writes:
>
>> On 10/16/15 2:15 AM, Markus Armbruster wrote:
>>> valerio@aimale.com writes:
>>>
>>>> All-
>>>>
>>>> I've produced a patch for the current QEMU HEAD, for libvmi to
>>>> introspect QEMU/KVM VMs.
>>>>
>>>> Libvmi has patches for the old qemu-kvm fork, inside its source tree:
>>>> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>>>>
>>>> This patch adds an hmp and a qmp command, "pmemaccess". When the
>>>> command is invoked with a string argument (a filename), it will open
>>>> a UNIX socket and spawn a listening thread.
>>>>
>>>> The client writes binary commands to the socket, in the form of a C
>>>> structure:
>>>>
>>>> struct request {
>>>>     uint8_t type;      // 0 quit, 1 read, 2 write, ... rest reserved
>>>>     uint64_t address;  // address to read from OR write to
>>>>     uint64_t length;   // number of bytes to read OR write
>>>> };
>>>>
>>>> The client receives as a response either (length+1) bytes, if it is a
>>>> read operation, or 1 byte if it is a write operation.
>>>>
>>>> The last byte of a read operation response indicates success (1
>>>> success, 0 failure). The single byte returned for a write operation
>>>> indicates the same (1 success, 0 failure).
>>> So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of
>>> garbage followed by the "it failed" byte?
>> Markus, that appears to be the case. However, I did not write the
>> communication protocol between libvmi and qemu. I'm assuming that the
>> person who wrote the protocol did not want to bother with
>> overcomplicating things.
>>
>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>
>> I'm thinking he assumed reads would be small in size and the price of
>> reading garbage was less than the price of writing a more complicated
>> protocol. I can see his point; confronted with the same problem, I
>> might have done the same.
> All right, the interface is designed for *small* memory blocks then.
>
> Makes me wonder why he needs a separate binary protocol on a separate
> socket. Small blocks could be done just fine in QMP.
The problem is speed. If one's analyzing the memory space of a running
process (physical and paged), libvmi will make a large number of small
and mid-sized reads. If one uses xp or pmemsave, the overhead is quite
significant. xp has overhead due to encoding, and pmemsave has overhead
due to file open/write (server), file open/read/close/unlink (client).
Others have gone through the problem before me. It appears that pmemsave
and xp are significantly slower than reading memory using a socket via
pmemaccess.
The following data is not mine, but it shows the time, in milliseconds,
required to resolve the content of a paged memory address via socket
(pmemaccess), pmemsave, and xp:
http://cl.ly/image/322a3s0h1V05
Again, I did not produce those data points; they come from an old libvmi
thread.
I think it might be conceivable that there could be a QMP command that
returns the content of an arbitrarily sized memory region as a base64 or
a base85 JSON string. It would still have both time- (due to
encoding/decoding) and space- (base64 has 33% and base85 would be 7%)
overhead, plus JSON encoding/decoding overhead. It might still be the case
that a socket would outperform such a command as well, speed-wise. I don't
think it would be any faster than xp.
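Purely as an illustrative sketch (no such command exists), its schema
entry might look like:

    { 'command': 'pmemread',
      'data': {'val': 'int', 'size': 'int'},
      'returns': 'str' }

with the returned string being the base64-encoded content of the
requested region.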
There's also a similar patch, floating around the internet, that uses
shared memory, instead of sockets, as inter-process communication
between libvmi and QEMU. I've never used that.
>
>>>> The socket API was written by the libvmi author and it works with the
>>>> current libvmi version. The libvmi client-side implementation is at:
>>>>
>>>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>>>
>>>> As many use KVM VMs for introspection, malware and security analysis,
>>>> it might be worth thinking about making pmemaccess a permanent
>>>> hmp/qmp command, as opposed to having to produce a patch at each QEMU
>>>> point release.
>>> Related existing commands: memsave, pmemsave, dump-guest-memory.
>>>
>>> Can you explain why these won't do for your use case?
>> For people who do security analysis there are two use cases, static
>> and dynamic analysis. With memsave, pmemsave and dump-guest-memory one
>> can do static analysis. I.e. snapshotting a VM and seeing what was
>> happening at that point in time.
>> Dynamic analysis requires being able to 'introspect' a VM while it's running.
>>
>> If you take a snapshot of two people exchanging a glass of water, and
>> you happen to take it at the very moment both persons have their hands
>> on the glass, it's hard to tell who passed the glass to whom. If you
>> have a movie of the same scene, it's obvious who's the giver and who's
>> the receiver. Same use case.
> I understand the need for introspecting a running guest. What exactly
> makes the existing commands unsuitable for that?
Speed. See discussion above.
>
>> More to the point, there's a host of C and python frameworks to
>> dynamically analyze VMs: volatility, rekal, "drakvuf", etc. They all
>> build on top of libvmi. I did not want to reinvent the wheel.
> Fair enough.
>
> Front page http://libvmi.com/ claims "Works with Xen, KVM, Qemu, and Raw
> memory files." What exactly is missing for KVM?
When they say they support kvm, what they really mean is that they support
the (retired, I understand) qemu-kvm fork via a patch that is provided in
the libvmi source tree. I think the most recent qemu-kvm supported is 1.6.0
https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
I wanted to bring support to the head revision of QEMU, to bring libvmi
level with more modern QEMU.
Maybe the solution is simply to put this patch in the libvmi source
tree, which I've already asked to do via pull request, leaving QEMU alone.
However, the patch has to be updated at every QEMU point release. I
wanted to avoid that, if at all possible.
>
>> Mind you, 99.9% of people that do dynamic VM analysis use xen. They
>> contend that xen has better introspection support. In my case, I did
>> not want to bother with dedicating a full server to be a xen domain
>> 0. I just wanted to do a quick test by standing up a QEMU/kvm VM, in
>> an otherwise purposed server.
> I'm not at all against better introspection support in QEMU. I'm just
> trying to understand the problem you're trying to solve with your
> patches.
What all users of libvmi would love to have is super high speed access
to VM physical memory as part of the QEMU source tree, and not supported
via a patch. Implemented as the QEMU owners see fit, as long as it is
blazing fast and easily accessed via a client library or inter-process
communication.
My gut feeling is that it has to bypass QMP protocol/encoding/file
access/JSON to be fast, but it is just a gut feeling, worth nothing.
>
>>>> Also, the pmemsave command's QAPI should be changed to be usable with
>>>> 64-bit VMs
>>>>
>>>> in qapi-schema.json
>>>>
>>>> from
>>>>
>>>> ---
>>>> { 'command': 'pmemsave',
>>>>   'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>> ---
>>>>
>>>> to
>>>>
>>>> ---
>>>> { 'command': 'pmemsave',
>>>>   'data': {'val': 'int64', 'size': 'int64', 'filename': 'str'} }
>>>> ---
>>> In the QAPI schema, 'int' is actually an alias for 'int64'. Yes, that's
>>> confusing.
>> I think it's confusing for the HMP parser too. If you have a VM with
>> 8GB of RAM and want to snapshot the whole physical memory, via HMP
>> over telnet this is what happens:
>>
>> $ telnet localhost 1234
>> Trying 127.0.0.1...
>> Connected to localhost.
>> Escape character is '^]'.
>> QEMU 2.4.0.1 monitor - type 'help' for more information
>> (qemu) help pmemsave
>> pmemsave addr size file -- save to disk physical memory dump starting
>> at 'addr' of size 'size'
>> (qemu) pmemsave 0 8589934591 "/tmp/memorydump"
>> 'pmemsave' has failed: integer is for 32-bit values
>> Try "help pmemsave" for more information
>> (qemu) quit
> Your change to pmemsave's definition in qapi-schema.json is effectively a
> no-op.
>
> Your example shows *HMP* command pmemsave. The definition of an HMP
> command is *independent* of the QMP command. The implementation *uses*
> the QMP command.
>
> QMP pmemsave is defined in qapi-schema.json as
>
>     { 'command': 'pmemsave',
>       'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>
> Its implementation is in cpus.c:
>
>     void qmp_pmemsave(int64_t addr, int64_t size, const char *filename,
>                       Error **errp)
>
> Note the int64_t size.
>
> HMP pmemsave is defined in hmp-commands.hx as
>
>     {
>         .name       = "pmemsave",
>         .args_type  = "val:l,size:i,filename:s",
>         .params     = "addr size file",
>         .help       = "save to disk physical memory dump starting at 'addr' of size 'size'",
>         .mhandler.cmd = hmp_pmemsave,
>     },
>
> Its implementation is in hmp.c:
>
>     void hmp_pmemsave(Monitor *mon, const QDict *qdict)
>     {
>         uint32_t size = qdict_get_int(qdict, "size");
>         const char *filename = qdict_get_str(qdict, "filename");
>         uint64_t addr = qdict_get_int(qdict, "val");
>         Error *err = NULL;
>
>         qmp_pmemsave(addr, size, filename, &err);
>         hmp_handle_error(mon, &err);
>     }
>
> Note uint32_t size.
>
> Arguably, the QMP size argument should use 'size' (an alias for
> 'uint64'), and the HMP args_type should use 'size:o'.
I understand all that. Indeed, I've re-implemented 'pmemaccess' the same
way pmemsave is implemented. There is a single function, and two entry
points, one for HMP and one for QMP. I think pmemaccess mimics
pmemsave closely.
However, if one wants to simply dump a memory region via HMP, for ease
of use/debug/testing purposes, one cannot dump memory regions that
reside above 2^32-1.
>> With the changes I suggested, the command succeeds
>>
>> $ telnet localhost 1234
>> Trying 127.0.0.1...
>> Connected to localhost.
>> Escape character is '^]'.
>> QEMU 2.4.0.1 monitor - type 'help' for more information
>> (qemu) help pmemsave
>> pmemsave addr size file -- save to disk physical memory dump starting
>> at 'addr' of size 'size'
>> (qemu) pmemsave 0 8589934591 "/tmp/memorydump"
>> (qemu) quit
>>
>> However I just noticed that the dump is just about 4GB in size, so
>> there might be more changes needed to snapshot all physical memory of
>> a 64-bit VM. I did not investigate any further.
>>
>> ls -l /tmp/memorydump
>> -rw-rw-r-- 1 libvirt-qemu kvm 4294967295 Oct 16 08:04 /tmp/memorydump
>>
>>>> hmp-commands.hx and qmp-commands.hx should be edited accordingly. I
>>>> did not make the above pmemsave changes part of my patch.
>>>>
>>>> Let me know if you have any questions,
>>>>
>>>> Valerio
* Re: [Qemu-devel] [PATCH] QEMU patch for libvmi to introspect QEMU/kvm virtual machines. Usually this patch is distributed with libvmi, but, it might be more useful to have it in the QEMU source permanently.
2015-10-15 23:44 ` [Qemu-devel] [PATCH] QEMU patch for libvmi to introspect QEMU/kvm virtual machines. Usually this patch is distributed with libvmi, but, it might be more useful to have it in the QEMU source permanently valerio
@ 2015-10-19 21:33 ` Eric Blake
2015-10-21 15:11 ` Valerio Aimale
0 siblings, 1 reply; 43+ messages in thread
From: Eric Blake @ 2015-10-19 21:33 UTC (permalink / raw)
To: valerio, qemu-devel; +Cc: armbru, ehabkost, lcapitulino
On 10/15/2015 05:44 PM, valerio@aimale.com wrote:
> From: Valerio Aimale <valerio@aimale.com>
Long subject line, and no message body. Remember, you want the subject
line to be a one-line short summary of 'what', then the commit body
message for 'why', as in:
qmp: add command for libvmi memory introspection
In the past, libvmi was relying on an out-of-tree patch to qemu that
provides a new QMP command pmemaccess. It is now time to make this
command part of qemu.
pmemaccess is used to create a side-channel communication path that can
more effectively be used to query lots of small memory chunks without
the overhead of one QMP command per chunk. ...
>
> ---
You are missing a Signed-off-by: tag. Without that, we cannot take your
patch. But at least we can still review it:
> Makefile.target | 2 +-
> hmp-commands.hx | 14 ++++
> hmp.c | 9 +++
> hmp.h | 1 +
> memory-access.c | 206 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> memory-access.h | 21 ++++++
> qapi-schema.json | 28 ++++++++
> qmp-commands.hx | 23 +++++++
> 8 files changed, 303 insertions(+), 1 deletion(-)
> create mode 100644 memory-access.c
> create mode 100644 memory-access.h
>
> diff --git a/Makefile.target b/Makefile.target
> index 962d004..940ab51 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -131,7 +131,7 @@ endif #CONFIG_BSD_USER
> #########################################################
> # System emulator target
> ifdef CONFIG_SOFTMMU
> -obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o numa.o
> +obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o numa.o memory-access.o
This line is now over 80 columns; please wrap.
> obj-y += qtest.o bootdevice.o
In fact, you could have just appended it into this line instead.
> +++ b/hmp-commands.hx
> @@ -807,6 +807,20 @@ save to disk physical memory dump starting at @var{addr} of size @var{size}.
> ETEXI
>
>      {
> +        .name       = "pmemaccess",
> +        .args_type  = "path:s",
> +        .params     = "file",
> +        .help       = "open A UNIX Socket access to physical memory at 'path'",
s/A/a/
Awkward grammar; better might be:
open a Unix socket at 'path' for use in accessing physical memory
Please also document where the user can find the protocol that will be
used across the side-channel socket thus opened.
> +++ b/memory-access.c
> @@ -0,0 +1,206 @@
> +/*
> + * Access guest physical memory via a domain socket.
> + *
> + * Copyright (C) 2011 Sandia National Laboratories
> + * Original Author: Bryan D. Payne (bdpayne@acm.org)
> + *
> + * Refurbished for modern QEMU by Valerio Aimale (valerio@aimale.com), in 2015
> + */
I would expect at least something under docs/ in addition to this file
(the protocol spoken over the socket should be well-documented, and not
just by reading the source code). Compare with docs/qmp-spec.txt.
> +struct request{
> +    uint8_t type;      // 0 quit, 1 read, 2 write, ... rest reserved
> +    uint64_t address;  // address to read from OR write to
> +    uint64_t length;   // number of bytes to read OR write
Any particular endianness constraints to worry about?
> +};
> +
> +static uint64_t
> +connection_read_memory (uint64_t user_paddr, void *buf, uint64_t user_len)
> +{
> +    hwaddr paddr = (hwaddr) user_paddr;
> +    hwaddr len = (hwaddr) user_len;
> +    void *guestmem = cpu_physical_memory_map(paddr, &len, 0);
> +    if (!guestmem){
Space before {, throughout the patch.
> +static void
> +send_success_ack (int connection_fd)
No space before ( in function usage.
> +{
> +    uint8_t success = 1;
Magic number; I'd expect an enum or #defines somewhere.
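Something like this, perhaps (names purely illustrative):

    enum {
        PMEMACCESS_ACK_FAIL = 0,
        PMEMACCESS_ACK_SUCCESS = 1,
    };

    uint8_t success = PMEMACCESS_ACK_SUCCESS;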
> +    int nbytes = write(connection_fd, &success, 1);
> +    if (1 != nbytes){
> +        fprintf(stderr, "Qemu pmemaccess: failed to send success ack\n");
Is fprintf() really the best approach for error reporting?
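qemu's error_report() from "qemu/error-report.h" would be the usual
choice here, e.g.:

    error_report("pmemaccess: failed to send success ack");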
> +static void
> +connection_handler (int connection_fd)
> +{
> +    int nbytes;
> +    struct request req;
> +
> +    while (1){
> +        // client request should match the struct request format
We prefer /* */ comments over //.
> +        nbytes = read(connection_fd, &req, sizeof(struct request));
> +        if (nbytes != sizeof(struct request)){
> +            // error
> +            continue;
Silently ignoring errors?
> +        }
> +        else if (req.type == 0){
> +            // request to quit, goodbye
> +            break;
> +        }
> +        else if (req.type == 1){
> +            // request to read
> +            char *buf = malloc(req.length + 1);
Mixing g_malloc/g_free with malloc/free makes it a bit harder to reason
about things; it might be better to use only glib allocation.
> +            nbytes = connection_read_memory(req.address, buf, req.length);
> +            if (nbytes != req.length){
> +                // read failure, return failure message
> +                buf[req.length] = 0; // set last byte to 0 for failure
> +                nbytes = write(connection_fd, buf, 1);
> +            }
> +            else{
> +                // read success, return bytes
> +                buf[req.length] = 1; // set last byte to 1 for success
> +                nbytes = write(connection_fd, buf, nbytes + 1);
> +            }
> +            free(buf);
> +        }
> +        else if (req.type == 2){
> +            // request to write
> +            void *write_buf = malloc(req.length);
> +            nbytes = read(connection_fd, &write_buf, req.length);
No error checking that the allocation succeeded? How do you protect from
bogus requests?
> +            if (nbytes != req.length){
> +                // failed reading the message to write
> +                send_fail_ack(connection_fd);
> +            }
> +            else{
} on the same line as else
> +                // do the write
> +                nbytes = connection_write_memory(req.address, write_buf, req.length);
> +                if (nbytes == req.length){
> +                    send_success_ack(connection_fd);
> +                }
> +                else{
> +                    send_fail_ack(connection_fd);
> +                }
> +            }
> +            free(write_buf);
> +        }
> +        else{
> +            // unknown command
> +            fprintf(stderr, "Qemu pmemaccess: ignoring unknown command (%d)\n", req.type);
> +            char *buf = malloc(1);
> +            buf[0] = 0;
> +            nbytes = write(connection_fd, buf, 1);
> +            free(buf);
> +        }
> +    }
> +
> +    close(connection_fd);
> +}
> +
> +static void *
> +memory_access_thread (void *p)
The most common style in qemu puts return type on the same line as the
function name.
> +{
> +    struct sockaddr_un address;
> +    int socket_fd, connection_fd;
> +    socklen_t address_length;
> +    struct pmemaccess_args *pargs = (struct pmemaccess_args *)p;
This cast is not necessary in C.
> +
> +    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
> +    if (socket_fd < 0){
> +        error_setg(pargs->errp, "Qemu pmemaccess: socket failed");
TAB damage. Also, mentioning 'Qemu' in an error message is probably
redundant. That, and we prefer 'qemu' over 'Qemu'.
> +        goto error_exit;
> +    }
> +    unlink(pargs->path);
No check for errors?
> +    address.sun_family = AF_UNIX;
> +    address_length = sizeof(address.sun_family) + sprintf(address.sun_path, "%s", (char *) pargs->path);
Long line. sprintf() is dangerous if you are not positive that
pargs->path fits. Why do you need the cast?
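A length check plus strncpy() would be safer, along these lines (untested):

    if (strlen(pargs->path) >= sizeof(address.sun_path)) {
        error_setg(pargs->errp, "pmemaccess: socket path too long");
        goto error_exit;
    }
    strncpy(address.sun_path, pargs->path, sizeof(address.sun_path));
    address_length = sizeof(address);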
> +
> +    if (bind(socket_fd, (struct sockaddr *) &address, address_length) != 0){
> +        error_setg(pargs->errp, "Qemu pmemaccess: bind failed");
More TAB damage. Please read
http://wiki.qemu.org/Contribute/SubmitAPatch for more contribution
hints, and be sure to run ./scripts/checkpatch.pl.
> +void
> +qmp_pmemaccess (const char *path, Error **errp)
> +{
> +    pthread_t thread;
> +    sigset_t set, oldset;
> +    struct pmemaccess_args *pargs;
> +
> +    // create the args struct
> +    pargs = (struct pmemaccess_args *) malloc(sizeof(struct pmemaccess_args));
> +    pargs->errp = errp;
> +    // create a copy of path that we can safely use
> +    pargs->path = malloc(strlen(path) + 1);
> +    memcpy(pargs->path, path, strlen(path) + 1);
g_strdup() is your friend.
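I.e. the whole allocation sequence could shrink to (sketch):

    pargs = g_malloc(sizeof(*pargs));
    pargs->errp = errp;
    pargs->path = g_strdup(path);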
> +++ b/qapi-schema.json
> @@ -1427,6 +1427,34 @@
> 'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>
> ##
> +# @pmemaccess:
> +#
> +# Open A UNIX Socket access to physical memory
s/A UNIX Socket/a Unix socket/
Same wording suggestion as earlier; might be better as:
Open a Unix socket used as a side-channel for efficiently accessing
physical memory.
> +#
> +# @path: the name of the UNIX socket pipe
> +#
> +# Returns: Nothing on success
> +#
> +# Since: 2.4.0.1
2.5, not 2.4.0.1.
> +#
> +# Notes: Derived from previously existing patches.
Dead sentence that doesn't add anything to the current specification.
> When command
> +# succeeds connect to the open socket. Write a binary structure to
> +# the socket as:
> +#
> +# struct request {
> +#     uint8_t type;      // 0 quit, 1 read, 2 write, ... rest reserved
> +#     uint64_t address;  // address to read from OR write to
> +#     uint64_t length;   // number of bytes to read OR write
> +# };
> +#
> +# If it is a read operation, Qemu will return lenght+1 bytes. Read lenght+1
s/lenght/length/ twice
> +# bytes. the last byte will be a 1 for success, or a 0 for failure.
> +#
> +##
> +{ 'command': 'pmemaccess',
> + 'data': {'path': 'str'} }
> +
> +##
> # @cont:
> #
> # Resume guest VCPU execution.
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index d2ba800..26e4a51 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -472,6 +472,29 @@ Example:
> EQMP
>
>      {
> +        .name       = "pmemaccess",
> +        .args_type  = "path:s",
> +        .mhandler.cmd_new = qmp_marshal_input_pmemaccess,
> +    },
> +
> +SQMP
> +pmemaccess
> +----------
> +
> +Open A UNIX Socket access to physical memory
> +
> +Arguments:
> +
> +- "path": mount point path (json-string)
> +
> +Example:
> +
> +-> { "execute": "pmemaccess",
> + "arguments": { "path": "/tmp/guestname" } }
> +<- { "return": {} }
> +
> +EQMP
> +    {
>          .name       = "inject-nmi",
>          .args_type  = "",
>          .mhandler.cmd_new = qmp_marshal_inject_nmi,
>
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-19 14:37 ` Valerio Aimale
@ 2015-10-21 10:54 ` Markus Armbruster
2015-10-21 15:50 ` Valerio Aimale
` (2 more replies)
0 siblings, 3 replies; 43+ messages in thread
From: Markus Armbruster @ 2015-10-21 10:54 UTC (permalink / raw)
To: Valerio Aimale; +Cc: lcapitulino, qemu-devel, ehabkost
Valerio Aimale <valerio@aimale.com> writes:
> On 10/19/15 1:52 AM, Markus Armbruster wrote:
>> Valerio Aimale <valerio@aimale.com> writes:
>>
>>> On 10/16/15 2:15 AM, Markus Armbruster wrote:
>>>> valerio@aimale.com writes:
>>>>
>>>>> All-
>>>>>
>>>>> I've produced a patch for the current QEMU HEAD, for libvmi to
>>>>> introspect QEMU/KVM VMs.
>>>>>
>>>>> Libvmi has patches for the old qemu-kvm fork, inside its source tree:
>>>>> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>>>>>
>>>>> This patch adds an hmp and a qmp command, "pmemaccess". When the
>>>>> command is invoked with a string argument (a filename), it will open
>>>>> a UNIX socket and spawn a listening thread.
>>>>>
>>>>> The client writes binary commands to the socket, in the form of a C
>>>>> structure:
>>>>>
>>>>> struct request {
>>>>>     uint8_t type;      // 0 quit, 1 read, 2 write, ... rest reserved
>>>>>     uint64_t address;  // address to read from OR write to
>>>>>     uint64_t length;   // number of bytes to read OR write
>>>>> };
>>>>>
>>>>> The client receives as a response either (length+1) bytes, if it is a
>>>>> read operation, or 1 byte if it is a write operation.
>>>>>
>>>>> The last byte of a read operation response indicates success (1
>>>>> success, 0 failure). The single byte returned for a write operation
>>>>> indicates the same (1 success, 0 failure).
>>>> So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of
>>>> garbage followed by the "it failed" byte?
>>> Markus, that appears to be the case. However, I did not write the
>>> communication protocol between libvmi and qemu. I'm assuming that the
>>> person who wrote the protocol did not want to bother with
>>> overcomplicating things.
>>>
>>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>>
>>> I'm thinking he assumed reads would be small in size and the price of
>>> reading garbage was less than the price of writing a more complicated
>>> protocol. I can see his point; confronted with the same problem, I
>>> might have done the same.
>> All right, the interface is designed for *small* memory blocks then.
>>
>> Makes me wonder why he needs a separate binary protocol on a separate
>> socket. Small blocks could be done just fine in QMP.
>
> The problem is speed. If one's analyzing the memory space of a running
> process (physical and paged), libvmi will make a large number of small
> and mid-sized reads. If one uses xp or pmemsave, the overhead is
> quite significant. xp has overhead due to encoding, and pmemsave has
> overhead due to file open/write (server), file open/read/close/unlink
> (client).
>
> Others have gone through the problem before me. It appears that
> pmemsave and xp are significantly slower than reading memory using a
> socket via pmemaccess.
That they're slower isn't surprising, but I'd expect the cost of
encoding a small block to be insiginificant compared to the cost of the
network roundtrips.
As block size increases, the space overhead of encoding will eventually
bite. But for that usage, the binary protocol appears ill-suited,
unless the client can pretty reliably avoid read failure. I haven't
examined its failure modes, yet.
> The following data is not mine, but it shows the time, in
> milliseconds, required to resolve the content of a paged memory
> address via socket (pmemaccess) , pmemsave and xp
>
> http://cl.ly/image/322a3s0h1V05
>
> Again, I did not produce those data points; they come from an old
> libvmi thread.
90ms is a very long time. What exactly was measured?
> I think it might be conceivable that there could be a QMP command that
> returns the content of an arbitrarily sized memory region as a base64
> or a base85 JSON string. It would still have both time- (due to
> encoding/decoding) and space- (base64 has 33% and base85 would be 7%)
> overhead, plus JSON encoding/decoding overhead. It might still be the
> case that a socket would outperform such a command as well,
> speed-wise. I don't think it would be any faster than xp.
A special-purpose binary protocol over a dedicated socket will always do
less than a QMP solution (ignoring foolishness like transmitting crap on
read error that the client is then expected to throw away). The question is
whether the difference in work translates to a worthwhile difference in
performance.
The larger question is actually whether we have an existing interface
that can serve the libvmi's needs. We've discussed monitor commands
like xp, pmemsave, pmemread. There's another existing interface: the
GDB stub. Have you considered it?
> There's also a similar patch, floating around the internet, that uses
> shared memory, instead of sockets, as inter-process communication
> between libvmi and QEMU. I've never used that.
By the time you've built a working IPC mechanism on top of shared memory,
you're often no better off than with AF_LOCAL sockets.
Crazy idea: can we allocate guest memory in a way that supports sharing
it with another process? Eduardo, can -mem-path do such wild things?
>>>>> The socket API was written by the libvmi author and it works with the
>>>>> current libvmi version. The libvmi client-side implementation is at:
>>>>>
>>>>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>>>>
>>>>> As many use KVM VMs for introspection, malware and security analysis,
>>>>> it might be worth thinking about making pmemaccess a permanent
>>>>> hmp/qmp command, as opposed to having to produce a patch at each QEMU
>>>>> point release.
>>>> Related existing commands: memsave, pmemsave, dump-guest-memory.
>>>>
>>>> Can you explain why these won't do for your use case?
>>> For people who do security analysis there are two use cases, static
>>> and dynamic analysis. With memsave, pmemsave and dump-guest-memory one
>>> can do static analysis, i.e. snapshotting a VM and seeing what was
>>> happening at that point in time.
>>> Dynamic analysis requires being able to 'introspect' a VM while it's running.
>>>
>>> If you take a snapshot of two people exchanging a glass of water, and
>>> you happen to take it at the very moment both persons have their hands
>>> on the glass, it's hard to tell who passed the glass to whom. If you
>>> have a movie of the same scene, it's obvious who's the giver and who's
>>> the receiver. Same use case.
>> I understand the need for introspecting a running guest. What exactly
>> makes the existing commands unsuitable for that?
> Speed. See discussion above.
>>
>>> More to the point, there's a host of C and python frameworks to
>>> dynamically analyze VMs: volatility, rekal, "drakvuf", etc. They all
>>> build on top of libvmi. I did not want to reinvent the wheel.
>> Fair enough.
>>
>> Front page http://libvmi.com/ claims "Works with Xen, KVM, Qemu, and Raw
>> memory files." What exactly is missing for KVM?
> When they say they support kvm, what they really mean is that they
> support the (retired, I understand) qemu-kvm fork via a patch that is
> provided in the libvmi source tree. I think the most recent qemu-kvm
> supported is 1.6.0
>
> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>
> I wanted to bring support to the head revision of QEMU, to bring
> libvmi level with more modern QEMU.
>
> Maybe the solution is simply to put this patch in the libvmi source
> tree, which I've already asked to do via pull request, leaving QEMU
> alone.
> However, the patch has to be updated at every QEMU point release. I
> wanted to avoid that, if at all possible.
>
>>
>>> Mind you, 99.9% of people that do dynamic VM analysis use xen. They
>>> contend that xen has better introspection support. In my case, I did
>>> not want to bother with dedicating a full server to be a xen domain
>>> 0. I just wanted to do a quick test by standing up a QEMU/kvm VM, in
>>> an otherwise purposed server.
>> I'm not at all against better introspection support in QEMU. I'm just
>> trying to understand the problem you're trying to solve with your
>> patches.
> What all users of libvmi would love to have is super high speed access
> to VM physical memory as part of the QEMU source tree, and not
> supported via a patch. Implemented as the QEMU owners see fit, as long
> as it is blazing fast and easily accessed via client library or
> inter-process communication.
The use case makes sense to me, we just need to figure out how we want
to serve it in QEMU.
> My gut feeling is that it has to bypass QMP protocol/encoding/file
> access/json to be fast, but it is just a gut feeling - worth nothing.
My gut feeling is that QMP should do fine in overhead compared to other
solutions involving socket I/O as long as the data sizes are *small*.
Latency might be an issue, though: QMP commands are processed from the
main loop. A dedicated server thread can be more responsive, but
letting it write to shared resources could be "interesting".
>>>>> Also, the pmemsave commands QAPI should be changed to be usable with
>>>>> 64bit VM's
>>>>>
>>>>> in qapi-schema.json
>>>>>
>>>>> from
>>>>>
>>>>> ---
>>>>> { 'command': 'pmemsave',
>>>>> 'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>>> ---
>>>>>
>>>>> to
>>>>>
>>>>> ---
>>>>> { 'command': 'pmemsave',
>>>>> 'data': {'val': 'int64', 'size': 'int64', 'filename': 'str'} }
>>>>> ---
>>>> In the QAPI schema, 'int' is actually an alias for 'int64'. Yes, that's
>>>> confusing.
>>> I think it's confusing for the HMP parser too. If you have a VM with
>>> 8GB of RAM and want to snapshot the whole physical memory, via HMP
>>> over telnet this is what happens:
>>>
>>> $ telnet localhost 1234
>>> Trying 127.0.0.1...
>>> Connected to localhost.
>>> Escape character is '^]'.
>>> QEMU 2.4.0.1 monitor - type 'help' for more information
>>> (qemu) help pmemsave
>>> pmemsave addr size file -- save to disk physical memory dump starting
>>> at 'addr' of size 'size'
>>> (qemu) pmemsave 0 8589934591 "/tmp/memorydump"
>>> 'pmemsave' has failed: integer is for 32-bit values
>>> Try "help pmemsave" for more information
>>> (qemu) quit
>> Your change to pmemsave's definition in qapi-schema.json is effectively a
>> no-op.
>>
>> Your example shows *HMP* command pmemsave. The definition of an HMP
>> command is *independent* of the QMP command. The implementation *uses*
>> the QMP command.
>>
>> QMP pmemsave is defined in qapi-schema.json as
>>
>> { 'command': 'pmemsave',
>> 'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>
>> Its implementation is in cpus.c:
>>
>> void qmp_pmemsave(int64_t addr, int64_t size, const char *filename,
>> Error **errp)
>>
>> Note the int64_t size.
>>
>> HMP pmemsave is defined in hmp-commands.hx as
>>
>> {
>> .name = "pmemsave",
>> .args_type = "val:l,size:i,filename:s",
>> .params = "addr size file",
>> .help = "save to disk physical memory dump starting at 'addr' of size 'size'",
>> .mhandler.cmd = hmp_pmemsave,
>> },
>>
>> Its implementation is in hmp.c:
>>
>> void hmp_pmemsave(Monitor *mon, const QDict *qdict)
>> {
>> uint32_t size = qdict_get_int(qdict, "size");
>> const char *filename = qdict_get_str(qdict, "filename");
>> uint64_t addr = qdict_get_int(qdict, "val");
>> Error *err = NULL;
>>
>> qmp_pmemsave(addr, size, filename, &err);
>> hmp_handle_error(mon, &err);
>> }
>>
>> Note uint32_t size.
>>
>> Arguably, the QMP size argument should use 'size' (an alias for
>> 'uint64'), and the HMP args_type should use 'size:o'.
> Understand all that. Indeed, I've re-implemented 'pmemaccess' the same
> way pmemsave is implemented. There is a single function, and two
> points of entry, one for HMP and one for QMP. I think pmemaccess
> mimics pmemsave closely.
>
> However, if one wants to simply dump a memory region, via HMP for
> human ease of use/debug/testing purposes, one cannot dump memory
> regions that reside higher than 2^32-1
Can you give an example?
[...]
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] [PATCH] QEMU patch for libvmi to introspect QEMU/kvm virtual machines. Usually this patch is distributed with libvmi, but, it might be more useful to have it in the QEMU source permanently.
2015-10-19 21:33 ` Eric Blake
@ 2015-10-21 15:11 ` Valerio Aimale
0 siblings, 0 replies; 43+ messages in thread
From: Valerio Aimale @ 2015-10-21 15:11 UTC (permalink / raw)
To: Eric Blake, qemu-devel; +Cc: armbru, ehabkost, lcapitulino
Eric,
thanks for your comments. I'm going to take the liberty of top-posting
some notes.
On grammar awkwardness, indentation, documentation, and coding style. I
agree with you. Mea culpa. I take full responsibility. I was too eager
to submit the patch. I'll be less eager in the future.
If and when we decide that this patch belongs in the QEMU source tree, I
will clean up grammar, documentation and code. However, as per
discussion with Markus, that is still up in the air. So I'll hold off
on those for now.
Below are discussions of two issues only: endianness and fprintf.
Valerio
On 10/19/15 3:33 PM, Eric Blake wrote:
> On 10/15/2015 05:44 PM, valerio@aimale.com wrote:
>> From: Valerio Aimale <valerio@aimale.com>
> Long subject line, and no message body. Remember, you want the subject
> line to be a one-line short summary of 'what', then the commit body
> message for 'why', as in:
>
> qmp: add command for libvmi memory introspection
>
> In the past, libvmi was relying on an out-of-tree patch to qemu that
> provides a new QMP command pmemaccess. It is now time to make this
> command part of qemu.
>
> pmemaccess is used to create a side-channel communication path that can
> more effectively be used to query lots of small memory chunks without
> the overhead of one QMP command per chunk. ...
>
>> ---
> You are missing a Signed-off-by: tag. Without that, we cannot take your
> patch. But at least we can still review it:
>
>> Makefile.target | 2 +-
>> hmp-commands.hx | 14 ++++
>> hmp.c | 9 +++
>> hmp.h | 1 +
>> memory-access.c | 206 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> memory-access.h | 21 ++++++
>> qapi-schema.json | 28 ++++++++
>> qmp-commands.hx | 23 +++++++
>> 8 files changed, 303 insertions(+), 1 deletion(-)
>> create mode 100644 memory-access.c
>> create mode 100644 memory-access.h
>>
>> diff --git a/Makefile.target b/Makefile.target
>> index 962d004..940ab51 100644
>> --- a/Makefile.target
>> +++ b/Makefile.target
>> @@ -131,7 +131,7 @@ endif #CONFIG_BSD_USER
>> #########################################################
>> # System emulator target
>> ifdef CONFIG_SOFTMMU
>> -obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o numa.o
>> +obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o numa.o memory-access.o
> This line is now over 80 columns; please wrap.
>
>> obj-y += qtest.o bootdevice.o
> In fact, you could have just appended it into this line instead.
>
>> +++ b/hmp-commands.hx
>> @@ -807,6 +807,20 @@ save to disk physical memory dump starting at @var{addr} of size @var{size}.
>> ETEXI
>>
>> {
>> + .name = "pmemaccess",
>> + .args_type = "path:s",
>> + .params = "file",
>> + .help = "open A UNIX Socket access to physical memory at 'path'",
> s/A/a/
>
> Awkward grammar; better might be:
>
> open a Unix socket at 'path' for use in accessing physical memory
>
> Please also document where the user can find the protocol that will be
> used across the side-channel socket thus opened.
>
>
>> +++ b/memory-access.c
>> @@ -0,0 +1,206 @@
>> +/*
>> + * Access guest physical memory via a domain socket.
>> + *
>> + * Copyright (C) 2011 Sandia National Laboratories
>> + * Original Author: Bryan D. Payne (bdpayne@acm.org)
>> + *
>> + * Refurbished for modern QEMU by Valerio Aimale (valerio@aimale.com), in 2015
>> + */
> I would expect at least something under docs/ in addition to this file
> (the protocol spoken over the socket should be well-documented, and not
> just by reading the source code). Compare with docs/qmp-spec.txt.
>
>> +struct request{
>> + uint8_t type; // 0 quit, 1 read, 2 write, ... rest reserved
>> + uint64_t address; // address to read from OR write to
>> + uint64_t length; // number of bytes to read OR write
> Any particular endianness constraints to worry about?
That is a very interesting and insightful comment that required some
thinking. As I see it right now, the issue of endianness can be
partitioned into 3 separate problems:
1) Endianness concordance between libvmi and the QEMU host. As this
patch uses a UNIX socket for inter-process communication, it implicitly
assumes that libvmi and the QEMU host run on the same machine, thus
have the same architecture, so no endianness correction is needed. If
the patch were to use an inet socket, then hton and ntoh conversion
would be required. It could be easily arranged by using this very
useful header https://gist.github.com/panzi/6856583 , which provides an
architecture- and platform-independent implementation of the htobe64()
and be64toh() needed to convert the two 64-bit members of the struct
request (see the sketch after point 3). Of course there is the very
interesting and intriguing scenario of somebody tunneling a UNIX socket
from one host to another via inet socket with socat. However, as libvmi
owns socket creation, that would be not impossible, but not even that
easy.
2) Endianness concordance between the QEMU host and the QEMU guest. As
pmemaccess just calls cpu_physical_memory_map() and
cpu_physical_memory_unmap(), there is no need to worry about
concordance here. Memory regions are treated as byte streams.
3) Endianness concordance between libvmi and the QEMU guest: this is
the responsibility of libvmi and should be implemented in libvmi. My
understanding is that, as of now, libvmi supports introspection of
Windows, Linux and Mac OSX running on x86 and x86_64 architectures. I
believe that covers 99.(9)% of the needs of the
security/analysis/introspection community. Soon, when arm64/Cortex-A53
boards come to dominate the server market, as some hypothesize, the
need for libvmi endianness concordance will likely arise, for those
introspecting from an x86/x86_64 host into an ARM64 guest. However,
that's some time ahead. I'm still waiting on my order of an ARM64
board. There is also the corner case of Xbox One introspection. I've
never heard of an Xbox VM running on QEMU, but that's an interesting
hypothesis.
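To make point 1 concrete, here is a sketch of the conversion that would
be needed if the protocol ever had to cross an inet socket. It assumes
the portable header from the gist above; none of this is in the current
patch:

#include <stdint.h>
#include "portable_endian.h" /* the gist header linked above */

struct request {
    uint8_t type;     /* 0 quit, 1 read, 2 write, ... rest reserved */
    uint64_t address; /* address to read from OR write to */
    uint64_t length;  /* number of bytes to read OR write */
};

/* sender side, before write(): put the 64-bit members in big-endian */
static void request_to_wire(struct request *req)
{
    req->address = htobe64(req->address);
    req->length = htobe64(req->length);
}

/* receiver side, after read(): convert back to host byte order */
static void request_from_wire(struct request *req)
{
    req->address = be64toh(req->address);
    req->length = be64toh(req->length);
}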
>
>> +};
>> +
>> +static uint64_t
>> +connection_read_memory (uint64_t user_paddr, void *buf, uint64_t user_len)
>> +{
>> + hwaddr paddr = (hwaddr) user_paddr;
>> + hwaddr len = (hwaddr) user_len;
>> + void *guestmem = cpu_physical_memory_map(paddr, &len, 0);
>> + if (!guestmem){
> Space before {, throughout the patch.
>
>> +static void
>> +send_success_ack (int connection_fd)
> No space before ( in function usage.
>
>> +{
>> + uint8_t success = 1;
> Magic number; I'd expect an enum or #defines somewhere.
>
>> + int nbytes = write(connection_fd, &success, 1);
>> + if (1 != nbytes){
>> + fprintf(stderr, "Qemu pmemaccess: failed to send success ack\n");
> Is fprintf() really the best approach for error reporting?
No, it is not. However, in my partial defense I will say that I have
reworked the code in the patch to use the Error **errp passed by the
monitor. The original old patches used fprintf. You can see there are
several calls to error_setg(pargs->errp, ...) that I used to replace
fprintf's. I missed these two spots. My fault.
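For the record, the missed spots would become something like this (a
sketch; whether pargs->errp is still valid by the time the server
thread runs is a separate question):

    int nbytes = write(connection_fd, &success, 1);
    if (nbytes != 1) {
        /* report through the monitor's Error object, not stderr */
        error_setg(pargs->errp, "pmemaccess: failed to send success ack");
    }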
>
>
>> +static void
>> +connection_handler (int connection_fd)
>> +{
>> + int nbytes;
>> + struct request req;
>> +
>> + while (1){
>> + // client request should match the struct request format
> We prefer /* */ comments over //.
>
>> + nbytes = read(connection_fd, &req, sizeof(struct request));
>> + if (nbytes != sizeof(struct request)){
>> + // error
>> + continue;
> Silently ignoring errors?
>
>> + }
>> + else if (req.type == 0){
>> + // request to quit, goodbye
>> + break;
>> + }
>> + else if (req.type == 1){
>> + // request to read
>> + char *buf = malloc(req.length + 1);
> Mixing g_malloc/g_free with malloc/free makes it a bit harder to reason
> about things; it might be better to use only glib allocation.
>
>> + nbytes = connection_read_memory(req.address, buf, req.length);
>> + if (nbytes != req.length){
>> + // read failure, return failure message
>> + buf[req.length] = 0; // set last byte to 0 for failure
>> + nbytes = write(connection_fd, buf, 1);
>> + }
>> + else{
>> + // read success, return bytes
>> + buf[req.length] = 1; // set last byte to 1 for success
>> + nbytes = write(connection_fd, buf, nbytes + 1);
>> + }
>> + free(buf);
>> + }
>> + else if (req.type == 2){
>> + // request to write
>> + void *write_buf = malloc(req.length);
>> + nbytes = read(connection_fd, &write_buf, req.length);
> No error checking that the allocation succeeded? How do you protect from
> bogus requests?
>
>> + if (nbytes != req.length){
>> + // failed reading the message to write
>> + send_fail_ack(connection_fd);
>> + }
>> + else{
> } on the same line as else
>
>> + // do the write
>> + nbytes = connection_write_memory(req.address, write_buf, req.length);
>> + if (nbytes == req.length){
>> + send_success_ack(connection_fd);
>> + }
>> + else{
>> + send_fail_ack(connection_fd);
>> + }
>> + }
>> + free(write_buf);
>> + }
>> + else{
>> + // unknown command
>> + fprintf(stderr, "Qemu pmemaccess: ignoring unknown command (%d)\n", req.type);
>> + char *buf = malloc(1);
>> + buf[0] = 0;
>> + nbytes = write(connection_fd, buf, 1);
>> + free(buf);
>> + }
>> + }
>> +
>> + close(connection_fd);
>> +}
>> +
>> +static void *
>> +memory_access_thread (void *p)
> The most common style in qemu puts return type on the same line as the
> function name.
>
>> +{
>> + struct sockaddr_un address;
>> + int socket_fd, connection_fd;
>> + socklen_t address_length;
>> + struct pmemaccess_args *pargs = (struct pmemaccess_args *)p;
> This cast is not necessary in C.
>
>> +
>> + socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
>> + if (socket_fd < 0){
>> + error_setg(pargs->errp, "Qemu pmemaccess: socket failed");
> TAB damage. Also, mentioning 'Qemu' in an error message is probably
> redundant. That, and we prefer 'qemu' over 'Qemu'.
>
>> + goto error_exit;
>> + }
>> + unlink(pargs->path);
> No check for errors?
>
>> + address.sun_family = AF_UNIX;
>> + address_length = sizeof(address.sun_family) + sprintf(address.sun_path, "%s", (char *) pargs->path);
> Long line. sprintf() is dangerous if you are not positive that
> pargs->path fits. Why do you need the cast?
>
>> +
>> + if (bind(socket_fd, (struct sockaddr *) &address, address_length) != 0){
>> + error_setg(pargs->errp, "Qemu pmemaccess: bind failed");
> More TAB damage. Please read
> http://wiki.qemu.org/Contribute/SubmitAPatch for more contribution
> hints, and be sure to run ./scripts/checkpatch.pl.
>
>
>> +void
>> +qmp_pmemaccess (const char *path, Error **errp)
>> +{
>> + pthread_t thread;
>> + sigset_t set, oldset;
>> + struct pmemaccess_args *pargs;
>> +
>> + // create the args struct
>> + pargs = (struct pmemaccess_args *) malloc(sizeof(struct pmemaccess_args));
>> + pargs->errp = errp;
>> + // create a copy of path that we can safely use
>> + pargs->path = malloc(strlen(path) + 1);
>> + memcpy(pargs->path, path, strlen(path) + 1);
> g_strdup() is your friend.
>
>
>> +++ b/qapi-schema.json
>> @@ -1427,6 +1427,34 @@
>> 'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>
>> ##
>> +# @pmemaccess:
>> +#
>> +# Open A UNIX Socket access to physical memory
> s/A UNIX Socket/a Unix socket/
>
> Same wording suggestion as earlier; might be better as:
>
> Open a Unix socket used as a side-channel for efficiently accessing
> physical memory.
>
>> +#
>> +# @path: the name of the UNIX socket pipe
>> +#
>> +# Returns: Nothing on success
>> +#
>> +# Since: 2.4.0.1
> 2.5, not 2.4.0.1.
>
>> +#
>> +# Notes: Derived from previously existing patches.
> Dead sentence that doesn't add anything to the current specification.
>
>> When command
>> +# succeeds connect to the open socket. Write a binary structure to
>> +# the socket as:
>> +#
>> +# struct request {
>> +# uint8_t type; // 0 quit, 1 read, 2 write, ... rest reserved
>> +# uint64_t address; // address to read from OR write to
>> +# uint64_t length; // number of bytes to read OR write
>> +# };
>> +#
>> +# If it is a read operation, Qemu will return lenght+1 bytes. Read lenght+1
> s/lenght/length/ twice
>
>> +# bytes. the last byte will be a 1 for success, or a 0 for failure.
>> +#
>> +##
>> +{ 'command': 'pmemaccess',
>> + 'data': {'path': 'str'} }
>> +
>> +##
>> # @cont:
>> #
>> # Resume guest VCPU execution.
>> diff --git a/qmp-commands.hx b/qmp-commands.hx
>> index d2ba800..26e4a51 100644
>> --- a/qmp-commands.hx
>> +++ b/qmp-commands.hx
>> @@ -472,6 +472,29 @@ Example:
>> EQMP
>>
>> {
>> + .name = "pmemaccess",
>> + .args_type = "path:s",
>> + .mhandler.cmd_new = qmp_marshal_input_pmemaccess,
>> + },
>> +
>> +SQMP
>> +pmemaccess
>> +----------
>> +
>> +Open A UNIX Socket access to physical memory
>> +
>> +Arguments:
>> +
>> +- "path": mount point path (json-string)
>> +
>> +Example:
>> +
>> +-> { "execute": "pmemaccess",
>> + "arguments": { "path": "/tmp/guestname" } }
>> +<- { "return": {} }
>> +
>> +EQMP
>> + {
>> .name = "inject-nmi",
>> .args_type = "",
>> .mhandler.cmd_new = qmp_marshal_inject_nmi,
>>
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-21 10:54 ` Markus Armbruster
@ 2015-10-21 15:50 ` Valerio Aimale
2015-10-22 11:50 ` Markus Armbruster
2015-10-22 18:43 ` Valerio Aimale
2015-10-22 19:12 ` Eduardo Habkost
2 siblings, 1 reply; 43+ messages in thread
From: Valerio Aimale @ 2015-10-21 15:50 UTC (permalink / raw)
To: Markus Armbruster; +Cc: lcapitulino, qemu-devel, ehabkost
On 10/21/15 4:54 AM, Markus Armbruster wrote:
> Valerio Aimale <valerio@aimale.com> writes:
>
>> On 10/19/15 1:52 AM, Markus Armbruster wrote:
>>> Valerio Aimale <valerio@aimale.com> writes:
>>>
>>>> On 10/16/15 2:15 AM, Markus Armbruster wrote:
>>>>> valerio@aimale.com writes:
>>>>>
>>>>>> All-
>>>>>>
>>>>>> I've produced a patch for the current QEMU HEAD, for libvmi to
>>>>>> introspect QEMU/KVM VMs.
>>>>>>
>>>>>> Libvmi has patches for the old qemu-kvm fork, inside its source tree:
>>>>>> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>>>>>>
>>>>>> This patch adds an hmp and a qmp command, "pmemaccess". When the
>>>>>> command is invoked with a string argument (a filename), it will open
>>>>>> a UNIX socket and spawn a listening thread.
>>>>>>
>>>>>> The client writes binary commands to the socket, in the form of a c
>>>>>> structure:
>>>>>>
>>>>>> struct request {
>>>>>> uint8_t type; // 0 quit, 1 read, 2 write, ... rest reserved
>>>>>> uint64_t address; // address to read from OR write to
>>>>>> uint64_t length; // number of bytes to read OR write
>>>>>> };
>>>>>>
>>>>>> The client receives as a response, either (length+1) bytes, if it is a
>>>>>> read operation, or 1 byte if it is a write operation.
>>>>>>
>>>>>> The last bytes of a read operation response indicates success (1
>>>>>> success, 0 failure). The single byte returned for a write operation
>>>>>> indicates same (1 success, 0 failure).
>>>>> So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of
>>>>> garbage followed by the "it failed" byte?
>>>> Markus, that appear to be the case. However, I did not write the
>>>> communication protocol between libvmi and qemu. I'm assuming that the
>>>> person that wrote the protocol, did not want to bother with over
>>>> complicating things.
>>>>
>>>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>>>
>>>> I'm thinking he assumed reads would be small in size and the price of
>>>> reading garbage was less than the price of writing a more complicated
>>>> protocol. I can see his point, confronted with the same problem, I
>>>> might have done the same.
>>> All right, the interface is designed for *small* memory blocks then.
>>>
>>> Makes me wonder why he needs a separate binary protocol on a separate
>>> socket. Small blocks could be done just fine in QMP.
>> The problem is speed. If one's analyzing the memory space of a running
>> process (physical and paged), libvmi will make a large number of small
>> and mid-sized reads. If one uses xp, or pmemsave, the overhead is
>> quite significant. xp has overhead due to encoding, and pmemsave has
>> overhead due to file open/write (server), file open/read/close/unlink
>> (client).
>>
>> Others have gone through the problem before me. It appears that
>> pmemsave and xp are significantly slower than reading memory using a
>> socket via pmemaccess.
> That they're slower isn't surprising, but I'd expect the cost of
> encoding a small block to be insignificant compared to the cost of the
> network roundtrips.
>
> As block size increases, the space overhead of encoding will eventually
> bite. But for that usage, the binary protocol appears ill-suited,
> unless the client can pretty reliably avoid read failure. I haven't
> examined its failure modes, yet.
>
>> The following data is not mine, but it shows the time, in
>> milliseconds, required to resolve the content of a paged memory
>> address via socket (pmemaccess), pmemsave and xp
>>
>> http://cl.ly/image/322a3s0h1V05
>>
>> Again, I did not produce those data points, they come from an old
>> libvmi thread.
> 90ms is a very long time. What exactly was measured?
That is a fair question to ask. Unfortunately, I extracted that data
plot from an old thread in some libvmi mailing list. I do not have the
data and code that produced it. Sifting through the thread, I can see
the code was never published. I will take it upon myself to produce
code that
compares timing - in a fair fashion - of libvmi doing an atomic
operation and a larger-scale operation (like listing running processes)
via gdb, pmemaccess/socket, pmemsave, xp, and hopefully, a version of xp
that returns byte streams of memory regions base64 or base85 encoded in
json strings. I'll publish results and code.
However, given workload and life happening, it will be some time before
I complete that task.
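In the meantime, for anyone who wants to experiment, a minimal client
for the binary protocol discussed above could look like this (a sketch;
error handling, short-read loops, and the byte-order caveats are
omitted; it follows the read-response framing described earlier in the
thread, i.e. length data bytes followed by one status byte):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

struct request {
    uint8_t type;     /* 0 quit, 1 read, 2 write, ... rest reserved */
    uint64_t address; /* address to read from OR write to */
    uint64_t length;  /* number of bytes to read OR write */
};

/* read len bytes of guest physical memory at addr into buf;
   returns 0 on success, -1 on failure */
static int pmem_read(const char *path, uint64_t addr, void *buf, uint64_t len)
{
    struct sockaddr_un sa = { .sun_family = AF_UNIX };
    struct request req = { .type = 1, .address = addr, .length = len };
    uint8_t *resp = malloc(len + 1);
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    int ok;

    strncpy(sa.sun_path, path, sizeof(sa.sun_path) - 1);
    connect(fd, (struct sockaddr *) &sa, sizeof(sa));
    write(fd, &req, sizeof(req)); /* the raw struct is the wire format */
    read(fd, resp, len + 1);      /* len data bytes, then 1 status byte */
    ok = (resp[len] == 1);        /* last byte: 1 success, 0 failure */
    memcpy(buf, resp, len);
    free(resp);
    close(fd);
    return ok ? 0 : -1;
}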
>
>> I think it might be conceivable that there could be a QMP command that
>> returns the content of an arbitrarily sized memory region as a base64
>> or a base85 json string. It would still have both time- (due to
>> encoding/decoding) and space- (base64 has 33% and base85 would be 7%)
>> overhead, + json encoding/decoding overhead. It might still be the
>> case that socket would outperform such a command as well,
>> speed-wise. I don't think it would be any faster than xp.
> A special-purpose binary protocol over a dedicated socket will always do
> less than a QMP solution (ignoring foolishness like transmitting crap on
> read error the client is then expected to throw away). The question is
> whether the difference in work translates to a worthwhile difference in
> performance.
>
> The larger question is actually whether we have an existing interface
> that can serve libvmi's needs. We've discussed monitor commands
> like xp, pmemsave, pmemread. There's another existing interface: the
> GDB stub. Have you considered it?
>
>> There's also a similar patch, floating around the internet, that uses
>> shared memory, instead of sockets, as inter-process communication
>> between libvmi and QEMU. I've never used that.
> By the time you built a working IPC mechanism on top of shared memory,
> you're often no better off than with AF_LOCAL sockets.
>
> Crazy idea: can we allocate guest memory in a way that supports sharing
> it with another process? Eduardo, can -mem-path do such wild things?
>
>>>>>> The socket API was written by the libvmi author and it works with the
>>>>>> current libvmi version. The libvmi client-side implementation is at:
>>>>>>
>>>>>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>>>>>
>>>>>> As many use kvm VM's for introspection, malware and security analysis,
>>>>>> it might be worth thinking about making the pmemaccess a permanent
>>>>>> hmp/qmp command, as opposed to having to produce a patch at each QEMU
>>>>>> point release.
>>>>> Related existing commands: memsave, pmemsave, dump-guest-memory.
>>>>>
>>>>> Can you explain why these won't do for your use case?
>>>> For people who do security analysis there are two use cases, static
>>>> and dynamic analysis. With memsave, pmemsave and dump-guest-memory one
>>>> can do static analysis, i.e. snapshotting a VM and seeing what was
>>>> happening at that point in time.
>>>> Dynamic analysis requires being able to 'introspect' a VM while it's running.
>>>>
>>>> If you take a snapshot of two people exchanging a glass of water, and
>>>> you happen to take it at the very moment both persons have their hands
>>>> on the glass, it's hard to tell who passed the glass to whom. If you
>>>> have a movie of the same scene, it's obvious who's the giver and who's
>>>> the receiver. Same use case.
>>> I understand the need for introspecting a running guest. What exactly
>>> makes the existing commands unsuitable for that?
>> Speed. See discussion above.
>>>> More to the point, there's a host of C and python frameworks to
>>>> dynamically analyze VMs: volatility, rekal, "drakvuf", etc. They all
>>>> build on top of libvmi. I did not want to reinvent the wheel.
>>> Fair enough.
>>>
>>> Front page http://libvmi.com/ claims "Works with Xen, KVM, Qemu, and Raw
>>> memory files." What exactly is missing for KVM?
>> When they say they support kvm, what they really mean is that they
>> support the (retired, I understand) qemu-kvm fork via a patch that is
>> provided in the libvmi source tree. I think the most recent qemu-kvm
>> supported is 1.6.0
>>
>> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>>
>> I wanted to bring support to the head revision of QEMU, to bring
>> libvmi level with more modern QEMU.
>>
>> Maybe the solution is simply to put this patch in the libvmi source
>> tree, which I've already asked to do via pull request, leaving QEMU
>> alone.
>> However, the patch has to be updated at every QEMU point release. I
>> wanted to avoid that, if at all possible.
>>
>>>> Mind you, 99.9% of people that do dynamic VM analysis use xen. They
>>>> contend that xen has better introspection support. In my case, I did
>>>> not want to bother with dedicating a full server to be a xen domain
>>>> 0. I just wanted to do a quick test by standing up a QEMU/kvm VM, in
>>>> an otherwise purposed server.
>>> I'm not at all against better introspection support in QEMU. I'm just
>>> trying to understand the problem you're trying to solve with your
>>> patches.
>> What all users of libvmi would love to have is super high speed access
>> to VM physical memory as part of the QEMU source tree, and not
>> supported via a patch. Implemented as the QEMU owners see fit, as long
>> as it is blazing fast and easily accessed via client library or
>> inter-process communication.
> The use case makes sense to me, we just need to figure out how we want
> to serve it in QEMU.
>
>> My gut feeling is that it has to bypass QMP protocol/encoding/file
>> access/json to be fast, but it is just a gut feeling - worth nothing.
> My gut feeling is that QMP should do fine in overhead compared to other
> solutions involving socket I/O as long as the data sizes are *small*.
> Latency might be an issue, though: QMP commands are processed from the
> main loop. A dedicated server thread can be more responsive, but
> letting it write to shared resources could be "interesting".
See above, I'll produce some code and results so we can all think about
the issue.
>
>>>>>> Also, the pmemsave commands QAPI should be changed to be usable with
>>>>>> 64bit VM's
>>>>>>
>>>>>> in qapi-schema.json
>>>>>>
>>>>>> from
>>>>>>
>>>>>> ---
>>>>>> { 'command': 'pmemsave',
>>>>>> 'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>>>> ---
>>>>>>
>>>>>> to
>>>>>>
>>>>>> ---
>>>>>> { 'command': 'pmemsave',
>>>>>> 'data': {'val': 'int64', 'size': 'int64', 'filename': 'str'} }
>>>>>> ---
>>>>> In the QAPI schema, 'int' is actually an alias for 'int64'. Yes, that's
>>>>> confusing.
>>>> I think it's confusing for the HMP parser too. If you have a VM with
>>>> 8GB of RAM and want to snapshot the whole physical memory, via HMP
>>>> over telnet this is what happens:
>>>>
>>>> $ telnet localhost 1234
>>>> Trying 127.0.0.1...
>>>> Connected to localhost.
>>>> Escape character is '^]'.
>>>> QEMU 2.4.0.1 monitor - type 'help' for more information
>>>> (qemu) help pmemsave
>>>> pmemsave addr size file -- save to disk physical memory dump starting
>>>> at 'addr' of size 'size'
>>>> (qemu) pmemsave 0 8589934591 "/tmp/memorydump"
>>>> 'pmemsave' has failed: integer is for 32-bit values
>>>> Try "help pmemsave" for more information
>>>> (qemu) quit
>>> Your change to pmemsave's definition in qapi-schema.json is effectively a
>>> no-op.
>>>
>>> Your example shows *HMP* command pmemsave. The definition of an HMP
>>> command is *independent* of the QMP command. The implementation *uses*
>>> the QMP command.
>>>
>>> QMP pmemsave is defined in qapi-schema.json as
>>>
>>> { 'command': 'pmemsave',
>>> 'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>
>>> Its implementation is in cpus.c:
>>>
>>> void qmp_pmemsave(int64_t addr, int64_t size, const char *filename,
>>> Error **errp)
>>>
>>> Note the int64_t size.
>>>
>>> HMP pmemsave is defined in hmp-commands.hx as
>>>
>>> {
>>> .name = "pmemsave",
>>> .args_type = "val:l,size:i,filename:s",
>>> .params = "addr size file",
>>> .help = "save to disk physical memory dump starting at 'addr' of size 'size'",
>>> .mhandler.cmd = hmp_pmemsave,
>>> },
>>>
>>> Its implementation is in hmp.c:
>>>
>>> void hmp_pmemsave(Monitor *mon, const QDict *qdict)
>>> {
>>> uint32_t size = qdict_get_int(qdict, "size");
>>> const char *filename = qdict_get_str(qdict, "filename");
>>> uint64_t addr = qdict_get_int(qdict, "val");
>>> Error *err = NULL;
>>>
>>> qmp_pmemsave(addr, size, filename, &err);
>>> hmp_handle_error(mon, &err);
>>> }
>>>
>>> Note uint32_t size.
>>>
>>> Arguably, the QMP size argument should use 'size' (an alias for
>>> 'uint64'), and the HMP args_type should use 'size:o'.
>> Understand all that. Indeed, I've re-implemented 'pmemaccess' the same
>> way pmemsave is implemented. There is a single function, and two
>> points of entry, one for HMP and one for QMP. I think pmemaccess
>> mimics pmemsave closely.
>>
>> However, if one wants to simply dump a memory region, via HMP for
>> human ease of use/debug/testing purposes, one cannot dump memory
>> regions that reside higher than 2^32-1
> Can you give an example?
Yes. I was trying to dump the full extent of physical memory of a VM
that has 8GB memory space (ballooned). I simply did this:
$ telnet localhost 1234
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
QEMU 2.4.0.1 monitor - type 'help' for more information
(qemu) pmemsave 0 8589934591 "/tmp/memsaved"
'pmemsave' has failed: integer is for 32-bit values
Maybe I misunderstood how pmemsave works. Maybe I should have used
dump-guest-memory
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-21 15:50 ` Valerio Aimale
@ 2015-10-22 11:50 ` Markus Armbruster
2015-10-22 18:11 ` Valerio Aimale
0 siblings, 1 reply; 43+ messages in thread
From: Markus Armbruster @ 2015-10-22 11:50 UTC (permalink / raw)
To: Valerio Aimale; +Cc: qemu-devel, ehabkost, lcapitulino
Valerio Aimale <valerio@aimale.com> writes:
> On 10/21/15 4:54 AM, Markus Armbruster wrote:
>> Valerio Aimale <valerio@aimale.com> writes:
>>
>>> On 10/19/15 1:52 AM, Markus Armbruster wrote:
>>>> Valerio Aimale <valerio@aimale.com> writes:
>>>>
>>>>> On 10/16/15 2:15 AM, Markus Armbruster wrote:
>>>>>> valerio@aimale.com writes:
>>>>>>
>>>>>>> All-
>>>>>>>
>>>>>>> I've produced a patch for the current QEMU HEAD, for libvmi to
>>>>>>> introspect QEMU/KVM VMs.
>>>>>>>
>>>>>>> Libvmi has patches for the old qemu-kvm fork, inside its source tree:
>>>>>>> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>>>>>>>
>>>>>>> This patch adds an hmp and a qmp command, "pmemaccess". When the
>>>>>>> command is invoked with a string argument (a filename), it will open
>>>>>>> a UNIX socket and spawn a listening thread.
>>>>>>>
>>>>>>> The client writes binary commands to the socket, in the form of a c
>>>>>>> structure:
>>>>>>>
>>>>>>> struct request {
>>>>>>> uint8_t type; // 0 quit, 1 read, 2 write, ... rest reserved
>>>>>>> uint64_t address; // address to read from OR write to
>>>>>>> uint64_t length; // number of bytes to read OR write
>>>>>>> };
>>>>>>>
>>>>>>> The client receives as a response, either (length+1) bytes, if it is a
>>>>>>> read operation, or 1 byte if it is a write operation.
>>>>>>>
>>>>>>> The last bytes of a read operation response indicates success (1
>>>>>>> success, 0 failure). The single byte returned for a write operation
>>>>>>> indicates same (1 success, 0 failure).
>>>>>> So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of
>>>>>> garbage followed by the "it failed" byte?
>>>>> Markus, that appear to be the case. However, I did not write the
>>>>> communication protocol between libvmi and qemu. I'm assuming that the
>>>>> person that wrote the protocol, did not want to bother with over
>>>>> complicating things.
>>>>>
>>>>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>>>>
>>>>> I'm thinking he assumed reads would be small in size and the price of
>>>>> reading garbage was less than the price of writing a more complicated
>>>>> protocol. I can see his point, confronted with the same problem, I
>>>>> might have done the same.
>>>> All right, the interface is designed for *small* memory blocks then.
>>>>
>>>> Makes me wonder why he needs a separate binary protocol on a separate
>>>> socket. Small blocks could be done just fine in QMP.
>>> The problem is speed. If one's analyzing the memory space of a running
>>> process (physical and paged), libvmi will make a large number of small
>>> and mid-sized reads. If one uses xp, or pmemsave, the overhead is
>>> quite significant. xp has overhead due to encoding, and pmemsave has
>>> overhead due to file open/write (server), file open/read/close/unlink
>>> (client).
>>>
>>> Others have gone through the problem before me. It appears that
>>> pmemsave and xp are significantly slower than reading memory using a
>>> socket via pmemaccess.
>> That they're slower isn't surprising, but I'd expect the cost of
>> encoding a small block to be insignificant compared to the cost of the
>> network roundtrips.
>>
>> As block size increases, the space overhead of encoding will eventually
>> bite. But for that usage, the binary protocol appears ill-suited,
>> unless the client can pretty reliably avoid read failure. I haven't
>> examined its failure modes, yet.
>>
>>> The following data is not mine, but it shows the time, in
>>> milliseconds, required to resolve the content of a paged memory
>>> address via socket (pmemaccess), pmemsave and xp
>>>
>>> http://cl.ly/image/322a3s0h1V05
>>>
>>> Again, I did not produce those data points, they come from an old
>>> libvmi thread.
>> 90ms is a very long time. What exactly was measured?
> That is a fair question to ask. Unfortunately, I extracted that data
> plot from an old thread in some libvmi mailing list. I do not have the
> data and code that produced it. Sifting through the thread, I can see
> the code
> was never published. I will take it upon myself to produce code that
> compares timing - in a fair fashion - of libvmi doing an atomic
> operation and a larger-scale operation (like listing running
> processes) via gdb, pmemaccess/socket, pmemsave, xp, and hopefully, a
> version of xp that returns byte streams of memory regions base64 or
> base85 encoded in json strings. I'll publish results and code.
>
> However, given workload and life happening, it will be some time
> before I complete that task.
No problem. I'd like to have your use case addressed, but there's no
need for haste.
[...]
>>>>>>> Also, the pmemsave commands QAPI should be changed to be usable with
>>>>>>> 64bit VM's
>>>>>>>
>>>>>>> in qapi-schema.json
>>>>>>>
>>>>>>> from
>>>>>>>
>>>>>>> ---
>>>>>>> { 'command': 'pmemsave',
>>>>>>> 'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>>>>> ---
>>>>>>>
>>>>>>> to
>>>>>>>
>>>>>>> ---
>>>>>>> { 'command': 'pmemsave',
>>>>>>> 'data': {'val': 'int64', 'size': 'int64', 'filename': 'str'} }
>>>>>>> ---
>>>>>> In the QAPI schema, 'int' is actually an alias for 'int64'. Yes, that's
>>>>>> confusing.
>>>>> I think it's confusing for the HMP parser too. If you have a VM with
>>>>> 8GB of RAM and want to snapshot the whole physical memory, via HMP
>>>>> over telnet this is what happens:
>>>>>
>>>>> $ telnet localhost 1234
>>>>> Trying 127.0.0.1...
>>>>> Connected to localhost.
>>>>> Escape character is '^]'.
>>>>> QEMU 2.4.0.1 monitor - type 'help' for more information
>>>>> (qemu) help pmemsave
>>>>> pmemsave addr size file -- save to disk physical memory dump starting
>>>>> at 'addr' of size 'size'
>>>>> (qemu) pmemsave 0 8589934591 "/tmp/memorydump"
>>>>> 'pmemsave' has failed: integer is for 32-bit values
>>>>> Try "help pmemsave" for more information
>>>>> (qemu) quit
>>>> Your change to pmemsave's definition in qapi-schema.json is effectively a
>>>> no-op.
>>>>
>>>> Your example shows *HMP* command pmemsave. The definition of an HMP
>>>> command is *independent* of the QMP command. The implementation *uses*
>>>> the QMP command.
>>>>
>>>> QMP pmemsave is defined in qapi-schema.json as
>>>>
>>>> { 'command': 'pmemsave',
>>>> 'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>>
>>>> Its implementation is in cpus.c:
>>>>
>>>> void qmp_pmemsave(int64_t addr, int64_t size, const char *filename,
>>>> Error **errp)
>>>>
>>>> Note the int64_t size.
>>>>
>>>> HMP pmemsave is defined in hmp-commands.hx as
>>>>
>>>> {
>>>> .name = "pmemsave",
>>>> .args_type = "val:l,size:i,filename:s",
>>>> .params = "addr size file",
>>>> .help = "save to disk physical memory dump starting at 'addr' of size 'size'",
>>>> .mhandler.cmd = hmp_pmemsave,
>>>> },
>>>>
>>>> Its implementation is in hmp.c:
>>>>
>>>> void hmp_pmemsave(Monitor *mon, const QDict *qdict)
>>>> {
>>>> uint32_t size = qdict_get_int(qdict, "size");
>>>> const char *filename = qdict_get_str(qdict, "filename");
>>>> uint64_t addr = qdict_get_int(qdict, "val");
>>>> Error *err = NULL;
>>>>
>>>> qmp_pmemsave(addr, size, filename, &err);
>>>> hmp_handle_error(mon, &err);
>>>> }
>>>>
>>>> Note uint32_t size.
>>>>
>>>> Arguably, the QMP size argument should use 'size' (an alias for
>>>> 'uint64'), and the HMP args_type should use 'size:o'.
>>> Understand all that. Indeed, I've re-implemented 'pmemaccess' the same
>>> way pmemsave is implemented. There is a single function, and two
>>> points of entry, one for HMP and one for QMP. I think pmemaccess
>>> mimics pmemsave closely.
>>>
>>> However, if one wants to simply dump a memory region, via HMP for
>>> human ease of use/debug/testing purposes, one cannot dump memory
>>> regions that reside higher than 2^32-1
>> Can you give an example?
> Yes. I was trying to dump the full extent of physical memory of a VM
> that has 8GB memory space (ballooned). I simply did this:
>
> $ telnet localhost 1234
> Trying 127.0.0.1...
> Connected to localhost.
> Escape character is '^]'.
> QEMU 2.4.0.1 monitor - type 'help' for more information
> (qemu) pmemsave 0 8589934591 "/tmp/memsaved"
> 'pmemsave' has failed: integer is for 32-bit values
>
> Maybe I misunderstood how pmemsave works. Maybe I should have used
> dump-guest-memory
This is an unnecessary limitation caused by 'size:i' instead of
'size:o'. Fixable.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-22 11:50 ` Markus Armbruster
@ 2015-10-22 18:11 ` Valerio Aimale
2015-10-23 6:31 ` Markus Armbruster
0 siblings, 1 reply; 43+ messages in thread
From: Valerio Aimale @ 2015-10-22 18:11 UTC (permalink / raw)
To: Markus Armbruster; +Cc: qemu-devel, ehabkost, lcapitulino
On 10/22/15 5:50 AM, Markus Armbruster wrote:
> Valerio Aimale <valerio@aimale.com> writes:
>
>> On 10/21/15 4:54 AM, Markus Armbruster wrote:
>>> Valerio Aimale <valerio@aimale.com> writes:
>>>
>>>> On 10/19/15 1:52 AM, Markus Armbruster wrote:
>>>>> Valerio Aimale <valerio@aimale.com> writes:
>>>>>
>>>>>> On 10/16/15 2:15 AM, Markus Armbruster wrote:
>>>>>>> valerio@aimale.com writes:
>>>>>>>
>>>>>>>> All-
>>>>>>>>
>>>>>>>> I've produced a patch for the current QEMU HEAD, for libvmi to
>>>>>>>> introspect QEMU/KVM VMs.
>>>>>>>>
>>>>>>>> Libvmi has patches for the old qemu-kvm fork, inside its source tree:
>>>>>>>> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>>>>>>>>
>>>>>>>> This patch adds an hmp and a qmp command, "pmemaccess". When the
>>>>>>>> command is invoked with a string argument (a filename), it will open
>>>>>>>> a UNIX socket and spawn a listening thread.
>>>>>>>>
>>>>>>>> The client writes binary commands to the socket, in the form of a c
>>>>>>>> structure:
>>>>>>>>
>>>>>>>> struct request {
>>>>>>>> uint8_t type; // 0 quit, 1 read, 2 write, ... rest reserved
>>>>>>>> uint64_t address; // address to read from OR write to
>>>>>>>> uint64_t length; // number of bytes to read OR write
>>>>>>>> };
>>>>>>>>
>>>>>>>> The client receives as a response, either (length+1) bytes, if it is a
>>>>>>>> read operation, or 1 byte if it is a write operation.
>>>>>>>>
>>>>>>>> The last bytes of a read operation response indicates success (1
>>>>>>>> success, 0 failure). The single byte returned for a write operation
>>>>>>>> indicates same (1 success, 0 failure).
>>>>>>> So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of
>>>>>>> garbage followed by the "it failed" byte?
>>>>>> Markus, that appear to be the case. However, I did not write the
>>>>>> communication protocol between libvmi and qemu. I'm assuming that the
>>>>>> person that wrote the protocol, did not want to bother with over
>>>>>> complicating things.
>>>>>>
>>>>>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>>>>>
>>>>>> I'm thinking he assumed reads would be small in size and the price of
>>>>>> reading garbage was less than the price of writing a more complicated
>>>>>> protocol. I can see his point, confronted with the same problem, I
>>>>>> might have done the same.
>>>>> All right, the interface is designed for *small* memory blocks then.
>>>>>
>>>>> Makes me wonder why he needs a separate binary protocol on a separate
>>>>> socket. Small blocks could be done just fine in QMP.
>>>> The problem is speed. If one's analyzing the memory space of a running
>>>> process (physical and paged), libvmi will make a large number of small
>>>> and mid-sized reads. If one uses xp, or pmemsave, the overhead is
>>>> quite significant. xp has overhead due to encoding, and pmemsave has
>>>> overhead due to file open/write (server), file open/read/close/unlink
>>>> (client).
>>>>
>>>> Others have gone through the problem before me. It appears that
>>>> pmemsave and xp are significantly slower than reading memory using a
>>>> socket via pmemaccess.
>>> That they're slower isn't surprising, but I'd expect the cost of
>>> encoding a small block to be insignificant compared to the cost of the
>>> network roundtrips.
>>>
>>> As block size increases, the space overhead of encoding will eventually
>>> bite. But for that usage, the binary protocol appears ill-suited,
>>> unless the client can pretty reliably avoid read failure. I haven't
>>> examined its failure modes, yet.
>>>
>>>> The following data is not mine, but it shows the time, in
>>>> milliseconds, required to resolve the content of a paged memory
>>>> address via socket (pmemaccess), pmemsave and xp
>>>>
>>>> http://cl.ly/image/322a3s0h1V05
>>>>
>>>> Again, I did not produce those data points, they come from an old
>>>> libvmi thread.
>>> 90ms is a very long time. What exactly was measured?
>> That is a fair question to ask. Unfortunately, I extracted that data
>> plot from an old thread in some libvmi mailing list. I do not have the
>> data and code that produced it. Sifting through the thread, I can see
>> the code
>> was never published. I will take it upon myself to produce code that
>> compares timing - in a fair fashion - of libvmi doing an atomic
>> operation and a larger-scale operation (like listing running
>> processes) via gdb, pmemaccess/socket, pmemsave, xp, and hopefully, a
>> version of xp that returns byte streams of memory regions base64 or
>> base85 encoded in json strings. I'll publish results and code.
>>
>> However, given workload and life happening, it will be some time
>> before I complete that task.
> No problem. I'd like to have your use case addressed, but there's no
> need for haste.
Thanks, Markus. Appreciate your help.
>
> [...]
>>>>>>>> Also, the pmemsave commands QAPI should be changed to be usable with
>>>>>>>> 64bit VM's
>>>>>>>>
>>>>>>>> in qapi-schema.json
>>>>>>>>
>>>>>>>> from
>>>>>>>>
>>>>>>>> ---
>>>>>>>> { 'command': 'pmemsave',
>>>>>>>> 'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>>>>>> ---
>>>>>>>>
>>>>>>>> to
>>>>>>>>
>>>>>>>> ---
>>>>>>>> { 'command': 'pmemsave',
>>>>>>>> 'data': {'val': 'int64', 'size': 'int64', 'filename': 'str'} }
>>>>>>>> ---
>>>>>>> In the QAPI schema, 'int' is actually an alias for 'int64'. Yes, that's
>>>>>>> confusing.
>>>>>> I think it's confusing for the HMP parser too. If you have a VM with
>>>>>> 8GB of RAM and want to snapshot the whole physical memory, via HMP
>>>>>> over telnet this is what happens:
>>>>>>
>>>>>> $ telnet localhost 1234
>>>>>> Trying 127.0.0.1...
>>>>>> Connected to localhost.
>>>>>> Escape character is '^]'.
>>>>>> QEMU 2.4.0.1 monitor - type 'help' for more information
>>>>>> (qemu) help pmemsave
>>>>>> pmemsave addr size file -- save to disk physical memory dump starting
>>>>>> at 'addr' of size 'size'
>>>>>> (qemu) pmemsave 0 8589934591 "/tmp/memorydump"
>>>>>> 'pmemsave' has failed: integer is for 32-bit values
>>>>>> Try "help pmemsave" for more information
>>>>>> (qemu) quit
>>>>> Your change to pmemsave's definition in qapi-schema.json is effectively a
>>>>> no-op.
>>>>>
>>>>> Your example shows *HMP* command pmemsave. The definition of an HMP
>>>>> command is *independent* of the QMP command. The implementation *uses*
>>>>> the QMP command.
>>>>>
>>>>> QMP pmemsave is defined in qapi-schema.json as
>>>>>
>>>>> { 'command': 'pmemsave',
>>>>> 'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>>>
>>>>> Its implementation is in cpus.c:
>>>>>
>>>>> void qmp_pmemsave(int64_t addr, int64_t size, const char *filename,
>>>>> Error **errp)
>>>>>
>>>>> Note the int64_t size.
>>>>>
>>>>> HMP pmemsave is defined in hmp-commands.hx as
>>>>>
>>>>> {
>>>>> .name = "pmemsave",
>>>>> .args_type = "val:l,size:i,filename:s",
>>>>> .params = "addr size file",
>>>>> .help = "save to disk physical memory dump starting at 'addr' of size 'size'",
>>>>> .mhandler.cmd = hmp_pmemsave,
>>>>> },
>>>>>
>>>>> Its implementation is in hmp.c:
>>>>>
>>>>> void hmp_pmemsave(Monitor *mon, const QDict *qdict)
>>>>> {
>>>>> uint32_t size = qdict_get_int(qdict, "size");
>>>>> const char *filename = qdict_get_str(qdict, "filename");
>>>>> uint64_t addr = qdict_get_int(qdict, "val");
>>>>> Error *err = NULL;
>>>>>
>>>>> qmp_pmemsave(addr, size, filename, &err);
>>>>> hmp_handle_error(mon, &err);
>>>>> }
>>>>>
>>>>> Note uint32_t size.
>>>>>
>>>>> Arguably, the QMP size argument should use 'size' (an alias for
>>>>> 'uint64'), and the HMP args_type should use 'size:o'.
>>>> Understand all that. Indeed, I've re-implemented 'pmemaccess' the same
>>>> way pmemsave is implemented. There is a single function, and two
>>>> points of entry, one for HMP and one for QMP. I think pmemaccess
>>>> mimics pmemsave closely.
>>>>
>>>> However, if one wants to simply dump a memory region, via HMP for
>>>> human ease of use/debug/testing purposes, one cannot dump memory
>>>> regions that reside higher than 2^32-1
>>> Can you give an example?
>> Yes. I was trying to dump the full extent of physical memory of a VM
>> that has 8GB memory space (ballooned). I simply did this:
>>
>> $ telnet localhost 1234
>> Trying 127.0.0.1...
>> Connected to localhost.
>> Escape character is '^]'.
>> QEMU 2.4.0.1 monitor - type 'help' for more information
>> (qemu) pmemsave 0 8589934591 "/tmp/memsaved"
>> 'pmemsave' has failed: integer is for 32-bit values
>>
>> Maybe I misunderstood how pmemsave works. Maybe I should have used
>> dump-guest-memory
> This is an unnecessary limitation caused by 'size:i' instead of
> 'size:o'. Fixable.
I think I tried changing size:i to size:l, but I was still receiving
the error.
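For reference, this is the fix Markus describes, as I understand it.
The 'integer is for 32-bit values' message seems to be emitted by the
monitor's parser for 'i' arguments, so both the args_type and the
uint32_t local in hmp.c would have to change (a sketch, untested):

In hmp-commands.hx:

    {
        .name       = "pmemsave",
        .args_type  = "val:l,size:o,filename:s",
        .params     = "addr size file",
        .help       = "save to disk physical memory dump starting at 'addr' of size 'size'",
        .mhandler.cmd = hmp_pmemsave,
    },

and in hmp.c, widening the truncating local:

    void hmp_pmemsave(Monitor *mon, const QDict *qdict)
    {
        uint64_t size = qdict_get_int(qdict, "size");
        const char *filename = qdict_get_str(qdict, "filename");
        uint64_t addr = qdict_get_int(qdict, "val");
        Error *err = NULL;

        qmp_pmemsave(addr, size, filename, &err);
        hmp_handle_error(mon, &err);
    }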
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-21 10:54 ` Markus Armbruster
2015-10-21 15:50 ` Valerio Aimale
@ 2015-10-22 18:43 ` Valerio Aimale
2015-10-22 18:54 ` Eric Blake
2015-10-22 19:12 ` Eduardo Habkost
2 siblings, 1 reply; 43+ messages in thread
From: Valerio Aimale @ 2015-10-22 18:43 UTC (permalink / raw)
To: Markus Armbruster; +Cc: lcapitulino, qemu-devel, ehabkost
On 10/21/15 4:54 AM, Markus Armbruster wrote:
> Valerio Aimale <valerio@aimale.com> writes:
>
>> On 10/19/15 1:52 AM, Markus Armbruster wrote:
>>> Valerio Aimale <valerio@aimale.com> writes:
>>>
>>>> On 10/16/15 2:15 AM, Markus Armbruster wrote:
>>>>> valerio@aimale.com writes:
>>>>>
>>>>>> All-
>>>>>>
>>>>>> I've produced a patch for the current QEMU HEAD, for libvmi to
>>>>>> introspect QEMU/KVM VMs.
>>>>>>
>>>>>> Libvmi has patches for the old qemu-kvm fork, inside its source tree:
>>>>>> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>>>>>>
>>>>>> This patch adds an hmp and a qmp command, "pmemaccess". When the
>>>>>> command is invoked with a string argument (a filename), it will open
>>>>>> a UNIX socket and spawn a listening thread.
>>>>>>
>>>>>> The client writes binary commands to the socket, in the form of a c
>>>>>> structure:
>>>>>>
>>>>>> struct request {
>>>>>> uint8_t type; // 0 quit, 1 read, 2 write, ... rest reserved
>>>>>> uint64_t address; // address to read from OR write to
>>>>>> uint64_t length; // number of bytes to read OR write
>>>>>> };
>>>>>>
>>>>>> The client receives as a response, either (length+1) bytes, if it is a
>>>>>> read operation, or 1 byte if it is a write operation.
>>>>>>
>>>>>> The last bytes of a read operation response indicates success (1
>>>>>> success, 0 failure). The single byte returned for a write operation
>>>>>> indicates same (1 success, 0 failure).
>>>>> So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of
>>>>> garbage followed by the "it failed" byte?
>>>> Markus, that appear to be the case. However, I did not write the
>>>> communication protocol between libvmi and qemu. I'm assuming that the
>>>> person that wrote the protocol, did not want to bother with over
>>>> complicating things.
>>>>
>>>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>>>
>>>> I'm thinking he assumed reads would be small in size and the price of
>>>> reading garbage was less than the price of writing a more complicated
>>>> protocol. I can see his point, confronted with the same problem, I
>>>> might have done the same.
>>> All right, the interface is designed for *small* memory blocks then.
>>>
>>> Makes me wonder why he needs a separate binary protocol on a separate
>>> socket. Small blocks could be done just fine in QMP.
>> The problem is speed. If one's analyzing the memory space of a running
>> process (physical and paged), libvmi will make a large number of small
>> and mid-sized reads. If one uses xp, or pmemsave, the overhead is
>> quite significant. xp has overhead due to encoding, and pmemsave has
>> overhead due to file open/write (server), file open/read/close/unlink
>> (client).
>>
>> Others have gone through the problem before me. It appears that
>> pmemsave and xp are significantly slower than reading memory using a
>> socket via pmemaccess.
> That they're slower isn't surprising, but I'd expect the cost of
> encoding a small block to be insignificant compared to the cost of the
> network roundtrips.
>
> As block size increases, the space overhead of encoding will eventually
> bite. But for that usage, the binary protocol appears ill-suited,
> unless the client can pretty reliably avoid read failure. I haven't
> examined its failure modes, yet.
>
>> The following data is not mine, but it shows the time, in
>> milliseconds, required to resolve the content of a paged memory
>> address via socket (pmemaccess), pmemsave and xp
>>
>> http://cl.ly/image/322a3s0h1V05
>>
>> Again, I did not produce those data points, they come from an old
>> libvmi thread.
> 90ms is a very long time. What exactly was measured?
>
>> I think it might be conceivable that there could be a QMP command that
>> returns the content of an arbitrarily sized memory region as a base64
>> or a base85 json string. It would still have both time- (due to
>> encoding/decoding) and space- (base64 has 33% and base85 would be 7%)
>> overhead, + json encoding/decoding overhead. It might still be the
>> case that socket would outperform such a command as well,
>> speed-wise. I don't think it would be any faster than xp.
> A special-purpose binary protocol over a dedicated socket will always do
> less than a QMP solution (ignoring foolishness like transmitting crap on
> read error the client is then expected to throw away). The question is
> whether the difference in work translates to a worthwhile difference in
> performance.
>
> The larger question is actually whether we have an existing interface
> that can serve libvmi's needs. We've discussed monitor commands
> like xp, pmemsave, pmemread. There's another existing interface: the
> GDB stub. Have you considered it?
>
>> There's also a similar patch, floating around the internet, that uses
>> shared memory, instead of sockets, as inter-process communication
>> between libvmi and QEMU. I've never used that.
> By the time you built a working IPC mechanism on top of shared memory,
> you're often no better off than with AF_LOCAL sockets.
>
> Crazy idea: can we allocate guest memory in a way that supports sharing
> it with another process? Eduardo, can -mem-path do such wild things?
Markus, your suggestion led to a lightbulb going off in my head.
What if there was a qmp command, say 'pmemmap', that, when invoked,
performs the following:
qmp_pmemmap( [...]) {
char *template = "/tmp/QEM_mmap_XXXXXXX";
int mmap_fd;
uint8_t *local_memspace = malloc( (size_t) 8589934592 /* assuming
VM with 8GB RAM */);
cpu_physical_memory_rw( (hwaddr) 0, local_memspace , (hwaddr)
8589934592 /* assuming VM with 8GB RAM */, 0 /* no write for now will
discuss write later */);
mmap_fd = mkstemp(template);
mmap((void *) local_memspace, (size_t) 8589934592, PROT_READ |
PROT_WRITE, MAP_SHARED | MAP_ANON, mmap_fd, (off_t) 0);
/* etc */
}
pmemmap would return the following json
{
'success' : 'true',
'map_filename' : '/tmp/QEM_mmap_1234567'
}
The qmp client/caller would then allocate a region of memory equal to
the size of the file and mmap() the file '/tmp/QEM_mmap_1234567' into the
region. It would then have (read, maybe write?) access to the full
extent of the guest memory without making any other qmp call. I think it
would be fast, and with low memory usage, as mmap() is pretty efficient.
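The client side would be little more than this sketch (error handling and
includes omitted; 'gpa' stands for whatever guest-physical address the
client wants to read):

    int fd = open("/tmp/QEM_mmap_1234567", O_RDONLY); /* name returned by pmemmap */
    struct stat st;
    fstat(fd, &st);                      /* st.st_size == guest RAM size */
    uint8_t *guest = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    uint64_t value = *(uint64_t *) (guest + gpa);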
Of course, there would be a 'pmemunmap' qmp command that would perform
the cleanup:
/* etc. */
munmap()
cpu_physical_memory_unmap();
/* etc. */
Would that work? Is mapping the full extent of the guest RAM too much to
ask of cpu_physical_memory_rw()?
>
>>>>>> The socket API was written by the libvmi author and it works with the
>>>>>> current libvmi version. The libvmi client-side implementation is at:
>>>>>>
>>>>>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>>>>>
>>>>>> As many use kvm VM's for introspection, malware and security analysis,
>>>>>> it might be worth thinking about making the pmemaccess a permanent
>>>>>> hmp/qmp command, as opposed to having to produce a patch at each QEMU
>>>>>> point release.
>>>>> Related existing commands: memsave, pmemsave, dump-guest-memory.
>>>>>
>>>>> Can you explain why these won't do for your use case?
>>>> For people who do security analysis there are two use cases, static
>>>> and dynamic analysis. With memsave, pmemsave and dump-guest-memory one
>>>> can do static analysis, i.e., snapshotting a VM and seeing what was
>>>> happening at that point in time.
>>>> Dynamic analysis requires being able to 'introspect' a VM while it's running.
>>>>
>>>> If you take a snapshot of two people exchanging a glass of water, and
>>>> you happen to take it at the very moment both persons have their hands
>>>> on the glass, it's hard to tell who passed the glass to whom. If you
>>>> have a movie of the same scene, it's obvious who's the giver and who's
>>>> the receiver. Same use case.
>>> I understand the need for introspecting a running guest. What exactly
>>> makes the existing commands unsuitable for that?
>> Speed. See discussion above.
>>>> More to the point, there's a host of C and Python frameworks to
>>>> dynamically analyze VMs: volatility, rekal, "drakvuf", etc. They all
>>>> build on top of libvmi. I did not want to reinvent the wheel.
>>> Fair enough.
>>>
>>> Front page http://libvmi.com/ claims "Works with Xen, KVM, Qemu, and Raw
>>> memory files." What exactly is missing for KVM?
>> When they say they support kvm, what they really mean is that they support
>> the (retired, I understand) qemu-kvm fork, via a patch that is provided in
>> the libvmi source tree. I think the most recent qemu-kvm supported is
>> 1.6.0
>>
>> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>>
>> I wanted to bring support to the head revision of QEMU, to bring
>> libvmi level with more modern QEMU.
>>
>> Maybe the solution is simply to put this patch in the libvmi source
>> tree, which I've already asked to do via pull request, leaving QEMU
>> alone.
>> However, the patch has to be updated at every QEMU point release. I
>> wanted to avoid that, if at all possible.
>>
>>>> Mind you, 99.9% of people that do dynamic VM analysis use Xen. They
>>>> contend that Xen has better introspection support. In my case, I did
>>>> not want to bother with dedicating a full server to be a Xen domain
>>>> 0. I just wanted to do a quick test by standing up a QEMU/kvm VM on
>>>> an otherwise-purposed server.
>>> I'm not at all against better introspection support in QEMU. I'm just
>>> trying to understand the problem you're trying to solve with your
>>> patches.
>> What all users of libvmi would love to have is super-high-speed access
>> to VM physical memory as part of the QEMU source tree, not
>> supported via a patch; implemented as the QEMU owners see fit, as long
>> as it is blazing fast and easily accessed via a client library or
>> inter-process communication.
> The use case makes sense to me, we just need to figure out how we want
> to serve it in QEMU.
>
>> My gut feeling is that it has to bypass QMP protocol/encoding/file
>> access/json to be fast, but it is just a gut feeling - worth nothing.
> My gut feeling is that QMP should do fine in overhead compared to other
> solutions involving socket I/O as long as the data sizes are *small*.
> Latency might be an issue, though: QMP commands are processed from the
> main loop. A dedicated server thread can be more responsive, but
> letting it write to shared resources could be "interesting".
>
>>>>>> Also, the pmemsave commands QAPI should be changed to be usable with
>>>>>> 64bit VM's
>>>>>>
>>>>>> in qapi-schema.json
>>>>>>
>>>>>> from
>>>>>>
>>>>>> ---
>>>>>> { 'command': 'pmemsave',
>>>>>> 'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>>>> ---
>>>>>>
>>>>>> to
>>>>>>
>>>>>> ---
>>>>>> { 'command': 'pmemsave',
>>>>>> 'data': {'val': 'int64', 'size': 'int64', 'filename': 'str'} }
>>>>>> ---
>>>>> In the QAPI schema, 'int' is actually an alias for 'int64'. Yes, that's
>>>>> confusing.
>>>> I think it's confusing for the HMP parser too. If you have a VM with
>>>> 8GB of RAM and want to snapshot the whole physical memory, via HMP
>>>> over telnet this is what happens:
>>>>
>>>> $ telnet localhost 1234
>>>> Trying 127.0.0.1...
>>>> Connected to localhost.
>>>> Escape character is '^]'.
>>>> QEMU 2.4.0.1 monitor - type 'help' for more information
>>>> (qemu) help pmemsave
>>>> pmemsave addr size file -- save to disk physical memory dump starting
>>>> at 'addr' of size 'size'
>>>> (qemu) pmemsave 0 8589934591 "/tmp/memorydump"
>>>> 'pmemsave' has failed: integer is for 32-bit values
>>>> Try "help pmemsave" for more information
>>>> (qemu) quit
>>> Your change to pmemsave's definition in qapi-schema.json is effectively a
>>> no-op.
>>>
>>> Your example shows *HMP* command pmemsave. The definition of an HMP
>>> command is *independent* of the QMP command. The implementation *uses*
>>> the QMP command.
>>>
>>> QMP pmemsave is defined in qapi-schema.json as
>>>
>>> { 'command': 'pmemsave',
>>> 'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>
>>> Its implementation is in cpus.c:
>>>
>>> void qmp_pmemsave(int64_t addr, int64_t size, const char *filename,
>>> Error **errp)
>>>
>>> Note the int64_t size.
>>>
>>> HMP pmemsave is defined in hmp-commands.hx as
>>>
>>> {
>>> .name = "pmemsave",
>>> .args_type = "val:l,size:i,filename:s",
>>> .params = "addr size file",
>>> .help = "save to disk physical memory dump starting at 'addr' of size 'size'",
>>> .mhandler.cmd = hmp_pmemsave,
>>> },
>>>
>>> Its implementation is in hmp.c:
>>>
>>> void hmp_pmemsave(Monitor *mon, const QDict *qdict)
>>> {
>>> uint32_t size = qdict_get_int(qdict, "size");
>>> const char *filename = qdict_get_str(qdict, "filename");
>>> uint64_t addr = qdict_get_int(qdict, "val");
>>> Error *err = NULL;
>>>
>>> qmp_pmemsave(addr, size, filename, &err);
>>> hmp_handle_error(mon, &err);
>>> }
>>>
>>> Note uint32_t size.
>>>
>>> Arguably, the QMP size argument should use 'size' (an alias for
>>> 'uint64'), and the HMP args_type should use 'size:o'.
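[For reference, if I understand the suggestion, the fix would look like
this - untested on my part:

    in qapi-schema.json:

    { 'command': 'pmemsave',
      'data': {'val': 'int', 'size': 'size', 'filename': 'str'} }

    and in hmp-commands.hx:

    .args_type  = "val:l,size:o,filename:s",

with hmp_pmemsave() then using uint64_t for its size variable.]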
>> Understand all that. Indeed, I've re-implemented 'pmemaccess' the same
>> way pmemsave is implemented. There is a single function, and two
>> entry points, one for HMP and one for QMP. I think pmemaccess
>> mimics pmemsave closely.
>>
>> However, if one wants to simply dump a memory region via HMP, for
>> human ease of use/debug/testing purposes, one cannot dump memory
>> regions that reside higher than 2^32-1
> Can you give an example?
>
> [...]
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-22 18:43 ` Valerio Aimale
@ 2015-10-22 18:54 ` Eric Blake
0 siblings, 0 replies; 43+ messages in thread
From: Eric Blake @ 2015-10-22 18:54 UTC (permalink / raw)
To: Valerio Aimale, Markus Armbruster; +Cc: qemu-devel, ehabkost, lcapitulino
On 10/22/2015 12:43 PM, Valerio Aimale wrote:
>
> What if there was a qmp command, say 'pmemmap' then when invoked,
> performs the following:
>
> qmp_pmemmap( [...]) {
>
> char *template = "/tmp/QEM_mmap_XXXXXXX";
Why not let the caller pass in the file name, rather than opening it
ourselves? But the idea of coordinating a file that both caller and qemu
mmap to the same guest memory view might indeed have merit.
[by the way, it's okay to trim messages to the relevant portions, rather
than making readers scroll through lots of quoted content to find the meat]
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-21 10:54 ` Markus Armbruster
2015-10-21 15:50 ` Valerio Aimale
2015-10-22 18:43 ` Valerio Aimale
@ 2015-10-22 19:12 ` Eduardo Habkost
2015-10-22 19:57 ` Valerio Aimale
2015-10-23 6:35 ` Markus Armbruster
2 siblings, 2 replies; 43+ messages in thread
From: Eduardo Habkost @ 2015-10-22 19:12 UTC (permalink / raw)
To: Markus Armbruster; +Cc: qemu-devel, Valerio Aimale, lcapitulino
On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote:
> Valerio Aimale <valerio@aimale.com> writes:
[...]
> > There's also a similar patch, floating around the internet, that uses
> > shared memory, instead of sockets, as inter-process communication
> > between libvmi and QEMU. I've never used that.
>
> By the time you built a working IPC mechanism on top of shared memory,
> you're often no better off than with AF_LOCAL sockets.
>
> Crazy idea: can we allocate guest memory in a way that supports sharing
> it with another process? Eduardo, can -mem-path do such wild things?
It can't today, but just because it creates a temporary file inside
mem-path and unlinks it immediately after opening a file descriptor. We
could make memory-backend-file also accept a full filename as argument,
or add a mechanism to let QEMU send the open file descriptor to a QMP
client.
--
Eduardo
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-22 19:12 ` Eduardo Habkost
@ 2015-10-22 19:57 ` Valerio Aimale
2015-10-22 20:03 ` Eric Blake
2015-10-22 21:47 ` Eduardo Habkost
2015-10-23 6:35 ` Markus Armbruster
1 sibling, 2 replies; 43+ messages in thread
From: Valerio Aimale @ 2015-10-22 19:57 UTC (permalink / raw)
To: Eduardo Habkost, Markus Armbruster; +Cc: qemu-devel, lcapitulino
On 10/22/15 1:12 PM, Eduardo Habkost wrote:
> On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote:
>> Valerio Aimale <valerio@aimale.com> writes:
> [...]
>>> There's also a similar patch, floating around the internet, that uses
>>> shared memory, instead of sockets, as inter-process communication
>>> between libvmi and QEMU. I've never used that.
>> By the time you built a working IPC mechanism on top of shared memory,
>> you're often no better off than with AF_LOCAL sockets.
>>
>> Crazy idea: can we allocate guest memory in a way that supports sharing
>> it with another process? Eduardo, can -mem-path do such wild things?
> It can't today, but just because it creates a temporary file inside
> mem-path and unlinks it immediately after opening a file descriptor. We
> could make memory-backend-file also accept a full filename as argument,
> or add a mechanism to let QEMU send the open file descriptor to a QMP
> client.
>
Eduardo, would my "artisanal" idea of creating an mmap'ed image of the
guest memory footprint work, augmented by Eric's suggestion of having
the qmp client pass the filename?
qmp_pmemmap( [...]) {
char *template = "/tmp/QEM_mmap_XXXXXXX";
int mmap_fd;
uint8_t *local_memspace = malloc( (size_t) 8589934592 /* assuming
VM with 8GB RAM */);
cpu_physical_memory_rw( (hwaddr) 0, local_memspace , (hwaddr)
8589934592 /* assuming VM with 8GB RAM */, 0 /* no write for now will
discuss write later */);
mmap_fd = mkstemp(template);
mmap((void *) local_memspace, (size_t) 8589934592, PROT_READ |
PROT_WRITE, MAP_SHARED | MAP_ANON, mmap_fd, (off_t) 0);
/* etc */
}
pmemmap would return the following json
{
'success' : 'true',
'map_filename' : '/tmp/QEM_mmap_1234567'
}
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-22 19:57 ` Valerio Aimale
@ 2015-10-22 20:03 ` Eric Blake
2015-10-22 20:45 ` Valerio Aimale
2015-10-22 21:47 ` Eduardo Habkost
1 sibling, 1 reply; 43+ messages in thread
From: Eric Blake @ 2015-10-22 20:03 UTC (permalink / raw)
To: Valerio Aimale, Eduardo Habkost, Markus Armbruster
Cc: qemu-devel, lcapitulino
On 10/22/2015 01:57 PM, Valerio Aimale wrote:
>
> pmemmap would return the following json
>
> {
> 'success' : 'true',
> 'map_filename' : '/tmp/QEM_mmap_1234567'
> }
In general, it is better if the client controls the filename, and not
qemu. This is because things like libvirt like to run qemu in a
highly-constrained environment, where the caller can pass in a file
descriptor that qemu cannot itself open(). So returning a filename is
pointless if the filename was already provided by the caller.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-22 20:03 ` Eric Blake
@ 2015-10-22 20:45 ` Valerio Aimale
0 siblings, 0 replies; 43+ messages in thread
From: Valerio Aimale @ 2015-10-22 20:45 UTC (permalink / raw)
To: Eric Blake, Eduardo Habkost, Markus Armbruster; +Cc: qemu-devel, lcapitulino
On 10/22/15 2:03 PM, Eric Blake wrote:
> On 10/22/2015 01:57 PM, Valerio Aimale wrote:
>
>> pmemmap would return the following json
>>
>> {
>> 'success' : 'true',
>> 'map_filename' : '/tmp/QEM_mmap_1234567'
>> }
> In general, it is better if the client controls the filename, and not
> qemu. This is because things like libvirt like to run qemu in a
> highly-constrained environment, where the caller can pass in a file
> descriptor that qemu cannot itself open(). So returning a filename is
> pointless if the filename was already provided by the caller.
>
Eric, I absolutely and positively agree with you. I was just
brainstorming. Consider my pseudo-C code as the mailing list analog of
somebody scribbling on a whiteboard and trying to explain an idea.
I agree with you the file name should come from the qmp client.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-22 19:57 ` Valerio Aimale
2015-10-22 20:03 ` Eric Blake
@ 2015-10-22 21:47 ` Eduardo Habkost
2015-10-22 21:51 ` Valerio Aimale
1 sibling, 1 reply; 43+ messages in thread
From: Eduardo Habkost @ 2015-10-22 21:47 UTC (permalink / raw)
To: Valerio Aimale; +Cc: lcapitulino, Markus Armbruster, qemu-devel
On Thu, Oct 22, 2015 at 01:57:13PM -0600, Valerio Aimale wrote:
> On 10/22/15 1:12 PM, Eduardo Habkost wrote:
> >On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote:
> >>Valerio Aimale <valerio@aimale.com> writes:
> >[...]
> >>>>>There's also a similar patch, floating around the internet, that uses
> >>>shared memory, instead of sockets, as inter-process communication
> >>>between libvmi and QEMU. I've never used that.
> >>By the time you built a working IPC mechanism on top of shared memory,
> >>you're often no better off than with AF_LOCAL sockets.
> >>
> >>Crazy idea: can we allocate guest memory in a way that supports sharing
> >>it with another process? Eduardo, can -mem-path do such wild things?
> >It can't today, but just because it creates a temporary file inside
> >mem-path and unlinks it immediately after opening a file descriptor. We
> >could make memory-backend-file also accept a full filename as argument,
> >or add a mechanism to let QEMU send the open file descriptor to a QMP
> >client.
> >
> Eduardo, would my "artisanal" idea of creating an mmap'ed image of the guest
> memory footprint work, augmented by Eric's suggestion of having the qmp
> client pass the filename?
The code below doesn't make sense to me.
>
> qmp_pmemmap( [...]) {
>
> char *template = "/tmp/QEM_mmap_XXXXXXX";
> int mmap_fd;
> uint8_t *local_memspace = malloc( (size_t) 8589934592 /* assuming VM
> with 8GB RAM */);
>
> cpu_physical_memory_rw( (hwaddr) 0, local_memspace , (hwaddr)
> 8589934592 /* assuming VM with 8GB RAM */, 0 /* no write for now will
> discuss write later */);
Are you suggesting copying all the VM RAM contents to a whole new area?
Why?
>
> mmap_fd = mkstemp(template);
>
> mmap((void *) local_memspace, (size_t) 8589934592, PROT_READ |
> PROT_WRITE, MAP_SHARED | MAP_ANON, mmap_fd, (off_t) 0);
I don't understand what you are trying to do here.
>
> /* etc */
>
> }
>
> pmemmap would return the following json
>
> {
> 'success' : 'true',
> 'map_filename' : '/tmp/QEM_mmap_1234567'
> }
>
>
--
Eduardo
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-22 21:47 ` Eduardo Habkost
@ 2015-10-22 21:51 ` Valerio Aimale
2015-10-23 8:25 ` Daniel P. Berrange
2015-10-23 18:55 ` Eduardo Habkost
0 siblings, 2 replies; 43+ messages in thread
From: Valerio Aimale @ 2015-10-22 21:51 UTC (permalink / raw)
To: Eduardo Habkost; +Cc: lcapitulino, Markus Armbruster, qemu-devel
On 10/22/15 3:47 PM, Eduardo Habkost wrote:
> On Thu, Oct 22, 2015 at 01:57:13PM -0600, Valerio Aimale wrote:
>> On 10/22/15 1:12 PM, Eduardo Habkost wrote:
>>> On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote:
>>>> Valerio Aimale <valerio@aimale.com> writes:
>>> [...]
>>>>> There's also a similar patch, floating around the internet, that uses
>>>>> shared memory, instead of sockets, as inter-process communication
>>>>> between libvmi and QEMU. I've never used that.
>>>> By the time you built a working IPC mechanism on top of shared memory,
>>>> you're often no better off than with AF_LOCAL sockets.
>>>>
>>>> Crazy idea: can we allocate guest memory in a way that supports sharing
>>>> it with another process? Eduardo, can -mem-path do such wild things?
>>> It can't today, but just because it creates a temporary file inside
>>> mem-path and unlinks it immediately after opening a file descriptor. We
>>> could make memory-backend-file also accept a full filename as argument,
>>> or add a mechanism to let QEMU send the open file descriptor to a QMP
>>> client.
>>>
>> Eduardo, would my "artisanal" idea of creating an mmap'ed image of the guest
>> memory footprint work, augmented by Eric's suggestion of having the qmp
>> client pass the filename?
> The code below doesn't make sense to me.
Ok. What I am trying to do is to create an mmap()ed memory area of the
guest physical memory that can be shared between QEMU and an external
process, such that the external process can read arbitrary locations of
the qemu guest physical memory.
In short, I'm using mmap MAP_SHARED to share the guest memory area with
a process that is external to QEMU.
does it make better sense now?
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-22 18:11 ` Valerio Aimale
@ 2015-10-23 6:31 ` Markus Armbruster
0 siblings, 0 replies; 43+ messages in thread
From: Markus Armbruster @ 2015-10-23 6:31 UTC (permalink / raw)
To: Valerio Aimale; +Cc: lcapitulino, qemu-devel, ehabkost
Valerio Aimale <valerio@aimale.com> writes:
> On 10/22/15 5:50 AM, Markus Armbruster wrote:
>> Valerio Aimale <valerio@aimale.com> writes:
>>
>>> On 10/21/15 4:54 AM, Markus Armbruster wrote:
[...]
>>>> Can you give an example?
>>> Yes. I was trying to dump the full extent of physical memory of a VM
>>> that has 8GB memory space (ballooned). I simply did this:
>>>
>>> $ telnet localhost 1234
>>> Trying 127.0.0.1...
>>> Connected to localhost.
>>> Escape character is '^]'.
>>> QEMU 2.4.0.1 monitor - type 'help' for more information
>>> (qemu) pmemsave 0 8589934591 "/tmp/memsaved"
>>> 'pmemsave' has failed: integer is for 32-bit values
>>>
>>> Maybe I misunderstood how pmemsave works. Maybe I should have used
>>> dump-guest-memory
>> This is an unnecessary limitation caused by 'size:i' instead of
>> 'size:o'. Fixable.
> I think I tried changing size:i to size:l, but I was still receiving
> the error.
I should have a look. HMP is relatively low priority for me; feel free
to remind me in a week or two.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-22 19:12 ` Eduardo Habkost
2015-10-22 19:57 ` Valerio Aimale
@ 2015-10-23 6:35 ` Markus Armbruster
2015-10-23 8:18 ` Daniel P. Berrange
` (2 more replies)
1 sibling, 3 replies; 43+ messages in thread
From: Markus Armbruster @ 2015-10-23 6:35 UTC (permalink / raw)
To: Eduardo Habkost; +Cc: qemu-devel, Valerio Aimale, lcapitulino
Eduardo Habkost <ehabkost@redhat.com> writes:
> On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote:
>> Valerio Aimale <valerio@aimale.com> writes:
> [...]
>> > There's also a similar patch, floating around the internet, that uses
>> > shared memory, instead of sockets, as inter-process communication
>> > between libvmi and QEMU. I've never used that.
>>
>> By the time you built a working IPC mechanism on top of shared memory,
>> you're often no better off than with AF_LOCAL sockets.
>>
>> Crazy idea: can we allocate guest memory in a way that supports sharing
>> it with another process? Eduardo, can -mem-path do such wild things?
>
> It can't today, but just because it creates a temporary file inside
> mem-path and unlinks it immediately after opening a file descriptor. We
> could make memory-backend-file also accept a full filename as argument,
> or add a mechanism to let QEMU send the open file descriptor to a QMP
> client.
Valerio, would a command line option to share guest memory suffice, or
does it have to be a monitor command? If the latter, why?
Eduardo, I'm not sure writing to guest memory behind TCG's back will
work. Do you know?
Will it work with KVM?
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-23 6:35 ` Markus Armbruster
@ 2015-10-23 8:18 ` Daniel P. Berrange
2015-10-23 14:48 ` Valerio Aimale
2015-10-23 14:44 ` Valerio Aimale
2015-10-23 19:24 ` Eduardo Habkost
2 siblings, 1 reply; 43+ messages in thread
From: Daniel P. Berrange @ 2015-10-23 8:18 UTC (permalink / raw)
To: Markus Armbruster
Cc: lcapitulino, Eduardo Habkost, Valerio Aimale, qemu-devel
On Fri, Oct 23, 2015 at 08:35:15AM +0200, Markus Armbruster wrote:
> Eduardo Habkost <ehabkost@redhat.com> writes:
>
> > On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote:
> >> Valerio Aimale <valerio@aimale.com> writes:
> > [...]
> >> > There's also a similar patch, floating around the internet, that uses
> >> > shared memory, instead of sockets, as inter-process communication
> >> > between libvmi and QEMU. I've never used that.
> >>
> >> By the time you built a working IPC mechanism on top of shared memory,
> >> you're often no better off than with AF_LOCAL sockets.
> >>
> >> Crazy idea: can we allocate guest memory in a way that supports sharing
> >> it with another process? Eduardo, can -mem-path do such wild things?
> >
> > It can't today, but just because it creates a temporary file inside
> > mem-path and unlinks it immediately after opening a file descriptor. We
> > could make memory-backend-file also accept a full filename as argument,
> > or add a mechanism to let QEMU send the open file descriptor to a QMP
> > client.
>
> Valerio, would a command line option to share guest memory suffice, or
> does it have to be a monitor command? If the latter, why?
IIUC, libvmi wants to be able to connect to arbitrary pre-existing
running KVM instances on the host. As such I think it cannot assume
anything about the way they have been started, so requiring they
be booted with a special command line arg looks impractical for this
scenario.
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-22 21:51 ` Valerio Aimale
@ 2015-10-23 8:25 ` Daniel P. Berrange
2015-10-23 19:00 ` Eduardo Habkost
2015-10-23 18:55 ` Eduardo Habkost
1 sibling, 1 reply; 43+ messages in thread
From: Daniel P. Berrange @ 2015-10-23 8:25 UTC (permalink / raw)
To: Valerio Aimale
Cc: qemu-devel, Markus Armbruster, Eduardo Habkost, lcapitulino
On Thu, Oct 22, 2015 at 03:51:28PM -0600, Valerio Aimale wrote:
> On 10/22/15 3:47 PM, Eduardo Habkost wrote:
> >On Thu, Oct 22, 2015 at 01:57:13PM -0600, Valerio Aimale wrote:
> >>On 10/22/15 1:12 PM, Eduardo Habkost wrote:
> >>>On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote:
> >>>>Valerio Aimale <valerio@aimale.com> writes:
> >>>[...]
> >>>>>There's also a similar patch, floating around the internet, that uses
> >>>>>shared memory, instead of sockets, as inter-process communication
> >>>>>between libvmi and QEMU. I've never used that.
> >>>>By the time you built a working IPC mechanism on top of shared memory,
> >>>>you're often no better off than with AF_LOCAL sockets.
> >>>>
> >>>>Crazy idea: can we allocate guest memory in a way that supports sharing
> >>>>it with another process? Eduardo, can -mem-path do such wild things?
> >>>It can't today, but just because it creates a temporary file inside
> >>>mem-path and unlinks it immediately after opening a file descriptor. We
> >>>could make memory-backend-file also accept a full filename as argument,
> >>>or add a mechanism to let QEMU send the open file descriptor to a QMP
> >>>client.
> >>>
> >>Eduardo, would my "artisanal" idea of creating an mmap'ed image of the guest
> >>memory footprint work, augmented by Eric's suggestion of having the qmp
> >>client pass the filename?
> >The code below doesn't make sense to me.
>
> Ok. What I am trying to do is to create an mmap()ed memory area of the guest
> physical memory that can be shared between QEMU and an external process,
> such that the external process can read arbitrary locations of the qemu
> guest physical memory.
> In short, I'm using mmap MAP_SHARED to share the guest memory area with a
> process that is external to QEMU
I wonder if it is possible for you to get access to the guest memory via
/proc instead, by simply accessing /proc/$PID/mem, avoiding the need for
any special help from QEMU. /proc/$PID/maps can be used to find the
offset of the guest memory region too, by looking for the map region
that has the size that matches the guest RAM size.
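Roughly like this untested sketch (no error handling; 'base' is the start
of the guest RAM mapping found by parsing /proc/$PID/maps, and reading
/proc/$PID/mem needs ptrace-level permission over the QEMU process):

    char path[64];
    snprintf(path, sizeof(path), "/proc/%d/mem", qemu_pid);
    int fd = open(path, O_RDONLY);
    /* read 'len' bytes at guest-physical address 'gpa' */
    pread(fd, buf, len, (off_t) (base + gpa));
    close(fd);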
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-23 6:35 ` Markus Armbruster
2015-10-23 8:18 ` Daniel P. Berrange
@ 2015-10-23 14:44 ` Valerio Aimale
2015-10-23 14:56 ` Eric Blake
2015-10-23 19:24 ` Eduardo Habkost
2 siblings, 1 reply; 43+ messages in thread
From: Valerio Aimale @ 2015-10-23 14:44 UTC (permalink / raw)
To: Markus Armbruster, Eduardo Habkost; +Cc: qemu-devel, lcapitulino
On 10/23/15 12:35 AM, Markus Armbruster wrote:
> Eduardo Habkost <ehabkost@redhat.com> writes:
>
>> On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote:
>>> Valerio Aimale <valerio@aimale.com> writes:
>> [...]
>>>> There's also a similar patch, floating around the internet, that uses
>>>> shared memory, instead of sockets, as inter-process communication
>>>> between libvmi and QEMU. I've never used that.
>>> By the time you built a working IPC mechanism on top of shared memory,
>>> you're often no better off than with AF_LOCAL sockets.
>>>
>>> Crazy idea: can we allocate guest memory in a way that supports sharing
>>> it with another process? Eduardo, can -mem-path do such wild things?
>> It can't today, but just because it creates a temporary file inside
>> mem-path and unlinks it immediately after opening a file descriptor. We
>> could make memory-backend-file also accept a full filename as argument,
>> or add a mechanism to let QEMU send the open file descriptor to a QMP
>> client.
> Valerio, would a command line option to share guest memory suffice, or
> does it have to be a monitor command? If the latter, why?
As Daniel points out later in the thread, libvmi knows about QEMU VMs
only via libvirt/virsh.
Thus, a command line option to share guest memory would work only if
there was an info qmp command, say 'info-guestmaps', that would return
something like:
{
'enabled' : 'true',
'filename' : '/path/to/the/guest/memory/map'
}
so that libvmi can find the map.
Libvmi's dependence on virsh is so strict that libvmi does not even know
if the QEMU VM has an open qmp unix socket or inet socket to send
commands through. Thus, libvmi sends qmp commands (to query registers,
as an example) via:
virsh qemu-monitor-command Windows10B '{"execute":
"human-monitor-command", "arguments": {"command-line": "info registers"}}'
It would be nice if there was a qmp command to query if there are
qmp/hmp sockets open, say 'info-sockets', that would return something like:
{
'unix' : {
'enabled': 'true',
'address': '/path/to/socket/'
},
'inet' : {
'enabled': 'true',
'address': '1.2.3.4:5678'
}
}
so that libvmi can find the rendezvous unix/inet address and send
commands through it. As of now, each qmp command requires a popen()
that forks virsh, which compounds the slowness.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-23 8:18 ` Daniel P. Berrange
@ 2015-10-23 14:48 ` Valerio Aimale
0 siblings, 0 replies; 43+ messages in thread
From: Valerio Aimale @ 2015-10-23 14:48 UTC (permalink / raw)
To: Daniel P. Berrange, Markus Armbruster
Cc: lcapitulino, Eduardo Habkost, qemu-devel
On 10/23/15 2:18 AM, Daniel P. Berrange wrote:
> [...]
> It can't today, but just because it creates a temporary file inside
> mem-path and unlinks it immediately after opening a file descriptor. We
> could make memory-backend-file also accept a full filename as argument,
> or add a mechanism to let QEMU send the open file descriptor to a QMP
> client.
>> Valerio, would a command line option to share guest memory suffice, or
>> does it have to be a monitor command? If the latter, why?
> IIUC, libvmi wants to be able to connect to arbitrary pre-existing
> running KVM instances on the host. As such I think it cannot assume
> anything about the way they have been started, so requiring they
> be booted with a special command line arg looks impractical for this
> scenario.
>
> Regards,
> Daniel
Daniel, you are correct. libvmi knows of QEMU/KVM VMs only via
libvirt/virsh. The scenario would work if libvmi was able to query, via
qmp command, the path of the shared memory map.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-23 14:44 ` Valerio Aimale
@ 2015-10-23 14:56 ` Eric Blake
2015-10-23 15:03 ` Valerio Aimale
0 siblings, 1 reply; 43+ messages in thread
From: Eric Blake @ 2015-10-23 14:56 UTC (permalink / raw)
To: Valerio Aimale, Markus Armbruster, Eduardo Habkost
Cc: qemu-devel, lcapitulino
On 10/23/2015 08:44 AM, Valerio Aimale wrote:
>
> Libvmi dependence on virsh is so strict, that libvmi does not even know
> if the QEMU VM has an open qmp unix socket or inet socket, to send
> commands through. Thus, libvmi sends qmp commands (to query registers,
> as an example) via
>
> virsh qemu-monitor-command Windows10B '{"execute":
> "human-monitor-command", "arguments": {"command-line": "info registers"}}'
This is an unsupported back door of libvirt; you should really also
consider enhancing libvirt to add a formal API to expose this
information so that you don't have to resort to the monitor back door.
But that's a topic for the libvirt list.
>
> so that libvmi can find the rendezvous unix/inet address and sends
> commands through that. As of now, each qmp commands requires a popen()
> that forks virsh, which compounds to the slowness.
No, don't blame virsh for your slowness. Write your own C program that
links against libvirt.so, and which holds and reuses a persistent
connection, using the same libvirt APIs as would be used by virsh. All
the overhead of spawning a shell to spawn virsh to open a fresh
connection for each command will go away. Any solution that uses
popen() to virsh is _screaming_ to be rewritten to use libvirt.so natively.
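The skeleton is small (untested sketch; the domain name is taken from your
example, and virDomainQemuMonitorCommand() is the same unsupported API that
virsh qemu-monitor-command uses under the hood):

    #include <libvirt/libvirt.h>
    #include <libvirt/libvirt-qemu.h>

    /* open one connection and reuse it for every monitor command */
    virConnectPtr conn = virConnectOpen("qemu:///system");
    virDomainPtr dom = virDomainLookupByName(conn, "Windows10B");
    char *reply = NULL;

    virDomainQemuMonitorCommand(dom, "{\"execute\": \"query-status\"}",
                                &reply, 0);
    /* ... parse reply, free(reply), repeat without reconnecting ... */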
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-23 14:56 ` Eric Blake
@ 2015-10-23 15:03 ` Valerio Aimale
0 siblings, 0 replies; 43+ messages in thread
From: Valerio Aimale @ 2015-10-23 15:03 UTC (permalink / raw)
To: Eric Blake, Markus Armbruster, Eduardo Habkost; +Cc: qemu-devel, lcapitulino
On 10/23/15 8:56 AM, Eric Blake wrote:
> On 10/23/2015 08:44 AM, Valerio Aimale wrote:
>
>> Libvmi dependence on virsh is so strict, that libvmi does not even know
>> if the QEMU VM has an open qmp unix socket or inet socket, to send
>> commands through. Thus, libvmi sends qmp commands (to query registers,
>> as an example) via
>>
>> virsh qemu-monitor-command Windows10B '{"execute":
>> "human-monitor-command", "arguments": {"command-line": "info registers"}}'
> This is an unsupported back door of libvirt; you should really also
> consider enhancing libvirt to add a formal API to expose this
> information so that you don't have to resort to the monitor back door.
> But that's a topic for the libvirt list.
>
>> so that libvmi can find the rendezvous unix/inet address and sends
>> commands through that. As of now, each qmp commands requires a popen()
>> that forks virsh, which compounds to the slowness.
> No, don't blame virsh for your slowness. Write your own C program that
> links against libvirt.so, and which holds and reuses a persistent
> connection, using the same libvirt APIs as would be used by virsh. All
> the overhead of spawning a shell to spawn virsh to open a fresh
> connection for each command will go away. Any solution that uses
> popen() to virsh is _screaming_ to be rewritten to use libvirt.so natively.
>
Eric, good points. Libvmi was written quite some time ago; that libvirt
API might not even have been there. When we agree on an access method to
the guest mem, I will rewrite those legacy parts. I'm not the author of
libvmi; I'm just trying to make it better when accessing QEMU/KVM VMs.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-22 21:51 ` Valerio Aimale
2015-10-23 8:25 ` Daniel P. Berrange
@ 2015-10-23 18:55 ` Eduardo Habkost
2015-10-23 19:08 ` Valerio Aimale
1 sibling, 1 reply; 43+ messages in thread
From: Eduardo Habkost @ 2015-10-23 18:55 UTC (permalink / raw)
To: Valerio Aimale; +Cc: lcapitulino, Markus Armbruster, qemu-devel
On Thu, Oct 22, 2015 at 03:51:28PM -0600, Valerio Aimale wrote:
> On 10/22/15 3:47 PM, Eduardo Habkost wrote:
> >On Thu, Oct 22, 2015 at 01:57:13PM -0600, Valerio Aimale wrote:
> >>On 10/22/15 1:12 PM, Eduardo Habkost wrote:
> >>>On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote:
> >>>>Valerio Aimale <valerio@aimale.com> writes:
> >>>[...]
> >>>>>There's also a similar patch, floating around the internet, that uses
> >>>>>shared memory, instead of sockets, as inter-process communication
> >>>>>between libvmi and QEMU. I've never used that.
> >>>>By the time you built a working IPC mechanism on top of shared memory,
> >>>>you're often no better off than with AF_LOCAL sockets.
> >>>>
> >>>>Crazy idea: can we allocate guest memory in a way that supports sharing
> >>>>it with another process? Eduardo, can -mem-path do such wild things?
> >>>It can't today, but just because it creates a temporary file inside
> >>>mem-path and unlinks it immediately after opening a file descriptor. We
> >>>could make memory-backend-file also accept a full filename as argument,
> >>>or add a mechanism to let QEMU send the open file descriptor to a QMP
> >>>client.
> >>>
> >>Eduardo, would my "artisanal" idea of creating an mmap'ed image of the guest
> >>memory footprint work, augmented by Eric's suggestion of having the qmp
> >>client pass the filename?
> >The code below doesn't make sense to me.
>
> Ok. What I am trying to do is to create an mmap()ed memory area of the guest
> physical memory that can be shared between QEMU and an external process,
> such that the external process can read arbitrary locations of the qemu
> guest physical memory.
> In short, I'm using mmap MAP_SHARED to share the guest memory area with a
> process that is external to QEMU
>
> does it make better sense now?
I think you are confused about what mmap() does. It will create a new
mapping into the process address space, containing the data from an
existing file, not the other way around.
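To share in the direction you want, the mapping has to exist first, and the
data is then written into it, e.g. (sketch, error handling omitted, leaving
aside cpu_physical_memory_rw()'s int length argument; 'filename' and
'ram_size' are placeholders):

    /* create and size the file, map it MAP_SHARED (no MAP_ANON), and only
     * then copy guest RAM into the mapping; writes through 'p' land in the
     * file and become visible to any other process that maps it */
    int fd = open(filename, O_RDWR | O_CREAT, 0600);
    ftruncate(fd, ram_size);
    uint8_t *p = mmap(NULL, ram_size, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    cpu_physical_memory_rw(0, p, ram_size, 0);

Note that this still takes a one-time copy rather than giving a live view;
a live view needs the guest RAM itself to be allocated from such a file,
which is what the -mem-path discussion above is about.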
--
Eduardo
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-23 8:25 ` Daniel P. Berrange
@ 2015-10-23 19:00 ` Eduardo Habkost
0 siblings, 0 replies; 43+ messages in thread
From: Eduardo Habkost @ 2015-10-23 19:00 UTC (permalink / raw)
To: Daniel P. Berrange
Cc: qemu-devel, Markus Armbruster, Valerio Aimale, lcapitulino
On Fri, Oct 23, 2015 at 09:25:03AM +0100, Daniel P. Berrange wrote:
> On Thu, Oct 22, 2015 at 03:51:28PM -0600, Valerio Aimale wrote:
> > On 10/22/15 3:47 PM, Eduardo Habkost wrote:
> > >On Thu, Oct 22, 2015 at 01:57:13PM -0600, Valerio Aimale wrote:
> > >>On 10/22/15 1:12 PM, Eduardo Habkost wrote:
> > >>>On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote:
> > >>>>Valerio Aimale <valerio@aimale.com> writes:
> > >>>[...]
> > >>>>>There's also a similar patch, floating around the internet, that uses
> > >>>>>shared memory, instead of sockets, as inter-process communication
> > >>>>>between libvmi and QEMU. I've never used that.
> > >>>>By the time you built a working IPC mechanism on top of shared memory,
> > >>>>you're often no better off than with AF_LOCAL sockets.
> > >>>>
> > >>>>Crazy idea: can we allocate guest memory in a way that supports sharing
> > >>>>it with another process? Eduardo, can -mem-path do such wild things?
> > >>>It can't today, but just because it creates a temporary file inside
> > >>>mem-path and unlinks it immediately after opening a file descriptor. We
> > >>>could make memory-backend-file also accept a full filename as argument,
> > >>>or add a mechanism to let QEMU send the open file descriptor to a QMP
> > >>>client.
> > >>>
> > >>Eduardo, would my "artisanal" idea of creating an mmap'ed image of the guest
> > >>memory footprint work, augmented by Eric's suggestion of having the qmp
> > >>client pass the filename?
> > >The code below doesn't make sense to me.
> >
> > Ok. What I am trying to do is to create an mmap()ed memory area of the guest
> > physical memory that can be shared between QEMU and an external process,
> > such that the external process can read arbitrary locations of the qemu
> > guest physical memory.
> > In short, I'm using mmap MAP_SHARED to share the guest memory area with a
> > process that is external to QEMU
>
> I wonder if it is possible for you to get access to the guest memory via
> /proc instead, by simply accessing /proc/$PID/mem, avoiding the need for
> any special help from QEMU. /proc/$PID/maps can be used to find the
> offset of the guest memory region too, by looking for the map region
> that has the size that matches the guest RAM size.
If libvmi needs to work with existing VMs (without reconfiguring and
restarting them), this may be the only way we could let it access guest
memory directly.
But I wouldn't trust the method of simply looking for the map region
that has the right size. This would be making many assumptions about how
exactly QEMU allocates guest RAM internally. Also, it would require
libvmi to ptrace-attach to QEMU first.
--
Eduardo
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-23 18:55 ` Eduardo Habkost
@ 2015-10-23 19:08 ` Valerio Aimale
2015-10-26 9:09 ` Markus Armbruster
0 siblings, 1 reply; 43+ messages in thread
From: Valerio Aimale @ 2015-10-23 19:08 UTC (permalink / raw)
To: Eduardo Habkost; +Cc: lcapitulino, Markus Armbruster, qemu-devel
On 10/23/15 12:55 PM, Eduardo Habkost wrote:
> On Thu, Oct 22, 2015 at 03:51:28PM -0600, Valerio Aimale wrote:
>> On 10/22/15 3:47 PM, Eduardo Habkost wrote:
>>> On Thu, Oct 22, 2015 at 01:57:13PM -0600, Valerio Aimale wrote:
>>>> On 10/22/15 1:12 PM, Eduardo Habkost wrote:
>>>>> On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote:
>>>>>> Valerio Aimale <valerio@aimale.com> writes:
>>>>> [...]
>>>>>>> There's also a similar patch, floating around the internet, that uses
>>>>>>> shared memory, instead of sockets, as inter-process communication
>>>>>>> between libvmi and QEMU. I've never used that.
>>>>>> By the time you built a working IPC mechanism on top of shared memory,
>>>>>> you're often no better off than with AF_LOCAL sockets.
>>>>>>
>>>>>> Crazy idea: can we allocate guest memory in a way that supports sharing
>>>>>> it with another process? Eduardo, can -mem-path do such wild things?
>>>>> It can't today, but just because it creates a temporary file inside
>>>>> mem-path and unlinks it immediately after opening a file descriptor. We
>>>>> could make memory-backend-file also accept a full filename as argument,
>>>>> or add a mechanism to let QEMU send the open file descriptor to a QMP
>>>>> client.
>>>>>
>>>> Eduardo, would my "artisanal" idea of creating an mmap'ed image of the guest
>>>> memory footprint work, augmented by Eric's suggestion of having the qmp
>>>> client pass the filename?
>>> The code below doesn't make sense to me.
>> Ok. What I am trying to do is to create a mmapped() memory area of the guest
>> physical memory that can be shared between QEMU and an external process,
>> such that the external process can read arbitrary locations of the qemu
>> guest physical memory.
>> In short, I'm using mmap MAP_SHARED to share the guest memory area with a
>> process that is external to QEMU
>>
>> does it make better sense now?
> I think you are confused about what mmap() does. It will create a new
> mapping into the process address space, containing the data from an
> existing file, not the other way around.
>
Eduardo, I think it would be a common rule of politeness not to pass any
judgement on a person that you don't know, based only on some texts in a
mailing list. I think I understand how mmap() works, and very well.
Participating in this discussion has been a struggle for me. For the
good of the libvmi users, I have been trying to ignore the judgements,
the comments and so on. But, alas, I throw my hands up in the air, and I
surrender.
I think libvmi can live, as it has for the past years, by patching the
QEMU source tree on an as-needed basis, and keeping the patch in the libvmi
source tree, without disturbing the QEMU community any further.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-23 6:35 ` Markus Armbruster
2015-10-23 8:18 ` Daniel P. Berrange
2015-10-23 14:44 ` Valerio Aimale
@ 2015-10-23 19:24 ` Eduardo Habkost
2015-10-23 20:02 ` Richard Henderson
2015-11-02 12:55 ` Paolo Bonzini
2 siblings, 2 replies; 43+ messages in thread
From: Eduardo Habkost @ 2015-10-23 19:24 UTC (permalink / raw)
To: Markus Armbruster
Cc: Paolo Bonzini, Richard Henderson, qemu-devel, Valerio Aimale,
lcapitulino
On Fri, Oct 23, 2015 at 08:35:15AM +0200, Markus Armbruster wrote:
> Eduardo Habkost <ehabkost@redhat.com> writes:
>
> > On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote:
> >> Valerio Aimale <valerio@aimale.com> writes:
> > [...]
> >> > There's also a similar patch, floating around the internet, that uses
> >> > shared memory, instead of sockets, as inter-process communication
> >> > between libvmi and QEMU. I've never used that.
> >>
> >> By the time you built a working IPC mechanism on top of shared memory,
> >> you're often no better off than with AF_LOCAL sockets.
> >>
> >> Crazy idea: can we allocate guest memory in a way that supports sharing
> >> it with another process? Eduardo, can -mem-path do such wild things?
> >
> > It can't today, but just because it creates a temporary file inside
> > mem-path and unlinks it immediately after opening a file descriptor. We
> > could make memory-backend-file also accept a full filename as argument,
> > or add a mechanism to let QEMU send the open file descriptor to a QMP
> > client.
>
> Valerio, would a command line option to share guest memory suffice, or
> does it have to be a monitor command? If the latter, why?
>
> Eduardo, I'm not sure writing to guest memory behind TCG's back will
> work. Do you know?
I don't know. I guess it may possibly surprise TCG depending on how some
operations are implemented, but it sounds unlikely. CCing Richard.
>
> Will it work with KVM?
I always assumed it would work, but CCing Paolo to see if there are any
pitfalls.
--
Eduardo
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-23 19:24 ` Eduardo Habkost
@ 2015-10-23 20:02 ` Richard Henderson
2015-11-02 12:55 ` Paolo Bonzini
1 sibling, 0 replies; 43+ messages in thread
From: Richard Henderson @ 2015-10-23 20:02 UTC (permalink / raw)
To: Eduardo Habkost, Markus Armbruster
Cc: Paolo Bonzini, qemu-devel, Valerio Aimale, lcapitulino
On 10/23/2015 09:24 AM, Eduardo Habkost wrote:
> On Fri, Oct 23, 2015 at 08:35:15AM +0200, Markus Armbruster wrote:
>> Eduardo Habkost <ehabkost@redhat.com> writes:
>>
>>> On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote:
>>>> Valerio Aimale <valerio@aimale.com> writes:
>>> [...]
>>>>> There's also a similar patch, floating around the internet, that uses
>>>>> shared memory, instead of sockets, as inter-process communication
>>>>> between libvmi and QEMU. I've never used that.
>>>>
>>>> By the time you built a working IPC mechanism on top of shared memory,
>>>> you're often no better off than with AF_LOCAL sockets.
>>>>
>>>> Crazy idea: can we allocate guest memory in a way that supports sharing
>>>> it with another process? Eduardo, can -mem-path do such wild things?
>>>
>>> It can't today, but just because it creates a temporary file inside
>>> mem-path and unlinks it immediately after opening a file descriptor. We
>>> could make memory-backend-file also accept a full filename as argument,
>>> or add a mechanism to let QEMU send the open file descriptor to a QMP
>>> client.
>>
>> Valerio, would a command line option to share guest memory suffice, or
>> does it have to be a monitor command? If the latter, why?
>>
>> Eduardo, I'm not sure writing to guest memory behind TCG's back will
>> work. Do you know?
>
> I don't know. I guess it may possibly surprise TCG depending on how some
> operations are implemented, but it sounds unlikely. CCing Richard.
Writing to guest memory will work, in that the guest will see the changes. The
harder part is synchronization. Here you'll face all the same problems that
are currently being addressed in the multi-threaded tcg patch sets.
r~
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-23 19:08 ` Valerio Aimale
@ 2015-10-26 9:09 ` Markus Armbruster
2015-10-26 17:37 ` Valerio Aimale
0 siblings, 1 reply; 43+ messages in thread
From: Markus Armbruster @ 2015-10-26 9:09 UTC (permalink / raw)
To: Valerio Aimale; +Cc: qemu-devel, Eduardo Habkost, lcapitulino
Valerio Aimale <valerio@aimale.com> writes:
> On 10/23/15 12:55 PM, Eduardo Habkost wrote:
>> On Thu, Oct 22, 2015 at 03:51:28PM -0600, Valerio Aimale wrote:
>>> On 10/22/15 3:47 PM, Eduardo Habkost wrote:
>>>> On Thu, Oct 22, 2015 at 01:57:13PM -0600, Valerio Aimale wrote:
>>>>> On 10/22/15 1:12 PM, Eduardo Habkost wrote:
>>>>>> On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote:
>>>>>>> Valerio Aimale <valerio@aimale.com> writes:
>>>>>> [...]
>>>>>>>> There's also a similar patch, floating around the internet, that uses
>>>>>>>> shared memory, instead of sockets, as inter-process communication
>>>>>>>> between libvmi and QEMU. I've never used that.
>>>>>>> By the time you built a working IPC mechanism on top of shared memory,
>>>>>>> you're often no better off than with AF_LOCAL sockets.
>>>>>>>
>>>>>>> Crazy idea: can we allocate guest memory in a way that supports sharing
>>>>>>> it with another process? Eduardo, can -mem-path do such wild things?
>>>>>> It can't today, but just because it creates a temporary file inside
>>>>>> mem-path and unlinks it immediately after opening a file descriptor. We
>>>>>> could make memory-backend-file also accept a full filename as argument,
>>>>>> or add a mechanism to let QEMU send the open file descriptor to a QMP
>>>>>> client.
>>>>>>
>>>>> Eduardo, would my "artisanal" idea of creating an mmap'ed image
>>>>> of the guest
>>>>> memory footprint work, augmented by Eric's suggestion of having the qmp
>>>>> client pass the filename?
>>>> The code below doesn't make sense to me.
>>> Ok. What I am trying to do is to create an mmap()ed memory area of the guest
>>> physical memory that can be shared between QEMU and an external process,
>>> such that the external process can read arbitrary locations of the qemu
>>> guest physical memory.
>>> In short, I'm using mmap MAP_SHARED to share the guest memory area with a
>>> process that is external to QEMU
>>>
>>> does it make better sense now?
>> I think you are confused about what mmap() does. It will create a new
>> mapping into the process address space, containing the data from an
>> existing file, not the other way around.
>>
> Eduardo, I think it would be a common rule of politeness not to pass
> any judgement on a person that you don't know, based only on some texts in a
> mailing list. I think I understand how mmap() works, and very well.
>
> Participating in this discussion has been a struggle for me. For the
> good of the libvmi users, I have been trying to ignore the judgements,
> the comments and so on. But, alas, I throw my hands up in the air, and
> I surrender.
I'm sorry we exceeded your tolerance for frustration. This mailing list
can be tough. We try to be welcoming (believe it or not), but we too
often fail (okay, that part is easily believable).
To be honest, I had difficulties understanding your explanation, and
ended up guessing. I figure Eduardo did the same, and guessed
incorrectly. There but for the grace of God go I.
> I think libvmi can live, as it has for the past years, by patching the
> QEMU source tree on as needed basis, and keeping the patch in the
> libvmi source tree, without disturbing any further the QEMU community.
I'm sure libvmi can continue to require a patched QEMU, but I'm equally
sure getting its needs satisfied out of the box would be better for all.
To get that done, we need to understand the problem, and map the
solution space.
So let me try to summarize the thread, and what I've learned from it so
far. Valerio, if you could correct misunderstandings, I'd be much
obliged.
LibVMI is a C library with Python bindings that makes it easy to monitor
the low-level details of a running virtual machine by viewing its
memory, trapping on hardware events, and accessing the vCPU registers.
This is called virtual machine introspection.
[Direct quote from http://libvmi.com/]
For that purpose, LibVMI needs (among other things) sufficiently fast
means to read and write guest memory.
Its existing solution is a non-upstream patch that adds a new, simple
protocol for reading and writing physical guest memory over TCP, and
monitor commands to start this service.
This thread is in reply to Valerio's attempt to upstream this patch.
Good move.
The usual questions for feature requests apply:
1. Is this a use case we want to serve?
Unreserved yes. Supporting virtual machine introspection with LibVMI
makes sense.
2. Can it be served by existing guest introspection interfaces?
Not quite clear, yet.
LibVMI interacts with QEMU/KVM virtual machines via libvirt. We want
to be able to start and stop introspecting running virtual machines
managed by libvirt. Rules out solutions that require QEMU to be
started with special command line options. We really want QMP
commands. HMP commands would work, but HMP is not a stable
interface. It's fine for prototyping, of course.
Interfaces discussed include the following monitor commands:
* x, xp
There's overhead for encoding / decoding. LibVMI developers
apparently have found it too slow. They measured 90ms, which is very
slow indeed. Turns out LibVMI goes through virsh every time! I
suspect the overhead of starting virsh dwarfs everything else several
times over. Therefore, we don't have an idea of this method's real
overhead, yet.
Transferring large blocks through the monitor connection can be
problematic. The monitor is a control plane; bulk data should go
through a data plane. Not sure whether this is an issue for
LibVMI's usage.
x and xp are only in HMP.
They cover only reading.
* memsave, pmemsave
Similar to x, xp, except here we use the file system as data plane.
Additionally, we trade encoding / decoding overhead for temporary
file handling overhead.
Cover only reading.
* dump-guest-memory
More powerful (and more complex) than memsave, pmemsave, but geared
towards a different use case: taking a crash dump of a running VM
for static analysis.
Covers only reading.
Dumps in ELF or compressed kdump format.
Supports dump to file descriptor, which could perhaps be used to
avoid temporary files.
* gdbserver
This starts a server for the GDB Remote Serial Protocol on a TCP
port. This is for debugging the *guest*, not QEMU. Can do much
more than read and write memory.
README.rst in the LibVMI source tree suggests
- LibVMI can already use this interface, but it's apparently slower
than using its non-upstream patch. How much?
- It requires the user to configure his VMs to be started with QEMU
command line option -s, which means you can't introspect a
running VM you didn't prepare for it from the start. Easy enough
to improve: use the monitor command!
gdbserver is only in HMP, probably because it's viewed as a
debugging tool for human use. LibVMI would be a programmatic user,
so adding gdbserver to QMP could be justified.
3. Assuming existing interfaces won't do, how could a new one look like?
* Special purpose TCP protocol, QMP command to start/stop the service
This is the existing non-upstream solution.
We can quibble over the protocol, e.g. its weird handling of read
errors, but that's a detail.
* Share guest memory with LibVMI somehow
Still in the "crazy idea" stage. Fish the memory out of
/proc/$PID/mem? Create a file QEMU and LibVMI can mmap?
Writing has the same synchronization problems as with multi-threaded
TCG. These are being addressed, but I have no idea how the
solutions will translate.
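To make the first idea a bit more concrete, a minimal sketch, assuming
the caller has already located the host virtual address of the guest
RAM mapping (e.g. by scanning /proc/$PID/maps) and is allowed to
ptrace the QEMU process (CAP_SYS_PTRACE, or whatever the kernel's Yama
policy demands):
---
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>

/* Read len bytes of guest RAM out of a live QEMU's address space.
 * vaddr is the host virtual address of the guest RAM block inside
 * the QEMU process, NOT a guest physical address. */
static ssize_t read_qemu_mem(pid_t pid, off_t vaddr, void *buf, size_t len)
{
    char path[64];
    int fd;
    ssize_t n;

    snprintf(path, sizeof(path), "/proc/%d/mem", (int) pid);
    fd = open(path, O_RDONLY);
    if (fd < 0) {
        return -1;
    }
    n = pread(fd, buf, len, vaddr);     /* lseek + read in one call */
    close(fd);
    return n;
}
---
Writes would go through pwrite() on the same descriptor, subject to
the synchronization caveats just mentioned.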
I'm afraid this got a bit long. Thanks for reading this far.
I hope we can work together and find a solution that satisfies LibVMI.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-26 9:09 ` Markus Armbruster
@ 2015-10-26 17:37 ` Valerio Aimale
2015-10-26 17:52 ` Eduardo Habkost
0 siblings, 1 reply; 43+ messages in thread
From: Valerio Aimale @ 2015-10-26 17:37 UTC (permalink / raw)
To: Markus Armbruster; +Cc: qemu-devel, Eduardo Habkost, lcapitulino
On 10/26/15 3:09 AM, Markus Armbruster wrote:
> [...]
>
>> Eduardo, I think it would be a common rule of politeness not to pass
>> any judgement on a person that you don't know, but for some texts in a
>> mailing list. I think I understand how mmap() works, and very well.
>>
>> Participating in this discussion has been a struggle for me. For the
>> good of the libvmi users, I have been trying to ignore the judgements,
>> the comments and so on. But, alas, I throw my hands up in the air, and
>> I surrender.
> I'm sorry we exceeded your tolerance for frustration. This mailing list
> can be tough. We try to be welcoming (believe it or not), but we too
> often fail (okay, that part is easily believable).
>
> To be honest, I had difficulties understanding your explanation, and
> ended up guessing. I figure Eduardo did the same, and guessed
> incorrectly. There but for the grace of God go I.
Well, I did scribble my C sample excerpt too fast. Participating in
mailing lists is not part of my job description - I was short on time, I
admit to that. However, there is a big difference between saying "I do not
understand your explanation, please try again" and saying "you're
confused about mmap()".
I was trying to advocate the use of a shared mmap'ed region. The sharing
would be two-way (RW for both) between the QEMU virtualizer and the
libvmi process. I envision that there could be a QEMU command line
argument, such as "--mmap-guest-memory <filename>". Understand that Eric
feels strongly that the libvmi client should own the file name - I have not
forgotten that. When that command line argument is given, as part of the
guest initialization, QEMU creates a file of size equal to the size of
the guest memory, containing all zeros, mmaps that file over the guest
memory with PROT_READ|PROT_WRITE and MAP_FILE|MAP_SHARED, then starts
the guest. And, if at all possible, it makes the filename queryable via QMP
and/or HMP, so that the filename of the mmap would not need to be
maintained in two different places, leading to maintenance nightmares.
Shared mmap'ed regions can be used for inter-process communication; here's
a quick and dirty example:
p1.c
---
#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <signal.h>

/* sorry, for ease of development I need these to be global */
int fh;
char *p;

/* p2 sends us SIGHUP after it has written to the shared map:
 * print what the map contains now, then clean up and exit. */
void handle_signal(int signal)
{
    switch (signal) {
    case SIGHUP:
        fprintf(stdout, "Process 1 -- Map now contains: %s\n", p);
        munmap(p, sysconf(_SC_PAGE_SIZE));
        close(fh);
        exit(0);
    default:
        fprintf(stderr, "Caught wrong signal: %d\n", signal);
    }
}

int main(void)
{
    struct sigaction sa;

    sa.sa_handler = &handle_signal;
    sa.sa_flags = SA_RESTART;
    sigfillset(&sa.sa_mask);
    if (sigaction(SIGHUP, &sa, NULL) == -1) {
        perror("Error: cannot handle SIGHUP");
        exit(1);
    }
    if ((fh = open("shared.map", O_RDWR | O_CREAT, S_IRWXU)) >= 0) {
        /* map one page of the file, shared, so p2 sees our writes */
        p = mmap(NULL, sysconf(_SC_PAGE_SIZE), PROT_READ | PROT_WRITE,
                 MAP_FILE | MAP_SHARED, fh, (off_t) 0);
        if (p == MAP_FAILED) {
            printf("poop, didn't map: %s\n", strerror(errno));
            close(fh);
            exit(1);
        }
        fprintf(stdout, "Process 1 -- Writing to map: All your bases are belong to us.\n");
        sprintf(p, "All your bases are belong to us.");
        while (1)
            ;   /* spin until p2 signals us */
    }
    return 1;
}
---
p2.c
---
#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <signal.h>

int main(int argc, char **argv)
{
    int fh;
    void *p;
    int pid;

    /* p1's PID is passed on the command line (see the bash driver below) */
    pid = atoi(argv[1]);
    sleep(1);   /* give p1 time to create and fill the map */
    if ((fh = open("shared.map", O_RDWR, S_IRWXU)) >= 0) {
        p = mmap(NULL, sysconf(_SC_PAGE_SIZE), PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_FILE, fh, (off_t) 0);
        if (p == MAP_FAILED) {
            printf("poop, didn't map: %s\n", strerror(errno));
            close(fh);
            exit(1);
        }
        printf("Process 2 -- Map now contains: %s\n", (char *) p);
        printf("Process 2 -- Writing to map: All your bases *NOW* belong to us.\n");
        fflush(stdout);
        sprintf((char *) p, "All your bases *NOW* belong to us.");
        kill(pid, SIGHUP);  /* tell p1 to re-read the map */
        sleep(3);
        munmap(p, sysconf(_SC_PAGE_SIZE));
        close(fh);
    }
    return 0;
}
---
If I run both, in bash, as:
rm -f shared.map ; \
gcc -o p1 p1.c ; \
gcc -o p2 p2.c ; \
for i in $(seq 1 $(getconf PAGESIZE)); do echo -e -n "\0" >> shared.map ; done ; \
./p1 & ./p2 $!
I get the following output:
[1] 8223
Process 1 -- Writing to map: All your bases are belong to us.
Process 2 -- Map now contains: All your bases are belong to us.
Process 2 -- Writing to map: All your bases *NOW* belong to us.
Process 1 -- Map now contains: All your bases *NOW* belong to us.
[1]+ Done ./p1
To me, the example above shows the use of an mmap'ed region shared
between two processes.
The C code above was written too fast again! I do not check for all
error conditions, and I don't do all the cleanup - it's kind of quick
and dirty. I hope you can forgive me.
(It works on Mac and Linux unmodified, though quick and dirty.)
I am fascinated by Daniel's suggestion of accessing the guest memory via
the QEMU process's existing map, found through /proc/$PID/maps. It is my
understanding that Daniel's suggestion would require ptrace()'ing the
QEMU process, and potentially stopping the QEMU process to access the
guest mmap'ed memory.
>
>
> This thread is in reply to Valerio's attempt to upstream this patch.
> Good move.
>
> The usual questions for feature requests apply:
>
> 1. Is this a use case we want to serve?
>
> Unreserved yes. Supporting virtual machine introspection with LibVMI
> makes sense.
>
> [...]
Markus, that's a great description of needs and potential approaches. It
requires a bit of thinking; I reserve the right to provide comments
later on.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-26 17:37 ` Valerio Aimale
@ 2015-10-26 17:52 ` Eduardo Habkost
2015-10-27 14:17 ` Valerio Aimale
0 siblings, 1 reply; 43+ messages in thread
From: Eduardo Habkost @ 2015-10-26 17:52 UTC (permalink / raw)
To: Valerio Aimale; +Cc: qemu-devel, Markus Armbruster, lcapitulino
On Mon, Oct 26, 2015 at 11:37:04AM -0600, Valerio Aimale wrote:
> On 10/26/15 3:09 AM, Markus Armbruster wrote:
> >[...]
> >
> >>Eduardo, I think it would be a common rule of politeness not to pass
> >>any judgement on a person that you don't know, but for some texts in a
> >>mailing list. I think I understand how mmap() works, and very well.
> >>
> >>Participating in this discussion has been a struggle for me. For the
> >>good of the libvmi users, I have been trying to ignore the judgements,
> >>the comments and so on. But, alas, I throw my hands up in the air, and
> >>I surrender.
> >I'm sorry we exceeded your tolerance for frustration. This mailing list
> >can be tough. We try to be welcoming (believe it or not), but we too
> >often fail (okay, that part is easily believable).
> >
> >To be honest, I had difficulties understanding your explanation, and
> >ended up guessing. I figure Eduardo did the same, and guessed
> >incorrectly. There but for the grace of God go I.
> Well, I did scribble my C sample excerpt too fast. Participating in mailing
> lists is not part of my job description - I was short on time, I admit to
> that. However, there is a big difference between saying "I do not understand your
> explanation, please try again" and saying "you're confused about mmap()"
>
> I was trying to advocate the use of a shared mmap'ed region. The sharing
> would be two-ways (RW for both) between the QEMU virtualizer and the libvmi
> process. I envision that there could be a QEMU command line argument, such
> as "--mmap-guest-memory <filename>" Understand that Eric feels strongly the
> libvmi client should own the file name - I have not forgotten that. When
> that command line argument is given, as part of the guest initialization,
> QEMU creates a file of size equal to the size of the guest memory containing
> all zeros, mmaps that file to the guest memory with PROT_READ|PROT_WRITE
> and MAP_FILE|MAP_SHARED, then starts the guest.
This is basically what memory-backend-file (and the legacy -mem-path
option) already does today, but it unlinks the file just after opening
it. We can change it to accept a full filename and/or an option to make
it not unlink the file after opening it.
I don't remember if memory-backend-file is usable without -numa, but we
could make it possible somehow.
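For reference, this is roughly how memory-backend-file is wired up
today - a sketch with the object id, size, and path made up; share=on
is what makes the mapping MAP_SHARED:
---
qemu-system-x86_64 ... \
    -object memory-backend-file,id=mem0,size=2G,mem-path=/tmp/guest-ram,share=on \
    -numa node,memdev=mem0
---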
> And, if at all possible,
> makes the filename queryable via qmp and/or hmp, so that the filename of the
> mmap would not need to be maintained in two different places, leading to
> maintenance nightmares.
You can probably query the memory-backend-file QOM properties using QMP
already. Then the filename could be exposed as a QOM property.
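Presumably something along these lines - an untested sketch, assuming
the backend object was created with id=mem0:
---
{ "execute": "qom-get",
  "arguments": { "path": "/objects/mem0", "property": "mem-path" } }
---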
--
Eduardo
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-26 17:52 ` Eduardo Habkost
@ 2015-10-27 14:17 ` Valerio Aimale
2015-10-27 15:00 ` Markus Armbruster
0 siblings, 1 reply; 43+ messages in thread
From: Valerio Aimale @ 2015-10-27 14:17 UTC (permalink / raw)
To: Eduardo Habkost; +Cc: qemu-devel, Markus Armbruster, lcapitulino
On 10/26/15 11:52 AM, Eduardo Habkost wrote:
>
>
> I was trying to advocate the use of a shared mmap'ed region. The sharing
> would be two-ways (RW for both) between the QEMU virtualizer and the libvmi
> process. I envision that there could be a QEMU command line argument, such
> as "--mmap-guest-memory <filename>" Understand that Eric feels strongly the
> libvmi client should own the file name - I have not forgotten that. When
> that command line argument is given, as part of the guest initialization,
> QEMU creates a file of size equal to the size of the guest memory containing
> all zeros, mmaps that file to the guest memory with PROT_READ|PROT_WRITE
> and MAP_FILE|MAP_SHARED, then starts the guest.
> This is basically what memory-backend-file (and the legacy -mem-path
> option) already does today, but it unlinks the file just after opening
> it. We can change it to accept a full filename and/or an option to make
> it not unlink the file after opening it.
>
> I don't remember if memory-backend-file is usable without -numa, but we
> could make it possible somehow.
Eduardo, I did try this approach. It takes 2 line changes in exec.c:
commenting the unlink out, and making sure MAP_SHARED is used when
-mem-path and -mem-prealloc are given. It works beautifully, and libvmi
accesses are fast. However, the VM is slowed down to a crawl, obviously,
because each RAM access by the VM triggers a page fault on the mmapped
file. I don't think having a crawling VM is desirable, so this approach
goes out the door.
I think we're back at estimating the speed of other approaches as
discussed previously:
- via UNIX socket as per existing patch
- via xp, parsing the human-readable xp output
- via an xp-like command that returns memory content baseXX-encoded into
a json string
- via shared memory as per existing code and patch
Any other?
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-27 14:17 ` Valerio Aimale
@ 2015-10-27 15:00 ` Markus Armbruster
2015-10-27 15:18 ` Valerio Aimale
0 siblings, 1 reply; 43+ messages in thread
From: Markus Armbruster @ 2015-10-27 15:00 UTC (permalink / raw)
To: Valerio Aimale; +Cc: lcapitulino, Eduardo Habkost, qemu-devel
Valerio Aimale <valerio@aimale.com> writes:
> On 10/26/15 11:52 AM, Eduardo Habkost wrote:
>>
>>
>> I was trying to advocate the use of a shared mmap'ed region. The sharing
>> would be two-ways (RW for both) between the QEMU virtualizer and the libvmi
>> process. I envision that there could be a QEMU command line argument, such
>> as "--mmap-guest-memory <filename>" Understand that Eric feels strongly the
>> libvmi client should own the file name - I have not forgotten that. When
>> that command line argument is given, as part of the guest initialization,
>> QEMU creates a file of size equal to the size of the guest memory containing
>> all zeros, mmaps that file to the guest memory with PROT_READ|PROT_WRITE
>> and MAP_FILE|MAP_SHARED, then starts the guest.
>> This is basically what memory-backend-file (and the legacy -mem-path
>> option) already does today, but it unlinks the file just after opening
>> it. We can change it to accept a full filename and/or an option to make
>> it not unlink the file after opening it.
>>
>> I don't remember if memory-backend-file is usable without -numa, but we
>> could make it possible somehow.
> Eduardo, I did try this approach. It takes 2 line changes in exec.c:
> comment the unlink out, and making sure MAP_SHARED is used when
> -mem-path and -mem-prealloc are given. It works beautifully, and
> libvmi accesses are fast. However, the VM is slowed down to a crawl,
> obviously, because each RAM access by the VM triggers a page fault on
> the mmapped file. I don't think having a crawling VM is desirable, so
> this approach goes out the door.
Uh, I don't understand why "each RAM access by the VM triggers a page
fault". Can you show us the patch you used?
> I think we're back at estimating the speed of other approaches as
> discussed previously:
>
> - via UNIX socket as per existing patch
> - via xp, parsing the human-readable xp output
> - via an xp-like command that returns memory content baseXX-encoded
> into a json string
> - via shared memory as per existing code and patch
>
> Any other?
Yes, the existing alternative LibVMI method via gdbserver should be
included in the comparison.
Naturally, any approach that does the actual work via QMP will be dog
slow as long as LibVMI launches virsh for each QMP command. Fixable, as
Eric pointed out: use the libvirt API and link against libvirt.so
instead. No idea how much work that'll be.
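For a sense of scale, a sketch of going through the libvirt API
directly - the domain name is made up, virDomainQemuMonitorCommand()
comes from libvirt-qemu.h, you link with -lvirt -lvirt-qemu, and error
handling is omitted:
---
#include <stdio.h>
#include <stdlib.h>
#include <libvirt/libvirt.h>
#include <libvirt/libvirt-qemu.h>

int main(void)
{
    virConnectPtr conn = virConnectOpen("qemu:///system");
    virDomainPtr dom = virDomainLookupByName(conn, "Windows10");
    char *reply = NULL;

    /* One long-lived connection; each QMP command is a library call,
     * not a fork+exec of virsh. */
    if (virDomainQemuMonitorCommand(dom, "{\"execute\":\"query-status\"}",
                                    &reply, 0) == 0) {
        printf("%s\n", reply);
        free(reply);
    }
    virDomainFree(dom);
    virConnectClose(conn);
    return 0;
}
---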
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-27 15:00 ` Markus Armbruster
@ 2015-10-27 15:18 ` Valerio Aimale
2015-10-27 15:31 ` Valerio Aimale
2015-10-27 16:11 ` Markus Armbruster
0 siblings, 2 replies; 43+ messages in thread
From: Valerio Aimale @ 2015-10-27 15:18 UTC (permalink / raw)
To: Markus Armbruster; +Cc: lcapitulino, Eduardo Habkost, qemu-devel
On 10/27/15 9:00 AM, Markus Armbruster wrote:
> Valerio Aimale <valerio@aimale.com> writes:
>
>> On 10/26/15 11:52 AM, Eduardo Habkost wrote:
>>>
>>> I was trying to advocate the use of a shared mmap'ed region. The sharing
>>> would be two-ways (RW for both) between the QEMU virtualizer and the libvmi
>>> process. I envision that there could be a QEMU command line argument, such
>>> as "--mmap-guest-memory <filename>" Understand that Eric feels strongly the
>>> libvmi client should own the file name - I have not forgotten that. When
>>> that command line argument is given, as part of the guest initialization,
>>> QEMU creates a file of size equal to the size of the guest memory containing
>>> all zeros, mmaps that file to the guest memory with PROT_READ|PROT_WRITE
>>> and MAP_FILE|MAP_SHARED, then starts the guest.
>>> This is basically what memory-backend-file (and the legacy -mem-path
>>> option) already does today, but it unlinks the file just after opening
>>> it. We can change it to accept a full filename and/or an option to make
>>> it not unlink the file after opening it.
>>>
>>> I don't remember if memory-backend-file is usable without -numa, but we
>>> could make it possible somehow.
>> Eduardo, I did try this approach. It takes 2 line changes in exec.c:
>> comment the unlink out, and making sure MAP_SHARED is used when
>> -mem-path and -mem-prealloc are given. It works beautifully, and
>> libvmi accesses are fast. However, the VM is slowed down to a crawl,
>> obviously, because each RAM access by the VM triggers a page fault on
>> the mmapped file. I don't think having a crawling VM is desirable, so
>> this approach goes out the door.
> Uh, I don't understand why "each RAM access by the VM triggers a page
> fault". Can you show us the patch you used?
Sorry, too brief of an explanation. Every time the guest flips a byte in
physical RAM, I think that triggers a page write to the mmaped file. My
understanding is that, with MAP_SHARED, each write to RAM triggers a
file write, hence the slowness. These are the simple changes I made, to
test it - as a proof of concept.
In exec.c of qemu-2.4.0.1, change
---
    fd = mkstemp(filename);
    if (fd < 0) {
        error_setg_errno(errp, errno,
                         "unable to create backing store for hugepages");
        g_free(filename);
        goto error;
    }
    unlink(filename);
    g_free(filename);

    memory = (memory+hpagesize-1) & ~(hpagesize-1);

    /*
     * ftruncate is not supported by hugetlbfs in older
     * hosts, so don't bother bailing out on errors.
     * If anything goes wrong with it under other filesystems,
     * mmap will fail.
     */
    if (ftruncate(fd, memory)) {
        perror("ftruncate");
    }

    area = mmap(0, memory, PROT_READ | PROT_WRITE,
                (block->flags & RAM_SHARED ? MAP_SHARED : MAP_PRIVATE),
                fd, 0);
---
to
---
    fd = mkstemp(filename);
    if (fd < 0) {
        error_setg_errno(errp, errno,
                         "unable to create backing store for hugepages");
        g_free(filename);
        goto error;
    }
    /* unlink(filename); */ /* Valerio's change to persist guest RAM mmaped file */
    g_free(filename);

    memory = (memory+hpagesize-1) & ~(hpagesize-1);

    /*
     * ftruncate is not supported by hugetlbfs in older
     * hosts, so don't bother bailing out on errors.
     * If anything goes wrong with it under other filesystems,
     * mmap will fail.
     */
    if (ftruncate(fd, memory)) {
        perror("ftruncate");
    }

    area = mmap(0, memory, PROT_READ | PROT_WRITE,
                MAP_FILE | MAP_SHARED, /* Valerio's change to persist guest RAM mmaped file */
                fd, 0);
---
Then, recompile QEMU and launch a VM as:
/usr/local/bin/qemu-system-x86_64 -name Windows10 -S -machine
pc-i440fx-2.4,accel=kvm,usb=off [...] -mem-prealloc -mem-path /tmp/maps
# I know -mem-path is deprecated, but I used it to speed up the proof
of concept.
With the above command, I have the following file:
$ ls -l /tmp/maps/
-rw------- 1 libvirt-qemu kvm 2147483648 Oct 27 08:31 qemu_back_mem.pc.ram.fP4sKH
which is an mmap of the Windows VM's physical RAM:
$ hexdump -C /tmp/maps/qemu_back_mem.val.pc.ram.fP4sKH
00000000  53 ff 00 f0 53 ff 00 f0  c3 e2 00 f0 53 ff 00 f0  |S...S.......S...|
[...]
00000760  24 02 c3 49 6e 76 61 6c  69 64 20 70 61 72 74 69  |$..Invalid parti|
00000770  74 69 6f 6e 20 74 61 62  6c 65 00 45 72 72 6f 72  |tion table.Error|
00000780  20 6c 6f 61 64 69 6e 67  20 6f 70 65 72 61 74 69  | loading operati|
00000790  6e 67 20 73 79 73 74 65  6d 00 4d 69 73 73 69 6e  |ng system.Missin|
000007a0  67 20 6f 70 65 72 61 74  69 6e 67 20 73 79 73 74  |g operating syst|
000007b0  65 6d 00 00 00 63 7b 9a  73 d8 99 ce 00 00 80 20  |em...c{.s...... |
[...]
I did not try mmap'ing to a file on a RAMdisk. Without physical disk
I/O, the VM might run faster.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-27 15:18 ` Valerio Aimale
@ 2015-10-27 15:31 ` Valerio Aimale
2015-10-27 16:11 ` Markus Armbruster
1 sibling, 0 replies; 43+ messages in thread
From: Valerio Aimale @ 2015-10-27 15:31 UTC (permalink / raw)
To: Markus Armbruster; +Cc: lcapitulino, Eduardo Habkost, qemu-devel
On 10/27/15 9:18 AM, Valerio Aimale wrote:
>
>
> I did not try to mmap'ing to a file on a RAMdisk. Without physical
> disk I/O, the VM might run faster.
>
>
I did try with the file on a RAMdisk:
$ sudo mount -o size=3G -t tmpfs none /ramdisk
$ /usr/local/bin/qemu-system-x86_64 -name Windows10 -S -machine
pc-i440fx-2.4,accel=kvm,usb=off [...] -mem-prealloc -mem-path /ramdisk
With that, the speed of the VM is acceptable. It will not take your
breath away, but it is reasonable.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-27 15:18 ` Valerio Aimale
2015-10-27 15:31 ` Valerio Aimale
@ 2015-10-27 16:11 ` Markus Armbruster
2015-10-27 16:27 ` Valerio Aimale
1 sibling, 1 reply; 43+ messages in thread
From: Markus Armbruster @ 2015-10-27 16:11 UTC (permalink / raw)
To: Valerio Aimale; +Cc: qemu-devel, Eduardo Habkost, lcapitulino
Valerio Aimale <valerio@aimale.com> writes:
> On 10/27/15 9:00 AM, Markus Armbruster wrote:
>> Valerio Aimale <valerio@aimale.com> writes:
>>
>>> On 10/26/15 11:52 AM, Eduardo Habkost wrote:
>>>>
>>>> I was trying to advocate the use of a shared mmap'ed region. The sharing
>>>> would be two-ways (RW for both) between the QEMU virtualizer and the libvmi
>>>> process. I envision that there could be a QEMU command line argument, such
>>>> as "--mmap-guest-memory <filename>" Understand that Eric feels strongly the
>>>> libvmi client should own the file name - I have not forgotten that. When
>>>> that command line argument is given, as part of the guest initialization,
>>>> QEMU creates a file of size equal to the size of the guest memory containing
>>>> all zeros, mmaps that file to the guest memory with PROT_READ|PROT_WRITE
>>>> and MAP_FILE|MAP_SHARED, then starts the guest.
>>>> This is basically what memory-backend-file (and the legacy -mem-path
>>>> option) already does today, but it unlinks the file just after opening
>>>> it. We can change it to accept a full filename and/or an option to make
>>>> it not unlink the file after opening it.
>>>>
>>>> I don't remember if memory-backend-file is usable without -numa, but we
>>>> could make it possible somehow.
>>> Eduardo, I did try this approach. It takes 2 line changes in exec.c:
>>> comment the unlink out, and making sure MAP_SHARED is used when
>>> -mem-path and -mem-prealloc are given. It works beautifully, and
>>> libvmi accesses are fast. However, the VM is slowed down to a crawl,
>>> obviously, because each RAM access by the VM triggers a page fault on
>>> the mmapped file. I don't think having a crawling VM is desirable, so
>>> this approach goes out the door.
>> Uh, I don't understand why "each RAM access by the VM triggers a page
>> fault". Can you show us the patch you used?
> Sorry, too brief of an explanation. Every time the guest flips a byte in
> physical RAM, I think that triggers a page write to the mmaped file. My
> understanding is that, with MAP_SHARED, each write to RAM triggers a
> file write, hence the slowness. These are the simple changes I made, to
> test it - as a proof of concept.
Ah, that actually makes sense. Thanks!
[...]
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-27 16:11 ` Markus Armbruster
@ 2015-10-27 16:27 ` Valerio Aimale
0 siblings, 0 replies; 43+ messages in thread
From: Valerio Aimale @ 2015-10-27 16:27 UTC (permalink / raw)
To: Markus Armbruster; +Cc: qemu-devel, Eduardo Habkost, lcapitulino
On 10/27/15 10:11 AM, Markus Armbruster wrote:
> [...]
>>>> Eduardo, I did try this approach. It takes 2 line changes in exec.c:
>>>> comment the unlink out, and making sure MAP_SHARED is used when
>>>> -mem-path and -mem-prealloc are given. It works beautifully, and
>>>> libvmi accesses are fast. However, the VM is slowed down to a crawl,
>>>> obviously, because each RAM access by the VM triggers a page fault on
>>>> the mmapped file. I don't think having a crawling VM is desirable, so
>>>> this approach goes out the door.
>>> Uh, I don't understand why "each RAM access by the VM triggers a page
>>> fault". Can you show us the patch you used?
>> Sorry, too brief of an explanation. Every time the guest flips a byte in
>> physical RAM, I think that triggers a page write to the mmaped file. My
>> understanding is that, with MAP_SHARED, each write to RAM triggers a
>> file write, hence the slowness. These are the simple changes I made, to
>> test it - as a proof of concept.
> Ah, that actually makes sense. Thanks!
>
> [...]
However, when the guest RAM mmap'ed file resides on a RAMdisk on the
host, the guest OS responsiveness is more than acceptable. Perhaps this
is a viable approach. It might require only minimal changes to the QEMU
source and maybe one extra command line argument.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
2015-10-23 19:24 ` Eduardo Habkost
2015-10-23 20:02 ` Richard Henderson
@ 2015-11-02 12:55 ` Paolo Bonzini
1 sibling, 0 replies; 43+ messages in thread
From: Paolo Bonzini @ 2015-11-02 12:55 UTC (permalink / raw)
To: Eduardo Habkost, Markus Armbruster
Cc: Richard Henderson, qemu-devel, Valerio Aimale, lcapitulino
On 23/10/2015 21:24, Eduardo Habkost wrote:
>> >
>> > Valerio, would an command line option to share guest memory suffice, or
>> > does it have to be a monitor command? If the latter, why?
>> >
>> > Eduardo, I'm not sure writing to guest memory behind TCG's back will
>> > work. Do you know?
> I don't know. I guess it may possibly surprise TCG depending on how some
> operations are implemented, but it sounds unlikely. CCing Richard.
>
>> >
>> > Will it work with KVM?
> I always assumed it would work, but CCing Paolo to see if there are any
> pitfalls.
Yes, both should work.
Paolo
^ permalink raw reply [flat|nested] 43+ messages in thread
end of thread, other threads: [~2015-11-02 12:55 UTC | newest]
Thread overview: 43+ messages
-- links below jump to the message on this page --
2015-10-15 23:44 [Qemu-devel] QEMU patch to allow VM introspection via libvmi valerio
2015-10-15 23:44 ` [Qemu-devel] [PATCH] QEMU patch for libvmi to introspect QEMU/kvm virtual machines. Usually this patch is distributed with libvmi, but, it might be more useful to have it in the QEMU source permanently valerio
2015-10-19 21:33 ` Eric Blake
2015-10-21 15:11 ` Valerio Aimale
2015-10-16 8:15 ` [Qemu-devel] QEMU patch to allow VM introspection via libvmi Markus Armbruster
2015-10-16 14:30 ` Valerio Aimale
2015-10-19 7:52 ` Markus Armbruster
2015-10-19 14:37 ` Valerio Aimale
2015-10-21 10:54 ` Markus Armbruster
2015-10-21 15:50 ` Valerio Aimale
2015-10-22 11:50 ` Markus Armbruster
2015-10-22 18:11 ` Valerio Aimale
2015-10-23 6:31 ` Markus Armbruster
2015-10-22 18:43 ` Valerio Aimale
2015-10-22 18:54 ` Eric Blake
2015-10-22 19:12 ` Eduardo Habkost
2015-10-22 19:57 ` Valerio Aimale
2015-10-22 20:03 ` Eric Blake
2015-10-22 20:45 ` Valerio Aimale
2015-10-22 21:47 ` Eduardo Habkost
2015-10-22 21:51 ` Valerio Aimale
2015-10-23 8:25 ` Daniel P. Berrange
2015-10-23 19:00 ` Eduardo Habkost
2015-10-23 18:55 ` Eduardo Habkost
2015-10-23 19:08 ` Valerio Aimale
2015-10-26 9:09 ` Markus Armbruster
2015-10-26 17:37 ` Valerio Aimale
2015-10-26 17:52 ` Eduardo Habkost
2015-10-27 14:17 ` Valerio Aimale
2015-10-27 15:00 ` Markus Armbruster
2015-10-27 15:18 ` Valerio Aimale
2015-10-27 15:31 ` Valerio Aimale
2015-10-27 16:11 ` Markus Armbruster
2015-10-27 16:27 ` Valerio Aimale
2015-10-23 6:35 ` Markus Armbruster
2015-10-23 8:18 ` Daniel P. Berrange
2015-10-23 14:48 ` Valerio Aimale
2015-10-23 14:44 ` Valerio Aimale
2015-10-23 14:56 ` Eric Blake
2015-10-23 15:03 ` Valerio Aimale
2015-10-23 19:24 ` Eduardo Habkost
2015-10-23 20:02 ` Richard Henderson
2015-11-02 12:55 ` Paolo Bonzini