From: Valerio Aimale <valerio@aimale.com>
To: Markus Armbruster <armbru@redhat.com>
Cc: lcapitulino@redhat.com, qemu-devel@nongnu.org, ehabkost@redhat.com
Subject: Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
Date: Thu, 22 Oct 2015 12:43:06 -0600
Message-ID: <56292E3A.2010603@aimale.com>
In-Reply-To: <87twpkqyow.fsf@blackfin.pond.sub.org>
On 10/21/15 4:54 AM, Markus Armbruster wrote:
> Valerio Aimale <valerio@aimale.com> writes:
>
>> On 10/19/15 1:52 AM, Markus Armbruster wrote:
>>> Valerio Aimale <valerio@aimale.com> writes:
>>>
>>>> On 10/16/15 2:15 AM, Markus Armbruster wrote:
>>>>> valerio@aimale.com writes:
>>>>>
>>>>>> All-
>>>>>>
>>>>>> I've produced a patch for the current QEMU HEAD, for libvmi to
>>>>>> introspect QEMU/KVM VMs.
>>>>>>
>>>>>> Libvmi has patches for the old qemu-kvm fork, inside its source tree:
>>>>>> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>>>>>>
>>>>>> This patch adds an HMP and a QMP command, "pmemaccess". When the
>>>>>> command is invoked with a string argument (a filename), it will open
>>>>>> a UNIX socket and spawn a listening thread.
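(For concreteness, invoking the new command over QMP would look something
like this, where the argument name 'path' is illustrative, not necessarily
what the patch uses:

  { "execute": "pmemaccess",
    "arguments": { "path": "/tmp/qemu-introspect.sock" } }

The HMP variant takes the same filename as a plain string argument.)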
>>>>>>
>>>>>> The client writes binary commands to the socket, in the form of a C
>>>>>> structure:
>>>>>>
>>>>>> struct request {
>>>>>>     uint8_t type;      // 0 quit, 1 read, 2 write, ... rest reserved
>>>>>>     uint64_t address;  // address to read from OR write to
>>>>>>     uint64_t length;   // number of bytes to read OR write
>>>>>> };
>>>>>>
>>>>>> The client receives as a response either (length+1) bytes, if it is a
>>>>>> read operation, or 1 byte if it is a write operation.
>>>>>>
>>>>>> The last byte of a read-operation response indicates success (1
>>>>>> success, 0 failure). The single byte returned for a write operation
>>>>>> indicates the same (1 success, 0 failure).
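(For reference, a minimal client for this protocol might look like the
sketch below. It assumes client and server were built for the same ABI, so
the in-memory layout of struct request, padding included, matches on both
ends; error handling is kept to a minimum.

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct request {
    uint8_t type;      /* 0 quit, 1 read, 2 write, ... rest reserved */
    uint64_t address;  /* address to read from OR write to */
    uint64_t length;   /* number of bytes to read OR write */
};

/* Read 'len' bytes of guest physical memory at 'addr' into 'buf', over
 * the already-connected UNIX socket 'fd'. Returns 0 on success. */
static int pmem_read(int fd, uint64_t addr, void *buf, uint64_t len)
{
    struct request req = { .type = 1, .address = addr, .length = len };
    uint8_t *resp = malloc(len + 1);   /* payload plus status byte */
    uint64_t done = 0;
    ssize_t n;
    int ret = -1;

    if (!resp || write(fd, &req, sizeof(req)) != sizeof(req)) {
        goto out;
    }
    while (done < len + 1) {           /* sockets may return short reads */
        n = read(fd, resp + done, len + 1 - done);
        if (n <= 0) {
            goto out;
        }
        done += n;
    }
    if (resp[len] == 1) {              /* last byte: 1 success, 0 failure */
        memcpy(buf, resp, len);
        ret = 0;
    }
out:
    free(resp);
    return ret;
}
)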
>>>>> So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of
>>>>> garbage followed by the "it failed" byte?
>>>> Markus, that appears to be the case. However, I did not write the
>>>> communication protocol between libvmi and qemu. I'm assuming the
>>>> person who wrote the protocol did not want to overcomplicate things.
>>>>
>>>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>>>
>>>> I'm thinking he assumed reads would be small and that the price of
>>>> reading garbage was less than the price of writing a more complicated
>>>> protocol. I can see his point; confronted with the same problem, I
>>>> might have done the same.
>>> All right, the interface is designed for *small* memory blocks then.
>>>
>>> Makes me wonder why he needs a separate binary protocol on a separate
>>> socket. Small blocks could be done just fine in QMP.
>> The problem is speed. If one is analyzing the memory space of a running
>> process (physical and paged), libvmi will make a large number of small
>> and mid-sized reads. If one uses xp or pmemsave, the overhead is
>> quite significant: xp has overhead due to encoding, and pmemsave has
>> overhead due to file open/write (server) and file open/read/close/unlink
>> (client).
>>
>> Others have run into this problem before me. It appears that
>> pmemsave and xp are significantly slower than reading memory through a
>> socket via pmemaccess.
> That they're slower isn't surprising, but I'd expect the cost of
> encoding a small block to be insignificant compared to the cost of the
> network roundtrips.
>
> As block size increases, the space overhead of encoding will eventually
> bite. But for that usage, the binary protocol appears ill-suited,
> unless the client can pretty reliably avoid read failure. I haven't
> examined its failure modes, yet.
>
>> The following data is not mine, but it shows the time, in
>> milliseconds, required to resolve the content of a paged memory
>> address via socket (pmemaccess), pmemsave, and xp:
>>
>> http://cl.ly/image/322a3s0h1V05
>>
>> Again, I did not produce those data points; they come from an old
>> libvmi thread.
> 90ms is a very long time. What exactly was measured?
>
>> I think it might be conceivable that there could be a QMP command that
>> returns the content of an arbitrarily sized memory region as a base64
>> or base85 JSON string. It would still have time overhead (due to
>> encoding/decoding) and space overhead (base64 adds 33%; base85 would
>> add 25%), plus JSON encoding/decoding overhead. It might still be the
>> case that the socket would outperform such a command, speed-wise. I
>> don't think it would be any faster than xp.
> A special-purpose binary protocol over a dedicated socket will always do
> less than a QMP solution (ignoring foolishness like transmitting crap on
> read error the client is then expected to throw away). The question is
> whether the difference in work translates to a worthwhile difference in
> performance.
>
> The larger question is actually whether we have an existing interface
> that can serve libvmi's needs. We've discussed monitor commands
> like xp, pmemsave, pmemread. There's another existing interface: the
> GDB stub. Have you considered it?
>
>> There's also a similar patch, floating around the internet, that uses
>> shared memory instead of sockets as the inter-process communication
>> between libvmi and QEMU. I've never used it.
> By the time you've built a working IPC mechanism on top of shared memory,
> you're often no better off than with AF_LOCAL sockets.
>
> Crazy idea: can we allocate guest memory in a way that supports sharing
> it with another process? Eduardo, can -mem-path do such wild things?
Markus, your suggestion led to a lightbulb going off in my head. What if
there were a QMP command, say 'pmemmap', that when invoked performs the
following:
qmp_pmemmap( [...] ) {
    char template[] = "/tmp/QEMU_mmap_XXXXXX"; /* mkstemp() needs a writable buffer */
    size_t ram_size = 8589934592; /* assuming a VM with 8 GB RAM */
    int mmap_fd;
    uint8_t *local_memspace;

    mmap_fd = mkstemp(template);
    ftruncate(mmap_fd, ram_size); /* size the backing file first */
    local_memspace = mmap(NULL, ram_size, PROT_READ | PROT_WRITE,
                          MAP_SHARED, mmap_fd, (off_t) 0); /* file-backed, so no MAP_ANON */
    cpu_physical_memory_rw((hwaddr) 0, local_memspace, ram_size,
                           0 /* no write for now, will discuss write later */);
    /* etc */
}
pmemmap would return the following JSON:

{
    'success' : true,
    'map_filename' : '/tmp/QEMU_mmap_123456'
}
The QMP client/caller would then mmap() the file '/tmp/QEMU_mmap_123456'
into its own address space. It would then have (read, maybe write?) access
to the full extent of the guest memory without making any other QMP call.
I think it would be fast, with low memory usage, as mmap() is pretty
efficient.
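On the client side, that would be something like this (a sketch, error
handling omitted; the filename comes from the hypothetical reply above):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/tmp/QEMU_mmap_123456"; /* from the QMP reply */
    int fd = open(path, O_RDONLY);
    struct stat st;
    fstat(fd, &st);                    /* file size == guest RAM size */

    uint8_t *guest = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);

    /* any guest physical address is now readable without further QMP
     * round trips */
    printf("byte at GPA 0x1000: 0x%02x\n", guest[0x1000]);

    munmap(guest, st.st_size);
    close(fd);
    return 0;
}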
Of course, there would be a 'pmemunmap' QMP command that would perform
the cleanup:

/* etc. */
munmap(local_memspace, ram_size);
unlink(template); /* remove the temporary backing file */
/* etc. */
Would that work? Is mapping the full extent of the guest RAM too much to
ask of cpu_physical_memory_rw()?
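(Also, regarding your -mem-path question: if I read the docs right, guest
RAM can already be backed by a shareable file with something along the
lines of

    qemu-system-x86_64 -m 8G \
        -object memory-backend-file,id=mem0,size=8G,mem-path=/dev/shm,share=on \
        -numa node,memdev=mem0 [...]

An introspection client could then mmap() that backing file directly, with
no new QMP command at all.)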
>
>>>>>> The socket API was written by the libvmi author and it works with the
>>>>>> current libvmi version. The libvmi client-side implementation is at:
>>>>>>
>>>>>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>>>>>
>>>>>> As many use KVM VMs for introspection, malware, and security analysis,
>>>>>> it might be worth thinking about making pmemaccess a permanent
>>>>>> HMP/QMP command, as opposed to having to produce a patch at each QEMU
>>>>>> point release.
>>>>> Related existing commands: memsave, pmemsave, dump-guest-memory.
>>>>>
>>>>> Can you explain why these won't do for your use case?
>>>> For people who do security analysis there are two use cases, static
>>>> and dynamic analysis. With memsave, pmemsave and dump-guest-memory one
>>>> can do static analysis, i.e., snapshotting a VM and seeing what was
>>>> happening at that point in time. Dynamic analysis requires being able
>>>> to 'introspect' a VM while it's running.
>>>>
>>>> If you take a snapshot of two people exchanging a glass of water, and
>>>> you happen to take it at the very moment both persons have their hands
>>>> on the glass, it's hard to tell who passed the glass to whom. If you
>>>> have a movie of the same scene, it's obvious who's the giver and who's
>>>> the receiver. Same use case.
>>> I understand the need for introspecting a running guest. What exactly
>>> makes the existing commands unsuitable for that?
>> Speed. See discussion above.
>>>> More to the point, there's a host of C and Python frameworks to
>>>> dynamically analyze VMs: Volatility, Rekall, DRAKVUF, etc. They all
>>>> build on top of libvmi. I did not want to reinvent the wheel.
>>> Fair enough.
>>>
>>> Front page http://libvmi.com/ claims "Works with Xen, KVM, Qemu, and Raw
>>> memory files." What exactly is missing for KVM?
>> When they say they support KVM, what they really mean is that they
>> support the (retired, I understand) qemu-kvm fork via a patch that is
>> provided in the libvmi source tree. I think the most recent qemu-kvm
>> supported is 1.6.0:
>>
>> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>>
>> I wanted to bring support to the head revision of QEMU, to bring
>> libvmi up to date with modern QEMU.
>>
>> Maybe the solution is simply to put this patch in the libvmi source
>> tree, which I've already asked to do via pull request, leaving QEMU
>> alone.
>> However, the patch has to be updated at every QEMU point release. I
>> wanted to avoid that, if at all possible.
>>
>>>> Mind you, 99.9% of people that do dynamic VM analysis use Xen. They
>>>> contend that Xen has better introspection support. In my case, I did
>>>> not want to bother with dedicating a full server to be a Xen domain
>>>> 0. I just wanted to do a quick test by standing up a QEMU/KVM VM on
>>>> an otherwise-purposed server.
>>> I'm not at all against better introspection support in QEMU. I'm just
>>> trying to understand the problem you're trying to solve with your
>>> patches.
>> What all users of libvmi would love to have is super-high-speed access
>> to VM physical memory as part of the QEMU source tree, and not
>> supported via a patch. Implemented as the QEMU owners see fit, as long
>> as it is blazing fast and easily accessed via a client library or
>> inter-process communication.
> The use case makes sense to me, we just need to figure out how we want
> to serve it in QEMU.
>
>> My gut feeling is that it has to bypass QMP protocol/encoding/file
>> access/JSON to be fast, but it is just a gut feeling, worth nothing.
> My gut feeling is that QMP should do fine in overhead compared to other
> solutions involving socket I/O as long as the data sizes are *small*.
> Latency might be an issue, though: QMP commands are processed from the
> main loop. A dedicated server thread can be more responsive, but
> letting it write to shared resources could be "interesting".
>
>>>>>> Also, the pmemsave command's QAPI should be changed to be usable with
>>>>>> 64-bit VMs
>>>>>>
>>>>>> in qapi-schema.json
>>>>>>
>>>>>> from
>>>>>>
>>>>>> ---
>>>>>> { 'command': 'pmemsave',
>>>>>>   'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>>>> ---
>>>>>>
>>>>>> to
>>>>>>
>>>>>> ---
>>>>>> { 'command': 'pmemsave',
>>>>>>   'data': {'val': 'int64', 'size': 'int64', 'filename': 'str'} }
>>>>>> ---
>>>>> In the QAPI schema, 'int' is actually an alias for 'int64'. Yes, that's
>>>>> confusing.
>>>> I think it's confusing for the HMP parser too. If you have a VM with
>>>> 8 GB of RAM and want to snapshot the whole physical memory via HMP
>>>> over telnet, this is what happens:
>>>>
>>>> $ telnet localhost 1234
>>>> Trying 127.0.0.1...
>>>> Connected to localhost.
>>>> Escape character is '^]'.
>>>> QEMU 2.4.0.1 monitor - type 'help' for more information
>>>> (qemu) help pmemsave
>>>> pmemsave addr size file -- save to disk physical memory dump starting
>>>> at 'addr' of size 'size'
>>>> (qemu) pmemsave 0 8589934591 "/tmp/memorydump"
>>>> 'pmemsave' has failed: integer is for 32-bit values
>>>> Try "help pmemsave" for more information
>>>> (qemu) quit
>>> Your change to pmemsave's definition in qapi-schema.json is effectively a
>>> no-op.
>>>
>>> Your example shows *HMP* command pmemsave. The definition of an HMP
>>> command is *independent* of the QMP command. The implementation *uses*
>>> the QMP command.
>>>
>>> QMP pmemsave is defined in qapi-schema.json as
>>>
>>> { 'command': 'pmemsave',
>>>   'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>
>>> Its implementation is in cpus.c:
>>>
>>> void qmp_pmemsave(int64_t addr, int64_t size, const char *filename,
>>>                   Error **errp)
>>>
>>> Note the int64_t size.
>>>
>>> HMP pmemsave is defined in hmp-commands.hx as
>>>
>>>     {
>>>         .name       = "pmemsave",
>>>         .args_type  = "val:l,size:i,filename:s",
>>>         .params     = "addr size file",
>>>         .help       = "save to disk physical memory dump starting at 'addr' of size 'size'",
>>>         .mhandler.cmd = hmp_pmemsave,
>>>     },
>>>
>>> Its implementation is in hmp.c:
>>>
>>> void hmp_pmemsave(Monitor *mon, const QDict *qdict)
>>> {
>>>     uint32_t size = qdict_get_int(qdict, "size");
>>>     const char *filename = qdict_get_str(qdict, "filename");
>>>     uint64_t addr = qdict_get_int(qdict, "val");
>>>     Error *err = NULL;
>>>
>>>     qmp_pmemsave(addr, size, filename, &err);
>>>     hmp_handle_error(mon, &err);
>>> }
>>>
>>> Note uint32_t size.
>>>
>>> Arguably, the QMP size argument should use 'size' (an alias for
>>> 'uint64'), and the HMP args_type should use 'size:o'.
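(For what it's worth, I believe that suggested change would look roughly
like this; untested:

{ 'command': 'pmemsave',
  'data': {'val': 'int', 'size': 'size', 'filename': 'str'} }

and, in hmp-commands.hx:

    .args_type  = "val:l,size:o,filename:s",

so QMP would take the full uint64 range and HMP would parse sizes with
suffixes.)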
>> I understand all that. Indeed, I've re-implemented 'pmemaccess' the same
>> way pmemsave is implemented: there is a single function, with two
>> entry points, one for HMP and one for QMP. I think pmemaccess
>> mimics pmemsave closely.
>>
>> However, if one wants to simply dump a memory region via HMP, for
>> human ease-of-use/debug/testing purposes, one cannot dump memory
>> regions that reside higher than 2^32-1.
> Can you give an example?
>
> [...]