Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi

From: Valerio Aimale <valerio@aimale.com>
To: Markus Armbruster <armbru@redhat.com>
Cc: qemu-devel@nongnu.org, ehabkost@redhat.com, lcapitulino@redhat.com
Subject: Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
Date: Thu, 22 Oct 2015 12:11:45 -0600
Message-ID: <562926E1.1060201@aimale.com>
In-Reply-To: <87lhav6s28.fsf@blackfin.pond.sub.org>

On 10/22/15 5:50 AM, Markus Armbruster wrote:
> Valerio Aimale <valerio@aimale.com> writes:
>
>> On 10/21/15 4:54 AM, Markus Armbruster wrote:
>>> Valerio Aimale <valerio@aimale.com> writes:
>>>
>>>> On 10/19/15 1:52 AM, Markus Armbruster wrote:
>>>>> Valerio Aimale <valerio@aimale.com> writes:
>>>>>
>>>>>> On 10/16/15 2:15 AM, Markus Armbruster wrote:
>>>>>>> valerio@aimale.com writes:
>>>>>>>
>>>>>>>> All-
>>>>>>>>
>>>>>>>> I've produced a patch for the current QEMU HEAD, for libvmi to
>>>>>>>> introspect QEMU/KVM VMs.
>>>>>>>>
>>>>>>>> Libvmi has patches for the old qemu-kvm fork, inside its source tree:
>>>>>>>> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>>>>>>>>
>>>>>>>> This patch adds an HMP and a QMP command, "pmemaccess". When the
>>>>>>>> command is invoked with a string argument (a filename), it will open
>>>>>>>> a UNIX socket and spawn a listening thread.
>>>>>>>>
>>>>>>>> The client writes binary commands to the socket, in the form of a C
>>>>>>>> structure:
>>>>>>>>
>>>>>>>> struct request {
>>>>>>>>          uint8_t type;   // 0 quit, 1 read, 2 write, ... rest reserved
>>>>>>>>          uint64_t address;   // address to read from OR write to
>>>>>>>>          uint64_t length;    // number of bytes to read OR write
>>>>>>>> };
>>>>>>>>
>>>>>>>> The client receives as a response either (length+1) bytes, if it is a
>>>>>>>> read operation, or 1 byte if it is a write operation.
>>>>>>>>
>>>>>>>> The last byte of a read operation response indicates success (1
>>>>>>>> success, 0 failure). The single byte returned for a write operation
>>>>>>>> indicates the same (1 success, 0 failure).
>>>>>>> So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of
>>>>>>> garbage followed by the "it failed" byte?
>>>>>> Markus, that appears to be the case. However, I did not write the
>>>>>> communication protocol between libvmi and qemu. I'm assuming that the
>>>>>> person who wrote the protocol did not want to bother with
>>>>>> overcomplicating things.
>>>>>>
>>>>>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>>>>>
>>>>>> I'm thinking he assumed reads would be small in size and the price of
>>>>>> reading garbage was less than the price of writing a more complicated
>>>>>> protocol. I can see his point; confronted with the same problem, I
>>>>>> might have done the same.
>>>>> All right, the interface is designed for *small* memory blocks then.
>>>>>
>>>>> Makes me wonder why he needs a separate binary protocol on a separate
>>>>> socket.  Small blocks could be done just fine in QMP.
>>>> The problem is speed. If one is analyzing the memory space of a running
>>>> process (physical and paged), libvmi will make a large number of small
>>>> and mid-sized reads. If one uses xp, or pmemsave, the overhead is
>>>> quite significant. xp has overhead due to encoding, and pmemsave has
>>>> overhead due to file open/write (server), file open/read/close/unlink
>>>> (client).
>>>>
>>>> Others have gone through the problem before me. It appears that
>>>> pmemsave and xp are significantly slower than reading memory using a
>>>> socket via pmemaccess.
>>> That they're slower isn't surprising, but I'd expect the cost of
>>> encoding a small block to be insignificant compared to the cost of the
>>> network roundtrips.
>>>
>>> As block size increases, the space overhead of encoding will eventually
>>> bite.  But for that usage, the binary protocol appears ill-suited,
>>> unless the client can pretty reliably avoid read failure.  I haven't
>>> examined its failure modes, yet.
>>>
>>>> The following data is not mine, but it shows the time, in
>>>> milliseconds, required to resolve the content of a paged memory
>>>> address via socket (pmemaccess), pmemsave and xp:
>>>>
>>>> http://cl.ly/image/322a3s0h1V05
>>>>
>>>> Again, I did not produce those data points; they come from an old
>>>> libvmi thread.
>>> 90ms is a very long time.  What exactly was measured?
>> That is a fair question to ask. Unfortunately, I extracted that data
>> plot from an old thread in some libvmi mailing list. I do not have the
>> data and code that produced it. Sifting through the thread, I can see
>> the code was never published. I will take it upon myself to produce code that
>> compares timing - in a fair fashion - of libvmi doing an atomic
>> operation and a larger-scale operation (like listing running
>> processes)  via gdb, pmemaccess/socket, pmemsave, xp, and hopefully, a
>> version of xp that returns byte streams of memory regions base64 or
>> base85 encoded in json strings. I'll publish results and code.
>>
>> However, given workload and life happening, it will be some time
>> before I complete that task.
> No problem.  I'd like to have your use case addressed, but there's no
> need for haste.

Thanks, Markus. Appreciate your help.
>
> [...]
>>>>>>>> Also, the pmemsave command's QAPI should be changed to be usable with
>>>>>>>> 64-bit VMs
>>>>>>>>
>>>>>>>> in qapi-schema.json
>>>>>>>>
>>>>>>>> from
>>>>>>>>
>>>>>>>> ---
>>>>>>>> { 'command': 'pmemsave',
>>>>>>>>       'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>>>>>> ---
>>>>>>>>
>>>>>>>> to
>>>>>>>>
>>>>>>>> ---
>>>>>>>> { 'command': 'pmemsave',
>>>>>>>>       'data': {'val': 'int64', 'size': 'int64', 'filename': 'str'} }
>>>>>>>> ---
>>>>>>> In the QAPI schema, 'int' is actually an alias for 'int64'.  Yes, that's
>>>>>>> confusing.
>>>>>> I think it's confusing for the HMP parser too. If you have a VM with
>>>>>> 8 GB of RAM and want to snapshot the whole physical memory, via HMP
>>>>>> over telnet this is what happens:
>>>>>>
>>>>>> $ telnet localhost 1234
>>>>>> Trying 127.0.0.1...
>>>>>> Connected to localhost.
>>>>>> Escape character is '^]'.
>>>>>> QEMU 2.4.0.1 monitor - type 'help' for more information
>>>>>> (qemu) help pmemsave
>>>>>> pmemsave addr size file -- save to disk physical memory dump starting
>>>>>> at 'addr' of size 'size'
>>>>>> (qemu) pmemsave 0 8589934591 "/tmp/memorydump"
>>>>>> 'pmemsave' has failed: integer is for 32-bit values
>>>>>> Try "help pmemsave" for more information
>>>>>> (qemu) quit
>>>>> Your change to pmemsave's definition in qapi-schema.json is effectively a
>>>>> no-op.
>>>>>
>>>>> Your example shows *HMP* command pmemsave.  The definition of an HMP
>>>>> command is *independent* of the QMP command.  The implementation *uses*
>>>>> the QMP command.
>>>>>
>>>>> QMP pmemsave is defined in qapi-schema.json as
>>>>>
>>>>>        { 'command': 'pmemsave',
>>>>>          'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>>>
>>>>> Its implementation is in cpus.c:
>>>>>
>>>>>        void qmp_pmemsave(int64_t addr, int64_t size, const char *filename,
>>>>>                          Error **errp)
>>>>>
>>>>> Note the int64_t size.
>>>>>
>>>>> HMP pmemsave is defined in hmp-commands.hx as
>>>>>
>>>>>        {
>>>>>            .name       = "pmemsave",
>>>>>            .args_type  = "val:l,size:i,filename:s",
>>>>>            .params     = "addr size file",
>>>>>            .help       = "save to disk physical memory dump starting at 'addr' of size 'size'",
>>>>>            .mhandler.cmd = hmp_pmemsave,
>>>>>        },
>>>>>
>>>>> Its implementation is in hmp.c:
>>>>>
>>>>>        void hmp_pmemsave(Monitor *mon, const QDict *qdict)
>>>>>        {
>>>>>            uint32_t size = qdict_get_int(qdict, "size");
>>>>>            const char *filename = qdict_get_str(qdict, "filename");
>>>>>            uint64_t addr = qdict_get_int(qdict, "val");
>>>>>            Error *err = NULL;
>>>>>
>>>>>            qmp_pmemsave(addr, size, filename, &err);
>>>>>            hmp_handle_error(mon, &err);
>>>>>        }
>>>>>
>>>>> Note uint32_t size.
>>>>>
>>>>> Arguably, the QMP size argument should use 'size' (an alias for
>>>>> 'uint64'), and the HMP args_type should use 'size:o'.
>>>> Understood. Indeed, I've re-implemented 'pmemaccess' the same
>>>> way pmemsave is implemented. There is a single function, and two
>>>> entry points, one for HMP and one for QMP. I think pmemaccess
>>>> mimics pmemsave closely.
>>>>
>>>> However, if one wants to simply dump a memory region via HMP, for
>>>> human ease of use/debug/testing purposes, one cannot dump memory
>>>> regions that reside above 2^32-1.
>>> Can you give an example?
>> Yes. I was trying to dump the full extent of physical memory of a VM
>> that has 8GB memory space (ballooned). I simply did this:
>>
>> $ telnet localhost 1234
>> Trying 127.0.0.1...
>> Connected to localhost.
>> Escape character is '^]'.
>> QEMU 2.4.0.1 monitor - type 'help' for more information
>> (qemu) pmemsave 0 8589934591 "/tmp/memsaved"
>> 'pmemsave' has failed: integer is for 32-bit values
>>
>> Maybe I misunderstood how pmemsave works. Maybe I should have used
>> dump-guest-memory
> This is an unnecessary limitation caused by 'size:i' instead of
> 'size:o'.  Fixable.
I think I tried changing size:i to size:l, but I was still receiving
the error.

