From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:50267)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <valerio@aimale.com>) id 1ZpKLc-0004w9-GL
	for qemu-devel@nongnu.org; Thu, 22 Oct 2015 14:12:26 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <valerio@aimale.com>) id 1ZpKLZ-00084G-7o
	for qemu-devel@nongnu.org; Thu, 22 Oct 2015 14:12:20 -0400
Received: from smtp.aimale.com ([166.78.138.199]:50450)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <valerio@aimale.com>) id 1ZpKLZ-0007vU-2E
	for qemu-devel@nongnu.org; Thu, 22 Oct 2015 14:12:17 -0400
References: <1444952643-5033-1-git-send-email-valerio@aimale.com>
	<87h9lrkz56.fsf@blackfin.pond.sub.org> <56210A17.6080401@aimale.com>
	<87io63xpke.fsf@blackfin.pond.sub.org> <56250035.40805@aimale.com>
	<87twpkqyow.fsf@blackfin.pond.sub.org> <5627B458.1@aimale.com>
	<87lhav6s28.fsf@blackfin.pond.sub.org>
From: Valerio Aimale <valerio@aimale.com>
Message-ID: <562926E1.1060201@aimale.com>
Date: Thu, 22 Oct 2015 12:11:45 -0600
MIME-Version: 1.0
In-Reply-To: <87lhav6s28.fsf@blackfin.pond.sub.org>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Markus Armbruster <armbru@redhat.com>
Cc: qemu-devel@nongnu.org, ehabkost@redhat.com, lcapitulino@redhat.com

On 10/22/15 5:50 AM, Markus Armbruster wrote:
> Valerio Aimale <valerio@aimale.com> writes:
>
>> On 10/21/15 4:54 AM, Markus Armbruster wrote:
>>> Valerio Aimale <valerio@aimale.com> writes:
>>>
>>>> On 10/19/15 1:52 AM, Markus Armbruster wrote:
>>>>> Valerio Aimale <valerio@aimale.com> writes:
>>>>>
>>>>>> On 10/16/15 2:15 AM, Markus Armbruster wrote:
>>>>>>> valerio@aimale.com writes:
>>>>>>>
>>>>>>>> All-
>>>>>>>>
>>>>>>>> I've produced a patch for the current QEMU HEAD, for libvmi to
>>>>>>>> introspect QEMU/KVM VMs.
>>>>>>>>
>>>>>>>> Libvmi has patches for the old qemu-kvm fork, inside its source tree:
>>>>>>>> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>>>>>>>>
>>>>>>>> This patch adds an HMP and a QMP command, "pmemaccess". When the
>>>>>>>> command is invoked with a string argument (a filename), it will open
>>>>>>>> a UNIX socket and spawn a listening thread.
>>>>>>>>
>>>>>>>> The client writes binary commands to the socket, in the form of a C
>>>>>>>> structure:
>>>>>>>>
>>>>>>>> struct request {
>>>>>>>>          uint8_t type;   // 0 quit, 1 read, 2 write, ... rest reserved
>>>>>>>>          uint64_t address;   // address to read from OR write to
>>>>>>>>          uint64_t length;    // number of bytes to read OR write
>>>>>>>> };
>>>>>>>>
>>>>>>>> The client receives as a response either (length+1) bytes, if it is a
>>>>>>>> read operation, or 1 byte, if it is a write operation.
>>>>>>>>
>>>>>>>> The last byte of a read operation response indicates success (1
>>>>>>>> success, 0 failure). The single byte returned for a write operation
>>>>>>>> indicates the same (1 success, 0 failure).
>>>>>>> So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of
>>>>>>> garbage followed by the "it failed" byte?
>>>>>> Markus, that appears to be the case. However, I did not write the
>>>>>> communication protocol between libvmi and qemu. I'm assuming that the
>>>>>> person who wrote the protocol did not want to bother with
>>>>>> overcomplicating things.
>>>>>>
>>>>>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>>>>>
>>>>>> I'm thinking he assumed reads would be small in size and the price of
>>>>>> reading garbage was less than the price of writing a more complicated
>>>>>> protocol. I can see his point, confronted with the same problem, I
>>>>>> might have done the same.
>>>>> All right, the interface is designed for *small* memory blocks then.
>>>>>
>>>>> Makes me wonder why he needs a separate binary protocol on a separate
>>>>> socket.  Small blocks could be done just fine in QMP.
>>>> The problem is speed. If one is analyzing the memory space of a running
>>>> process (physical and paged), libvmi will make a large number of small
>>>> and mid-sized reads. If one uses xp, or pmemsave, the overhead is
>>>> quite significant. xp has overhead due to encoding, and pmemsave has
>>>> overhead due to file open/write (server), file open/read/close/unlink
>>>> (client).
>>>>
>>>> Others have gone through the problem before me. It appears that
>>>> pmemsave and xp are significantly slower than reading memory using a
>>>> socket via pmemaccess.
>>> That they're slower isn't surprising, but I'd expect the cost of
>>> encoding a small block to be insignificant compared to the cost of the
>>> network roundtrips.
>>>
>>> As block size increases, the space overhead of encoding will eventually
>>> bite.  But for that usage, the binary protocol appears ill-suited,
>>> unless the client can pretty reliably avoid read failure.  I haven't
>>> examined its failure modes, yet.
>>>
>>>> The following data is not mine, but it shows the time, in
>>>> milliseconds, required to resolve the content of a paged memory
>>>> address via socket (pmemaccess), pmemsave and xp:
>>>>
>>>> http://cl.ly/image/322a3s0h1V05
>>>>
>>>> Again, I did not produce those data points, they come from an old
>>>> libvmi thread.
>>> 90ms is a very long time.  What exactly was measured?
>> That is a fair question to ask. Unfortunately, I extracted that data
>> plot from an old thread in some libvmi mailing list. I do not have the
>> data and code that produced it. Sifting through the thread, I can see
>> the code was never published. I will take it upon myself to produce
>> code that compares timing - in a fair fashion - of libvmi doing an atomic
>> operation and a larger-scale operation (like listing running
>> processes)  via gdb, pmemaccess/socket, pmemsave, xp, and hopefully, a
>> version of xp that returns byte streams of memory regions base64 or
>> base85 encoded in json strings. I'll publish results and code.
>>
>> However, given workload and life happening, it will be some time
>> before I complete that task.
> No problem.  I'd like to have your use case addressed, but there's no
> need for haste.

Thanks, Markus. Appreciate your help.
>
> [...]
>>>>>>>> Also, the pmemsave command's QAPI definition should be changed to be
>>>>>>>> usable with 64-bit VMs
>>>>>>>>
>>>>>>>> in qapi-schema.json
>>>>>>>>
>>>>>>>> from
>>>>>>>>
>>>>>>>> ---
>>>>>>>> { 'command': 'pmemsave',
>>>>>>>>       'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>>>>>> ---
>>>>>>>>
>>>>>>>> to
>>>>>>>>
>>>>>>>> ---
>>>>>>>> { 'command': 'pmemsave',
>>>>>>>>       'data': {'val': 'int64', 'size': 'int64', 'filename': 'str'} }
>>>>>>>> ---
>>>>>>> In the QAPI schema, 'int' is actually an alias for 'int64'.  Yes, that's
>>>>>>> confusing.
>>>>>> I think it's confusing for the HMP parser too. If you have a VM with
>>>>>> 8 GB of RAM and want to snapshot the whole physical memory, via HMP
>>>>>> over telnet this is what happens:
>>>>>>
>>>>>> $ telnet localhost 1234
>>>>>> Trying 127.0.0.1...
>>>>>> Connected to localhost.
>>>>>> Escape character is '^]'.
>>>>>> QEMU 2.4.0.1 monitor - type 'help' for more information
>>>>>> (qemu) help pmemsave
>>>>>> pmemsave addr size file -- save to disk physical memory dump starting
>>>>>> at 'addr' of size 'size'
>>>>>> (qemu) pmemsave 0 8589934591 "/tmp/memorydump"
>>>>>> 'pmemsave' has failed: integer is for 32-bit values
>>>>>> Try "help pmemsave" for more information
>>>>>> (qemu) quit
>>>>> Your change to pmemsave's definition in qapi-schema.json is effectively a
>>>>> no-op.
>>>>>
>>>>> Your example shows *HMP* command pmemsave.  The definition of an HMP
>>>>> command is *independent* of the QMP command.  The implementation *uses*
>>>>> the QMP command.
>>>>>
>>>>> QMP pmemsave is defined in qapi-schema.json as
>>>>>
>>>>>        { 'command': 'pmemsave',
>>>>>          'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>>>
>>>>> Its implementation is in cpus.c:
>>>>>
>>>>>        void qmp_pmemsave(int64_t addr, int64_t size, const char *filename,
>>>>>                          Error **errp)
>>>>>
>>>>> Note the int64_t size.
>>>>>
>>>>> HMP pmemsave is defined in hmp-commands.hx as
>>>>>
>>>>>        {
>>>>>            .name       = "pmemsave",
>>>>>            .args_type  = "val:l,size:i,filename:s",
>>>>>            .params     = "addr size file",
>>>>>            .help       = "save to disk physical memory dump starting at 'addr' of size 'size'",
>>>>>            .mhandler.cmd = hmp_pmemsave,
>>>>>        },
>>>>>
>>>>> Its implementation is in hmp.c:
>>>>>
>>>>>        void hmp_pmemsave(Monitor *mon, const QDict *qdict)
>>>>>        {
>>>>>            uint32_t size = qdict_get_int(qdict, "size");
>>>>>            const char *filename = qdict_get_str(qdict, "filename");
>>>>>            uint64_t addr = qdict_get_int(qdict, "val");
>>>>>            Error *err = NULL;
>>>>>
>>>>>            qmp_pmemsave(addr, size, filename, &err);
>>>>>            hmp_handle_error(mon, &err);
>>>>>        }
>>>>>
>>>>> Note uint32_t size.
>>>>>
>>>>> Arguably, the QMP size argument should use 'size' (an alias for
>>>>> 'uint64'), and the HMP args_type should use 'size:o'.
>>>> Understand all that. Indeed, I've re-implemented 'pmemaccess' the same
>>>> way pmemsave is implemented. There is a single function, and two
>>>> points of entrance, one for HMP and one for QMP. I think pmemaccess
>>>> mimics pmemsave closely.
>>>>
>>>> However, if one wants to simply dump a memory region, via HMP for
>>>> human ease of use/debug/testing purposes, one cannot dump memory
>>>> regions that reside higher than 2^32-1
>>> Can you give an example?
>> Yes. I was trying to dump the full extent of physical memory of a VM
>> that has 8GB memory space (ballooned). I simply did this:
>>
>> $ telnet localhost 1234
>> Trying 127.0.0.1...
>> Connected to localhost.
>> Escape character is '^]'.
>> QEMU 2.4.0.1 monitor - type 'help' for more information
>> (qemu) pmemsave 0 8589934591 "/tmp/memsaved"
>> 'pmemsave' has failed: integer is for 32-bit values
>>
>> Maybe I misunderstood how pmemsave works. Maybe I should have used
>> dump-guest-memory
> This is an unnecessary limitation caused by 'size:i' instead of
> 'size:o'.  Fixable.
I think I tried changing size:i to size:l, but I was still receiving
the error.
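
For what it's worth, here is a minimal sketch of why changing only
args_type may not be enough on its own. It assumes the handler in hmp.c
still declares 'uint32_t size', as in the hmp_pmemsave() quoted above.
The function names below are hypothetical stand-ins, not QEMU functions,
and the sketch does not model the monitor's argument parser, so it does
not explain the error message itself. It only shows what happens to a
64-bit size once it is stored in a 32-bit variable.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the narrowing in hmp_pmemsave(): the parsed
 * 64-bit integer is stored in a uint32_t before it is passed on. */
uint32_t size_as_uint32(int64_t parsed)
{
    assert(parsed >= 0);
    return (uint32_t)parsed;    /* keeps only the low 32 bits */
}

/* The same value kept in a 64-bit variable, as a 64-bit-wide handler
 * would hold it. */
uint64_t size_as_uint64(int64_t parsed)
{
    assert(parsed >= 0);
    return (uint64_t)parsed;
}

/* Print both views of a requested size side by side. */
void show_sizes(int64_t requested)
{
    printf("requested=%" PRId64 " as uint32_t=%" PRIu32
           " as uint64_t=%" PRIu64 "\n",
           requested, size_as_uint32(requested), size_as_uint64(requested));
}
```

Calling show_sizes(8589934591) reports 4294967295 for the uint32_t view
and 8589934591 for the uint64_t view. So even if the parser accepted the
larger value, the handler would presumably also need a 64-bit 'size'
variable to dump the whole 8GB.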