From: Valerio Aimale
Date: Thu, 22 Oct 2015 12:11:45 -0600
Subject: Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
To: Markus Armbruster
Cc: qemu-devel@nongnu.org, ehabkost@redhat.com, lcapitulino@redhat.com
Message-ID: <562926E1.1060201@aimale.com>
In-Reply-To: <87lhav6s28.fsf@blackfin.pond.sub.org>
References: <1444952643-5033-1-git-send-email-valerio@aimale.com>
 <87h9lrkz56.fsf@blackfin.pond.sub.org> <56210A17.6080401@aimale.com>
 <87io63xpke.fsf@blackfin.pond.sub.org> <56250035.40805@aimale.com>
 <87twpkqyow.fsf@blackfin.pond.sub.org> <5627B458.1@aimale.com>
 <87lhav6s28.fsf@blackfin.pond.sub.org>

On 10/22/15 5:50 AM, Markus Armbruster wrote:
> Valerio Aimale writes:
>
>> On 10/21/15 4:54 AM, Markus Armbruster wrote:
>>> Valerio Aimale writes:
>>>
>>>> On 10/19/15 1:52 AM, Markus Armbruster wrote:
>>>>> Valerio Aimale writes:
>>>>>
>>>>>> On 10/16/15 2:15 AM, Markus Armbruster wrote:
>>>>>>> valerio@aimale.com writes:
>>>>>>>
>>>>>>>> All-
>>>>>>>>
>>>>>>>> I've produced a patch for the current QEMU HEAD, for libvmi to
>>>>>>>> introspect QEMU/KVM VMs.
>>>>>>>>
>>>>>>>> Libvmi has patches for the old qemu-kvm fork, inside its source tree:
>>>>>>>> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>>>>>>>>
>>>>>>>> This patch adds a hmp and a qmp command, "pmemaccess". When the
>>>>>>>> command is invoked with a string argument (a filename), it will open
>>>>>>>> a UNIX socket and spawn a listening thread.
>>>>>>>>
>>>>>>>> The client writes binary commands to the socket, in the form of a C
>>>>>>>> structure:
>>>>>>>>
>>>>>>>> struct request {
>>>>>>>>     uint8_t type;     // 0 quit, 1 read, 2 write, ... rest reserved
>>>>>>>>     uint64_t address; // address to read from OR write to
>>>>>>>>     uint64_t length;  // number of bytes to read OR write
>>>>>>>> };
>>>>>>>>
>>>>>>>> The client receives as a response, either (length+1) bytes, if it is a
>>>>>>>> read operation, or 1 byte if it is a write operation.
>>>>>>>>
>>>>>>>> The last byte of a read operation response indicates success (1
>>>>>>>> success, 0 failure). The single byte returned for a write operation
>>>>>>>> indicates the same (1 success, 0 failure).
>>>>>>> So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of
>>>>>>> garbage followed by the "it failed" byte?
>>>>>> Markus, that appears to be the case. However, I did not write the
>>>>>> communication protocol between libvmi and qemu. I'm assuming that the
>>>>>> person that wrote the protocol did not want to bother with
>>>>>> overcomplicating things.
>>>>>>
>>>>>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>>>>>
>>>>>> I'm thinking he assumed reads would be small in size and the price of
>>>>>> reading garbage was less than the price of writing a more complicated
>>>>>> protocol. I can see his point, confronted with the same problem, I
>>>>>> might have done the same.
>>>>> All right, the interface is designed for *small* memory blocks then.
>>>>>
>>>>> Makes me wonder why he needs a separate binary protocol on a separate
>>>>> socket.
>>>>> Small blocks could be done just fine in QMP.
>>>> The problem is speed. If one's analyzing the memory space of a running
>>>> process (physical and paged), libvmi will make a large number of small
>>>> and mid-sized reads. If one uses xp, or pmemsave, the overhead is
>>>> quite significant. xp has overhead due to encoding, and pmemsave has
>>>> overhead due to file open/write (server), file open/read/close/unlink
>>>> (client).
>>>>
>>>> Others have gone through the problem before me. It appears that
>>>> pmemsave and xp are significantly slower than reading memory using a
>>>> socket via pmemaccess.
>>> That they're slower isn't surprising, but I'd expect the cost of
>>> encoding a small block to be insignificant compared to the cost of the
>>> network roundtrips.
>>>
>>> As block size increases, the space overhead of encoding will eventually
>>> bite. But for that usage, the binary protocol appears ill-suited,
>>> unless the client can pretty reliably avoid read failure. I haven't
>>> examined its failure modes, yet.
>>>
>>>> The following data is not mine, but it shows the time, in
>>>> milliseconds, required to resolve the content of a paged memory
>>>> address via socket (pmemaccess), pmemsave and xp
>>>>
>>>> http://cl.ly/image/322a3s0h1V05
>>>>
>>>> Again, I did not produce those data points, they come from an old
>>>> libvmi thread.
>>> 90ms is a very long time. What exactly was measured?
>> That is a fair question to ask. Unfortunately, I extracted that data
>> plot from an old thread in some libvmi mailing list. I do not have the
>> data and code that produced it. Sifting through the thread, I can see
>> the code was never published.
>> I will take it upon myself to produce code that
>> compares timing - in a fair fashion - of libvmi doing an atomic
>> operation and a larger-scale operation (like listing running
>> processes) via gdb, pmemaccess/socket, pmemsave, xp, and hopefully, a
>> version of xp that returns byte streams of memory regions base64 or
>> base85 encoded in json strings. I'll publish results and code.
>>
>> However, given workload and life happening, it will be some time
>> before I complete that task.
> No problem. I'd like to have your use case addressed, but there's no
> need for haste.

Thanks, Markus. Appreciate your help.

>
> [...]
>>>>>>>> Also, the pmemsave command's QAPI should be changed to be usable with
>>>>>>>> 64-bit VMs
>>>>>>>>
>>>>>>>> in qapi-schema.json
>>>>>>>>
>>>>>>>> from
>>>>>>>>
>>>>>>>> ---
>>>>>>>> { 'command': 'pmemsave',
>>>>>>>>   'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>>>>>> ---
>>>>>>>>
>>>>>>>> to
>>>>>>>>
>>>>>>>> ---
>>>>>>>> { 'command': 'pmemsave',
>>>>>>>>   'data': {'val': 'int64', 'size': 'int64', 'filename': 'str'} }
>>>>>>>> ---
>>>>>>> In the QAPI schema, 'int' is actually an alias for 'int64'. Yes, that's
>>>>>>> confusing.
>>>>>> I think it's confusing for the HMP parser too. If you have a VM with
>>>>>> 8 GB of RAM and want to snapshot the whole physical memory, via HMP
>>>>>> over telnet this is what happens:
>>>>>>
>>>>>> $ telnet localhost 1234
>>>>>> Trying 127.0.0.1...
>>>>>> Connected to localhost.
>>>>>> Escape character is '^]'.
>>>>>> QEMU 2.4.0.1 monitor - type 'help' for more information
>>>>>> (qemu) help pmemsave
>>>>>> pmemsave addr size file -- save to disk physical memory dump starting
>>>>>> at 'addr' of size 'size'
>>>>>> (qemu) pmemsave 0 8589934591 "/tmp/memorydump"
>>>>>> 'pmemsave' has failed: integer is for 32-bit values
>>>>>> Try "help pmemsave" for more information
>>>>>> (qemu) quit
>>>>> Your change to pmemsave's definition in qapi-schema.json is effectively a
>>>>> no-op.
>>>>>
>>>>> Your example shows *HMP* command pmemsave. The definition of an HMP
>>>>> command is *independent* of the QMP command. The implementation *uses*
>>>>> the QMP command.
>>>>>
>>>>> QMP pmemsave is defined in qapi-schema.json as
>>>>>
>>>>>     { 'command': 'pmemsave',
>>>>>       'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>>>
>>>>> Its implementation is in cpus.c:
>>>>>
>>>>>     void qmp_pmemsave(int64_t addr, int64_t size, const char *filename,
>>>>>                       Error **errp)
>>>>>
>>>>> Note the int64_t size.
>>>>>
>>>>> HMP pmemsave is defined in hmp-commands.hx as
>>>>>
>>>>>     {
>>>>>         .name       = "pmemsave",
>>>>>         .args_type  = "val:l,size:i,filename:s",
>>>>>         .params     = "addr size file",
>>>>>         .help       = "save to disk physical memory dump starting at 'addr' of size 'size'",
>>>>>         .mhandler.cmd = hmp_pmemsave,
>>>>>     },
>>>>>
>>>>> Its implementation is in hmp.c:
>>>>>
>>>>>     void hmp_pmemsave(Monitor *mon, const QDict *qdict)
>>>>>     {
>>>>>         uint32_t size = qdict_get_int(qdict, "size");
>>>>>         const char *filename = qdict_get_str(qdict, "filename");
>>>>>         uint64_t addr = qdict_get_int(qdict, "val");
>>>>>         Error *err = NULL;
>>>>>
>>>>>         qmp_pmemsave(addr, size, filename, &err);
>>>>>         hmp_handle_error(mon, &err);
>>>>>     }
>>>>>
>>>>> Note uint32_t size.
>>>>>
>>>>> Arguably, the QMP size argument should use 'size' (an alias for
>>>>> 'uint64'), and the HMP args_type should use 'size:o'.
>>>> I understand all that. Indeed, I've re-implemented 'pmemaccess' the same
>>>> way pmemsave is implemented. There is a single function, and two
>>>> points of entrance, one for HMP and one for QMP. I think pmemaccess
>>>> mimics pmemsave closely.
>>>>
>>>> However, if one wants to simply dump a memory region, via HMP for
>>>> human ease of use/debug/testing purposes, one cannot dump memory
>>>> regions that reside higher than 2^32-1
>>> Can you give an example?
>> Yes. I was trying to dump the full extent of physical memory of a VM
>> that has 8GB memory space (ballooned).
>> I simply did this:
>>
>> $ telnet localhost 1234
>> Trying 127.0.0.1...
>> Connected to localhost.
>> Escape character is '^]'.
>> QEMU 2.4.0.1 monitor - type 'help' for more information
>> (qemu) pmemsave 0 8589934591 "/tmp/memsaved"
>> 'pmemsave' has failed: integer is for 32-bit values
>>
>> Maybe I misunderstood how pmemsave works. Maybe I should have used
>> dump-guest-memory
> This is an unnecessary limitation caused by 'size:i' instead of
> 'size:o'. Fixable.

I think I tried changing size:i to size:l, but I was still receiving the error.