From: Valerio Aimale
To: Markus Armbruster
Cc: lcapitulino@redhat.com, qemu-devel@nongnu.org, ehabkost@redhat.com
Subject: Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
Date: Thu, 22 Oct 2015 12:43:06 -0600
Message-ID: <56292E3A.2010603@aimale.com>
In-Reply-To: <87twpkqyow.fsf@blackfin.pond.sub.org>
References: <1444952643-5033-1-git-send-email-valerio@aimale.com>
 <87h9lrkz56.fsf@blackfin.pond.sub.org> <56210A17.6080401@aimale.com>
 <87io63xpke.fsf@blackfin.pond.sub.org> <56250035.40805@aimale.com>
 <87twpkqyow.fsf@blackfin.pond.sub.org>

On 10/21/15 4:54 AM, Markus Armbruster wrote:
> Valerio Aimale writes:
>
>> On 10/19/15 1:52 AM, Markus Armbruster wrote:
>>> Valerio Aimale writes:
>>>
>>>> On 10/16/15 2:15 AM, Markus Armbruster wrote:
>>>>> valerio@aimale.com writes:
>>>>>
>>>>>> All-
>>>>>>
>>>>>> I've produced a patch for the current QEMU HEAD, for libvmi to
>>>>>> introspect QEMU/KVM VMs.
>>>>>>
>>>>>> Libvmi has patches for the old qemu-kvm fork, inside its source tree:
>>>>>> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>>>>>>
>>>>>> This patch adds an hmp and a qmp command, "pmemaccess". When the
>>>>>> command is invoked with a string argument (a filename), it will open
>>>>>> a UNIX socket and spawn a listening thread.
>>>>>>
>>>>>> The client writes binary commands to the socket, in the form of a C
>>>>>> structure:
>>>>>>
>>>>>> struct request {
>>>>>>     uint8_t type;      // 0 quit, 1 read, 2 write, ... rest reserved
>>>>>>     uint64_t address;  // address to read from OR write to
>>>>>>     uint64_t length;   // number of bytes to read OR write
>>>>>> };
>>>>>>
>>>>>> The client receives as a response either (length+1) bytes, if it is
>>>>>> a read operation, or 1 byte if it is a write operation.
>>>>>>
>>>>>> The last byte of a read operation response indicates success (1
>>>>>> success, 0 failure). The single byte returned for a write operation
>>>>>> indicates the same (1 success, 0 failure).
>>>>> So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of
>>>>> garbage followed by the "it failed" byte?
>>>> Markus, that appears to be the case. However, I did not write the
>>>> communication protocol between libvmi and qemu. I'm assuming that the
>>>> person who wrote the protocol did not want to bother with
>>>> overcomplicating things.
>>>>
>>>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>>>
>>>> I'm thinking he assumed reads would be small in size and the price of
>>>> reading garbage was less than the price of writing a more complicated
>>>> protocol. I can see his point; confronted with the same problem, I
>>>> might have done the same.
>>> All right, the interface is designed for *small* memory blocks then.
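(Side note, to make the protocol concrete: the client side boils down to
roughly the sketch below. This is my own illustration, not the actual
libvmi code; it assumes the struct is sent verbatim over the socket, so
its layout/padding has to match what the QEMU-side thread expects, and a
real client would of course loop on short reads and check errors.)

#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

struct request {
    uint8_t  type;     /* 0 quit, 1 read, 2 write, ... rest reserved */
    uint64_t address;  /* address to read from OR write to */
    uint64_t length;   /* number of bytes to read OR write */
};

/* Read 'length' bytes of guest physical memory starting at 'address'
   through the pmemaccess socket. Returns 0 on success, -1 on failure. */
static int pmem_read(const char *sock_path, uint64_t address,
                     void *buf, uint64_t length)
{
    struct request req = { .type = 1, .address = address, .length = length };
    struct sockaddr_un saddr;
    uint8_t status;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    memset(&saddr, 0, sizeof(saddr));
    saddr.sun_family = AF_UNIX;
    strncpy(saddr.sun_path, sock_path, sizeof(saddr.sun_path) - 1);
    connect(fd, (struct sockaddr *) &saddr, sizeof(saddr));

    write(fd, &req, sizeof(req));  /* binary command, as described above */
    read(fd, buf, length);         /* data; garbage if the read failed */
    read(fd, &status, 1);          /* trailing status byte: 1 ok, 0 fail */

    close(fd);
    return status == 1 ? 0 : -1;
}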
>>>
>>> Makes me wonder why he needs a separate binary protocol on a separate
>>> socket. Small blocks could be done just fine in QMP.
>> The problem is speed. If one's analyzing the memory space of a running
>> process (physical and paged), libvmi will make a large number of small
>> and mid-sized reads. If one uses xp, or pmemsave, the overhead is
>> quite significant. xp has overhead due to encoding, and pmemsave has
>> overhead due to file open/write (server), file open/read/close/unlink
>> (client).
>>
>> Others have gone through the problem before me. It appears that
>> pmemsave and xp are significantly slower than reading memory using a
>> socket via pmemaccess.
> That they're slower isn't surprising, but I'd expect the cost of
> encoding a small block to be insignificant compared to the cost of the
> network roundtrips.
>
> As block size increases, the space overhead of encoding will eventually
> bite. But for that usage, the binary protocol appears ill-suited,
> unless the client can pretty reliably avoid read failure. I haven't
> examined its failure modes, yet.
>
>> The following data is not mine, but it shows the time, in
>> milliseconds, required to resolve the content of a paged memory
>> address via socket (pmemaccess), pmemsave and xp:
>>
>> http://cl.ly/image/322a3s0h1V05
>>
>> Again, I did not produce those data points; they come from an old
>> libvmi thread.
> 90ms is a very long time. What exactly was measured?
>
>> I think it might be conceivable that there could be a QMP command that
>> returns the content of an arbitrarily sized memory region as a base64
>> or a base85 JSON string. It would still have both time (due to
>> encoding/decoding) and space (base64 has 33% and base85 would be 7%)
>> overhead, plus JSON encoding/decoding overhead. It might still be the
>> case that a socket would outperform such a command as well, speed-wise.
>> I don't think it would be any faster than xp.
> A special-purpose binary protocol over a dedicated socket will always do
> less than a QMP solution (ignoring foolishness like transmitting crap on
> read error the client is then expected to throw away). The question is
> whether the difference in work translates to a worthwhile difference in
> performance.
>
> The larger question is actually whether we have an existing interface
> that can serve libvmi's needs. We've discussed monitor commands
> like xp, pmemsave, pmemread. There's another existing interface: the
> GDB stub. Have you considered it?
>
>> There's also a similar patch, floating around the internet, that uses
>> shared memory, instead of sockets, as inter-process communication
>> between libvmi and QEMU. I've never used that.
> By the time you've built a working IPC mechanism on top of shared memory,
> you're often no better off than with AF_LOCAL sockets.
>
> Crazy idea: can we allocate guest memory in a way that supports sharing
> it with another process? Eduardo, can -mem-path do such wild things?

Markus, your suggestion led to a lightbulb going off in my head.
What if there was a qmp command, say 'pmemmap', that when invoked performs
the following:

qmp_pmemmap( [...] )
{
    /* assuming a VM with 8GB of RAM */
    const uint64_t ram_size = 8589934592ULL;
    char template[] = "/tmp/QEMU_mmap_XXXXXX";
    uint8_t *local_memspace;
    uint64_t offset;
    int mmap_fd;

    /* create the backing file and grow it to the size of guest RAM */
    mmap_fd = mkstemp(template);
    ftruncate(mmap_fd, ram_size);

    /* map the file MAP_SHARED, so whatever we copy into the mapping
       becomes visible to any other process that maps the same file */
    local_memspace = mmap(NULL, ram_size, PROT_READ | PROT_WRITE,
                          MAP_SHARED, mmap_fd, 0);

    /* copy guest physical memory into the mapping; the 'len' argument
       of cpu_physical_memory_rw() is an int, so copy in chunks */
    for (offset = 0; offset < ram_size; offset += 1 << 30) {
        cpu_physical_memory_rw((hwaddr) offset, local_memspace + offset,
                               1 << 30,
                               0 /* no write for now, will discuss write later */);
    }

    /* etc. -- hand 'template' back to the caller as map_filename */
}

pmemmap would return the following JSON:

{
    'success' : 'true',
    'map_filename' : '/tmp/QEMU_mmap_123456'
}

The qmp client/caller would then mmap() the file '/tmp/QEMU_mmap_123456'
into its own address space (a rough sketch of that client side is at the
bottom of this message). It would then have (read, maybe write?) access to
the full extent of the guest memory without making any other qmp call.

I think it would be fast, and with low memory usage, as mmap() is pretty
efficient.

Of course, there would be a 'pmemunmap' qmp command that would perform the
cleanup:

    /* etc. */
    munmap(local_memspace, ram_size);
    close(mmap_fd);
    unlink(template);
    /* (or cpu_physical_memory_unmap(), if the guest RAM had been mapped
       via cpu_physical_memory_map() rather than copied) */
    /* etc. */

Would that work? Is mapping the full extent of the guest RAM too much to
ask of cpu_physical_memory_rw()?

>
>>>>>> The socket API was written by the libvmi author and it works with
>>>>>> the current libvmi version. The libvmi client-side implementation
>>>>>> is at:
>>>>>>
>>>>>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>>>>>
>>>>>> As many use kvm VMs for introspection, malware and security
>>>>>> analysis, it might be worth thinking about making pmemaccess a
>>>>>> permanent hmp/qmp command, as opposed to having to produce a patch
>>>>>> at each QEMU point release.
>>>>> Related existing commands: memsave, pmemsave, dump-guest-memory.
>>>>>
>>>>> Can you explain why these won't do for your use case?
>>>> For people who do security analysis there are two use cases, static
>>>> and dynamic analysis. With memsave, pmemsave and dump-guest-memory
>>>> one can do static analysis, i.e. snapshotting a VM and seeing what
>>>> was happening at that point in time.
>>>> Dynamic analysis requires being able to 'introspect' a VM while it's
>>>> running.
>>>>
>>>> If you take a snapshot of two people exchanging a glass of water, and
>>>> you happen to take it at the very moment both persons have their hands
>>>> on the glass, it's hard to tell who passed the glass to whom. If you
>>>> have a movie of the same scene, it's obvious who's the giver and who's
>>>> the receiver. Same use case.
>>> I understand the need for introspecting a running guest. What exactly
>>> makes the existing commands unsuitable for that?
>> Speed. See discussion above.
>>>> More to the point, there's a host of C and python frameworks to
>>>> dynamically analyze VMs: volatility, rekall, "drakvuf", etc. They all
>>>> build on top of libvmi. I did not want to reinvent the wheel.
>>> Fair enough.
>>>
>>> Front page http://libvmi.com/ claims "Works with Xen, KVM, Qemu, and
>>> Raw memory files." What exactly is missing for KVM?
>> When they say they support kvm, what they really mean is that they
>> support the (retired, I understand) qemu-kvm fork via a patch that is
>> provided in the libvmi source tree. I think the most recent qemu-kvm
>> supported is 1.6.0:
>>
>> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>>
>> I wanted to bring support to the head revision of QEMU, to bring
>> libvmi up to date with modern QEMU.
>>
>> Maybe the solution is simply to put this patch in the libvmi source
>> tree, which I've already asked to do via pull request, leaving QEMU
>> alone.
>> However, the patch has to be updated at every QEMU point release. I
>> wanted to avoid that, if at all possible.
>>
>>>> Mind you, 99.9% of people who do dynamic VM analysis use xen. They
>>>> contend that xen has better introspection support. In my case, I did
>>>> not want to bother with dedicating a full server to be a xen domain 0.
>>>> I just wanted to do a quick test by standing up a QEMU/kvm VM on an
>>>> otherwise-purposed server.
>>> I'm not at all against better introspection support in QEMU. I'm just
>>> trying to understand the problem you're trying to solve with your
>>> patches.
>> What all users of libvmi would love to have is super-high-speed access
>> to VM physical memory as part of the QEMU source tree, and not
>> supported via a patch. Implemented as the QEMU owners see fit, as long
>> as it is blazing fast and easily accessed via a client library or
>> inter-process communication.
> The use case makes sense to me; we just need to figure out how we want
> to serve it in QEMU.
>
>> My gut feeling is that it has to bypass QMP protocol/encoding/file
>> access/JSON to be fast, but it is just a gut feeling - worth nothing.
> My gut feeling is that QMP should do fine in overhead compared to other
> solutions involving socket I/O as long as the data sizes are *small*.
> Latency might be an issue, though: QMP commands are processed from the
> main loop. A dedicated server thread can be more responsive, but
> letting it write to shared resources could be "interesting".
>
>>>>>> Also, the pmemsave command's QAPI should be changed to be usable
>>>>>> with 64-bit VMs.
>>>>>>
>>>>>> In qapi-schema.json
>>>>>>
>>>>>> from
>>>>>>
>>>>>> ---
>>>>>> { 'command': 'pmemsave',
>>>>>>   'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>>>> ---
>>>>>>
>>>>>> to
>>>>>>
>>>>>> ---
>>>>>> { 'command': 'pmemsave',
>>>>>>   'data': {'val': 'int64', 'size': 'int64', 'filename': 'str'} }
>>>>>> ---
>>>>> In the QAPI schema, 'int' is actually an alias for 'int64'. Yes,
>>>>> that's confusing.
>>>> I think it's confusing for the HMP parser too. If you have a VM with
>>>> 8 GB of RAM and want to snapshot the whole physical memory, via HMP
>>>> over telnet this is what happens:
>>>>
>>>> $ telnet localhost 1234
>>>> Trying 127.0.0.1...
>>>> Connected to localhost.
>>>> Escape character is '^]'.
>>>> QEMU 2.4.0.1 monitor - type 'help' for more information
>>>> (qemu) help pmemsave
>>>> pmemsave addr size file -- save to disk physical memory dump starting
>>>> at 'addr' of size 'size'
>>>> (qemu) pmemsave 0 8589934591 "/tmp/memorydump"
>>>> 'pmemsave' has failed: integer is for 32-bit values
>>>> Try "help pmemsave" for more information
>>>> (qemu) quit
>>> Your change to pmemsave's definition in qapi-schema.json is effectively
>>> a no-op.
>>>
>>> Your example shows *HMP* command pmemsave. The definition of an HMP
>>> command is *independent* of the QMP command. The implementation *uses*
>>> the QMP command.
>>>
>>> QMP pmemsave is defined in qapi-schema.json as
>>>
>>>   { 'command': 'pmemsave',
>>>     'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>
>>> Its implementation is in cpus.c:
>>>
>>>   void qmp_pmemsave(int64_t addr, int64_t size, const char *filename,
>>>                     Error **errp)
>>>
>>> Note the int64_t size.
>>>
>>> HMP pmemsave is defined in hmp-commands.hx as
>>>
>>>     {
>>>         .name       = "pmemsave",
>>>         .args_type  = "val:l,size:i,filename:s",
>>>         .params     = "addr size file",
>>>         .help       = "save to disk physical memory dump starting at 'addr' of size 'size'",
>>>         .mhandler.cmd = hmp_pmemsave,
>>>     },
>>>
>>> Its implementation is in hmp.c:
>>>
>>>   void hmp_pmemsave(Monitor *mon, const QDict *qdict)
>>>   {
>>>       uint32_t size = qdict_get_int(qdict, "size");
>>>       const char *filename = qdict_get_str(qdict, "filename");
>>>       uint64_t addr = qdict_get_int(qdict, "val");
>>>       Error *err = NULL;
>>>
>>>       qmp_pmemsave(addr, size, filename, &err);
>>>       hmp_handle_error(mon, &err);
>>>   }
>>>
>>> Note uint32_t size.
>>>
>>> Arguably, the QMP size argument should use 'size' (an alias for
>>> 'uint64'), and the HMP args_type should use 'size:o'.
>> I understand all that. Indeed, I've re-implemented 'pmemaccess' the
>> same way pmemsave is implemented. There is a single function, and two
>> points of entry, one for HMP and one for QMP. I think pmemaccess
>> mimics pmemsave closely.
>>
>> However, if one wants to simply dump a memory region via HMP, for
>> human ease of use/debug/testing purposes, one cannot dump memory
>> regions that reside higher than 2^32-1.
> Can you give an example?
>
> [...]
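By the way, if I read the 'size' / 'size:o' suggestion right, the change
would look roughly like this (my sketch, untested):

In qapi-schema.json:

  { 'command': 'pmemsave',
    'data': {'val': 'int', 'size': 'size', 'filename': 'str'} }

In hmp-commands.hx:

        .args_type  = "val:l,size:o,filename:s",

and hmp_pmemsave() in hmp.c would then read the size into a uint64_t
instead of a uint32_t:

      uint64_t size = qdict_get_int(qdict, "size");

Is that what you had in mind?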
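Also, going back to the pmemmap idea above, here is the rough client-side
sketch I mentioned (hypothetical names, untested, error handling elided):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    /* filename as returned by the (hypothetical) pmemmap command */
    const char *map_filename = "/tmp/QEMU_mmap_123456";
    struct stat st;
    int fd = open(map_filename, O_RDONLY);

    fstat(fd, &st);   /* st.st_size == size of guest RAM */

    uint8_t *guest_ram = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED,
                              fd, 0);

    /* guest physical address X is now simply guest_ram[X]; no further
       qmp round trips are needed */
    printf("first byte of guest physical memory: 0x%02x\n", guest_ram[0]);

    munmap(guest_ram, st.st_size);
    close(fd);
    return 0;
}

The only thing QEMU and the client would need to agree on is the returned
file name; the size comes from fstat().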