From: Valerio Aimale
To: Markus Armbruster
Cc: lcapitulino@redhat.com, qemu-devel@nongnu.org, ehabkost@redhat.com
Subject: Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
Date: Thu, 22 Oct 2015 12:43:06 -0600
Message-ID: <56292E3A.2010603@aimale.com>
In-Reply-To: <87twpkqyow.fsf@blackfin.pond.sub.org>
References: <1444952643-5033-1-git-send-email-valerio@aimale.com>
 <87h9lrkz56.fsf@blackfin.pond.sub.org> <56210A17.6080401@aimale.com>
 <87io63xpke.fsf@blackfin.pond.sub.org> <56250035.40805@aimale.com>
 <87twpkqyow.fsf@blackfin.pond.sub.org>

On 10/21/15 4:54 AM, Markus Armbruster wrote:
> Valerio Aimale writes:
>
>> On 10/19/15 1:52 AM, Markus Armbruster wrote:
>>> Valerio Aimale writes:
>>>
>>>> On 10/16/15 2:15 AM, Markus Armbruster wrote:
>>>>> valerio@aimale.com writes:
>>>>>
>>>>>> All-
>>>>>>
>>>>>> I've produced a patch for the current QEMU HEAD, for libvmi to
>>>>>> introspect QEMU/KVM VMs.
>>>>>>
>>>>>> Libvmi has patches for the old qemu-kvm fork, inside its source tree:
>>>>>> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>>>>>>
>>>>>> This patch adds an hmp and a qmp command, "pmemaccess". When the
>>>>>> command is invoked with a string argument (a filename), it will open
>>>>>> a UNIX socket and spawn a listening thread.
>>>>>>
>>>>>> The client writes binary commands to the socket, in the form of a C
>>>>>> structure:
>>>>>>
>>>>>> struct request {
>>>>>>     uint8_t type;      // 0 quit, 1 read, 2 write, ... rest reserved
>>>>>>     uint64_t address;  // address to read from OR write to
>>>>>>     uint64_t length;   // number of bytes to read OR write
>>>>>> };
>>>>>>
>>>>>> The client receives as a response either (length+1) bytes, if it is
>>>>>> a read operation, or 1 byte if it is a write operation.
>>>>>>
>>>>>> The last byte of a read operation response indicates success (1
>>>>>> success, 0 failure). The single byte returned for a write operation
>>>>>> indicates the same (1 success, 0 failure).
>>>>> So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of
>>>>> garbage followed by the "it failed" byte?
>>>> Markus, that appears to be the case. However, I did not write the
>>>> communication protocol between libvmi and qemu. I'm assuming that the
>>>> person who wrote the protocol did not want to bother with
>>>> overcomplicating things.
>>>>
>>>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>>>
>>>> I'm thinking he assumed reads would be small in size and the price of
>>>> reading garbage was less than the price of writing a more complicated
>>>> protocol. I can see his point; confronted with the same problem, I
>>>> might have done the same.
>>> All right, the interface is designed for *small* memory blocks then.
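(Side note, to make the protocol concrete: the client side boils down to
roughly the sketch below. This is my own illustration, not the actual
libvmi code; it assumes the struct is sent verbatim over the socket, so
its layout/padding has to match what the QEMU-side thread expects, and a
real client would of course loop on short reads and check errors.)

#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

struct request {
    uint8_t  type;     /* 0 quit, 1 read, 2 write, ... rest reserved */
    uint64_t address;  /* address to read from OR write to */
    uint64_t length;   /* number of bytes to read OR write */
};

/* Read 'length' bytes of guest physical memory starting at 'address'
   through the pmemaccess socket. Returns 0 on success, -1 on failure. */
static int pmem_read(const char *sock_path, uint64_t address,
                     void *buf, uint64_t length)
{
    struct request req = { .type = 1, .address = address, .length = length };
    struct sockaddr_un saddr;
    uint8_t status;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    memset(&saddr, 0, sizeof(saddr));
    saddr.sun_family = AF_UNIX;
    strncpy(saddr.sun_path, sock_path, sizeof(saddr.sun_path) - 1);
    connect(fd, (struct sockaddr *) &saddr, sizeof(saddr));

    write(fd, &req, sizeof(req));  /* binary command, as described above */
    read(fd, buf, length);         /* data; garbage if the read failed */
    read(fd, &status, 1);          /* trailing status byte: 1 ok, 0 fail */

    close(fd);
    return status == 1 ? 0 : -1;
}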
>>>
>>> Makes me wonder why he needs a separate binary protocol on a separate
>>> socket. Small blocks could be done just fine in QMP.
>> The problem is speed. If one's analyzing the memory space of a running
>> process (physical and paged), libvmi will make a large number of small
>> and mid-sized reads. If one uses xp, or pmemsave, the overhead is
>> quite significant. xp has overhead due to encoding, and pmemsave has
>> overhead due to file open/write (server), file open/read/close/unlink
>> (client).
>>
>> Others have gone through the problem before me. It appears that
>> pmemsave and xp are significantly slower than reading memory using a
>> socket via pmemaccess.
> That they're slower isn't surprising, but I'd expect the cost of
> encoding a small block to be insignificant compared to the cost of the
> network roundtrips.
>
> As block size increases, the space overhead of encoding will eventually
> bite. But for that usage, the binary protocol appears ill-suited,
> unless the client can pretty reliably avoid read failure. I haven't
> examined its failure modes, yet.
>
>> The following data is not mine, but it shows the time, in
>> milliseconds, required to resolve the content of a paged memory
>> address via socket (pmemaccess), pmemsave and xp:
>>
>> http://cl.ly/image/322a3s0h1V05
>>
>> Again, I did not produce those data points; they come from an old
>> libvmi thread.
> 90ms is a very long time. What exactly was measured?
>
>> I think it might be conceivable that there could be a QMP command that
>> returns the content of an arbitrarily sized memory region as a base64
>> or a base85 JSON string. It would still have both time (due to
>> encoding/decoding) and space (base64 has 33% and base85 would be 7%)
>> overhead, plus JSON encoding/decoding overhead. It might still be the
>> case that a socket would outperform such a command as well, speed-wise.
>> I don't think it would be any faster than xp.
> A special-purpose binary protocol over a dedicated socket will always do
> less than a QMP solution (ignoring foolishness like transmitting crap on
> read error the client is then expected to throw away). The question is
> whether the difference in work translates to a worthwhile difference in
> performance.
>
> The larger question is actually whether we have an existing interface
> that can serve libvmi's needs. We've discussed monitor commands
> like xp, pmemsave, pmemread. There's another existing interface: the
> GDB stub. Have you considered it?
>
>> There's also a similar patch, floating around the internet, that uses
>> shared memory, instead of sockets, as inter-process communication
>> between libvmi and QEMU. I've never used that.
> By the time you've built a working IPC mechanism on top of shared memory,
> you're often no better off than with AF_LOCAL sockets.
>
> Crazy idea: can we allocate guest memory in a way that supports sharing
> it with another process? Eduardo, can -mem-path do such wild things?

Markus, your suggestion led to a lightbulb going off in my head.
What if there was a qmp command, say 'pmemmap', that when invoked performs
the following:

qmp_pmemmap( [...] )
{
    /* assuming a VM with 8GB of RAM */
    const uint64_t ram_size = 8589934592ULL;
    char template[] = "/tmp/QEMU_mmap_XXXXXX";
    uint8_t *local_memspace;
    uint64_t offset;
    int mmap_fd;

    /* create the backing file and grow it to the size of guest RAM */
    mmap_fd = mkstemp(template);
    ftruncate(mmap_fd, ram_size);

    /* map the file MAP_SHARED, so whatever we copy into the mapping
       becomes visible to any other process that maps the same file */
    local_memspace = mmap(NULL, ram_size, PROT_READ | PROT_WRITE,
                          MAP_SHARED, mmap_fd, 0);

    /* copy guest physical memory into the mapping; the 'len' argument
       of cpu_physical_memory_rw() is an int, so copy in chunks */
    for (offset = 0; offset < ram_size; offset += 1 << 30) {
        cpu_physical_memory_rw((hwaddr) offset, local_memspace + offset,
                               1 << 30,
                               0 /* no write for now, will discuss write later */);
    }

    /* etc. -- hand 'template' back to the caller as map_filename */
}

pmemmap would return the following JSON:

{
    'success' : 'true',
    'map_filename' : '/tmp/QEMU_mmap_123456'
}

The qmp client/caller would then mmap() the file '/tmp/QEMU_mmap_123456'
into its own address space (a rough sketch of that client side is at the
bottom of this message). It would then have (read, maybe write?) access to
the full extent of the guest memory without making any other qmp call.

I think it would be fast, and with low memory usage, as mmap() is pretty
efficient.

Of course, there would be a 'pmemunmap' qmp command that would perform the
cleanup:

    /* etc. */
    munmap(local_memspace, ram_size);
    close(mmap_fd);
    unlink(template);
    /* (or cpu_physical_memory_unmap(), if the guest RAM had been mapped
       via cpu_physical_memory_map() rather than copied) */
    /* etc. */

Would that work? Is mapping the full extent of the guest RAM too much to
ask of cpu_physical_memory_rw()?

>
>>>>>> The socket API was written by the libvmi author and it works with
>>>>>> the current libvmi version. The libvmi client-side implementation
>>>>>> is at:
>>>>>>
>>>>>> https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c
>>>>>>
>>>>>> As many use kvm VMs for introspection, malware and security
>>>>>> analysis, it might be worth thinking about making pmemaccess a
>>>>>> permanent hmp/qmp command, as opposed to having to produce a patch
>>>>>> at each QEMU point release.
>>>>> Related existing commands: memsave, pmemsave, dump-guest-memory.
>>>>>
>>>>> Can you explain why these won't do for your use case?
>>>> For people who do security analysis there are two use cases, static
>>>> and dynamic analysis. With memsave, pmemsave and dump-guest-memory
>>>> one can do static analysis, i.e. snapshotting a VM and seeing what
>>>> was happening at that point in time.
>>>> Dynamic analysis requires being able to 'introspect' a VM while it's
>>>> running.
>>>>
>>>> If you take a snapshot of two people exchanging a glass of water, and
>>>> you happen to take it at the very moment both persons have their hands
>>>> on the glass, it's hard to tell who passed the glass to whom. If you
>>>> have a movie of the same scene, it's obvious who's the giver and who's
>>>> the receiver. Same use case.
>>> I understand the need for introspecting a running guest. What exactly
>>> makes the existing commands unsuitable for that?
>> Speed. See discussion above.
>>>> More to the point, there's a host of C and python frameworks to
>>>> dynamically analyze VMs: volatility, rekall, "drakvuf", etc. They all
>>>> build on top of libvmi. I did not want to reinvent the wheel.
>>> Fair enough.
>>>
>>> Front page http://libvmi.com/ claims "Works with Xen, KVM, Qemu, and
>>> Raw memory files." What exactly is missing for KVM?
>> When they say they support kvm, what they really mean is that they
>> support the (retired, I understand) qemu-kvm fork via a patch that is
>> provided in the libvmi source tree. I think the most recent qemu-kvm
>> supported is 1.6.0:
>>
>> https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch
>>
>> I wanted to bring support to the head revision of QEMU, to bring
>> libvmi up to date with modern QEMU.
>>
>> Maybe the solution is simply to put this patch in the libvmi source
>> tree, which I've already asked to do via pull request, leaving QEMU
>> alone.
>> However, the patch has to be updated at every QEMU point release. I
>> wanted to avoid that, if at all possible.
>>
>>>> Mind you, 99.9% of people who do dynamic VM analysis use xen. They
>>>> contend that xen has better introspection support. In my case, I did
>>>> not want to bother with dedicating a full server to be a xen domain 0.
>>>> I just wanted to do a quick test by standing up a QEMU/kvm VM on an
>>>> otherwise-purposed server.
>>> I'm not at all against better introspection support in QEMU. I'm just
>>> trying to understand the problem you're trying to solve with your
>>> patches.
>> What all users of libvmi would love to have is super-high-speed access
>> to VM physical memory as part of the QEMU source tree, and not
>> supported via a patch. Implemented as the QEMU owners see fit, as long
>> as it is blazing fast and easily accessed via a client library or
>> inter-process communication.
> The use case makes sense to me; we just need to figure out how we want
> to serve it in QEMU.
>
>> My gut feeling is that it has to bypass QMP protocol/encoding/file
>> access/JSON to be fast, but it is just a gut feeling - worth nothing.
> My gut feeling is that QMP should do fine in overhead compared to other
> solutions involving socket I/O as long as the data sizes are *small*.
> Latency might be an issue, though: QMP commands are processed from the
> main loop. A dedicated server thread can be more responsive, but
> letting it write to shared resources could be "interesting".
>
>>>>>> Also, the pmemsave command's QAPI should be changed to be usable
>>>>>> with 64-bit VMs.
>>>>>>
>>>>>> In qapi-schema.json
>>>>>>
>>>>>> from
>>>>>>
>>>>>> ---
>>>>>> { 'command': 'pmemsave',
>>>>>>   'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>>>> ---
>>>>>>
>>>>>> to
>>>>>>
>>>>>> ---
>>>>>> { 'command': 'pmemsave',
>>>>>>   'data': {'val': 'int64', 'size': 'int64', 'filename': 'str'} }
>>>>>> ---
>>>>> In the QAPI schema, 'int' is actually an alias for 'int64'. Yes,
>>>>> that's confusing.
>>>> I think it's confusing for the HMP parser too. If you have a VM with
>>>> 8 GB of RAM and want to snapshot the whole physical memory, via HMP
>>>> over telnet this is what happens:
>>>>
>>>> $ telnet localhost 1234
>>>> Trying 127.0.0.1...
>>>> Connected to localhost.
>>>> Escape character is '^]'.
>>>> QEMU 2.4.0.1 monitor - type 'help' for more information
>>>> (qemu) help pmemsave
>>>> pmemsave addr size file -- save to disk physical memory dump starting
>>>> at 'addr' of size 'size'
>>>> (qemu) pmemsave 0 8589934591 "/tmp/memorydump"
>>>> 'pmemsave' has failed: integer is for 32-bit values
>>>> Try "help pmemsave" for more information
>>>> (qemu) quit
>>> Your change to pmemsave's definition in qapi-schema.json is effectively
>>> a no-op.
>>>
>>> Your example shows *HMP* command pmemsave. The definition of an HMP
>>> command is *independent* of the QMP command. The implementation *uses*
>>> the QMP command.
>>>
>>> QMP pmemsave is defined in qapi-schema.json as
>>>
>>>   { 'command': 'pmemsave',
>>>     'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>>
>>> Its implementation is in cpus.c:
>>>
>>>   void qmp_pmemsave(int64_t addr, int64_t size, const char *filename,
>>>                     Error **errp)
>>>
>>> Note the int64_t size.
>>>
>>> HMP pmemsave is defined in hmp-commands.hx as
>>>
>>>     {
>>>         .name       = "pmemsave",
>>>         .args_type  = "val:l,size:i,filename:s",
>>>         .params     = "addr size file",
>>>         .help       = "save to disk physical memory dump starting at 'addr' of size 'size'",
>>>         .mhandler.cmd = hmp_pmemsave,
>>>     },
>>>
>>> Its implementation is in hmp.c:
>>>
>>>   void hmp_pmemsave(Monitor *mon, const QDict *qdict)
>>>   {
>>>       uint32_t size = qdict_get_int(qdict, "size");
>>>       const char *filename = qdict_get_str(qdict, "filename");
>>>       uint64_t addr = qdict_get_int(qdict, "val");
>>>       Error *err = NULL;
>>>
>>>       qmp_pmemsave(addr, size, filename, &err);
>>>       hmp_handle_error(mon, &err);
>>>   }
>>>
>>> Note uint32_t size.
>>>
>>> Arguably, the QMP size argument should use 'size' (an alias for
>>> 'uint64'), and the HMP args_type should use 'size:o'.
>> I understand all that. Indeed, I've re-implemented 'pmemaccess' the
>> same way pmemsave is implemented. There is a single function, and two
>> points of entry, one for HMP and one for QMP. I think pmemaccess
>> mimics pmemsave closely.
>>
>> However, if one wants to simply dump a memory region via HMP, for
>> human ease of use/debug/testing purposes, one cannot dump memory
>> regions that reside higher than 2^32-1.
> Can you give an example?
>
> [...]
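By the way, if I read the 'size' / 'size:o' suggestion right, the change
would look roughly like this (my sketch, untested):

In qapi-schema.json:

  { 'command': 'pmemsave',
    'data': {'val': 'int', 'size': 'size', 'filename': 'str'} }

In hmp-commands.hx:

        .args_type  = "val:l,size:o,filename:s",

and hmp_pmemsave() in hmp.c would then read the size into a uint64_t
instead of a uint32_t:

      uint64_t size = qdict_get_int(qdict, "size");

Is that what you had in mind?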
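Also, going back to the pmemmap idea above, here is the rough client-side
sketch I mentioned (hypothetical names, untested, error handling elided):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    /* filename as returned by the (hypothetical) pmemmap command */
    const char *map_filename = "/tmp/QEMU_mmap_123456";
    struct stat st;
    int fd = open(map_filename, O_RDONLY);

    fstat(fd, &st);   /* st.st_size == size of guest RAM */

    uint8_t *guest_ram = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED,
                              fd, 0);

    /* guest physical address X is now simply guest_ram[X]; no further
       qmp round trips are needed */
    printf("first byte of guest physical memory: 0x%02x\n", guest_ram[0]);

    munmap(guest_ram, st.st_size);
    close(fd);
    return 0;
}

The only thing QEMU and the client would need to agree on is the returned
file name; the size comes from fstat().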