From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Sun, 27 Nov 2016 20:32:44 +0100
From: Alessandro Di Federico <ale+qemu@clearmind.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Message-Id: <E1cB5av-0004MY-3h@clearmind.me>
Sender: ale@clearmind.me
Subject: [Qemu-devel] Support for using TCG frontend as a library
To: qemu-devel@nongnu.org
Cc: Yan <zardus@gmail.com>, "Jonas Zaddach (jzaddach)" <jzaddach@cisco.com>

Hi all,
  QEMU is a great emulator, but in recent years it has also been used
for instrumentation purposes [QIRA,AFL] or as a lifter for static
analysis purposes [rev.ng,angr,libqemu,S²E]. I'd like to hear your
take on the second use case, and the possibility of offering upstream
support for it.

The general idea is to introduce a new build configuration which
produces a library for each supported input ISA exposing the TCG
frontend in a unified way. We could call it libtcg-$ARCH.so. In
practice, given a buffer containing code for a certain architecture,
the user program loads the appropriate version of this library and asks
it to produce the corresponding TCG instructions.

I've been investigating the needs of the various projects that might be
interested in using it, and they boil down to the following:

* Load multiple libtcg-$ARCH.so libraries for different architectures
  in the same process.
* Obtain the TCG instructions from code in a memory buffer.
* Dump the assembly code of the code in a memory buffer.
* Dump the TCG instructions in textual form.
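
To make the first two requirements concrete, here is a minimal sketch of
how a client might load one library per architecture. It is not the
rev.ng interface: the type LibTcg, the functions libtcg_soname,
libtcg_open and libtcg_close, and the exported symbol
"libtcg_translate" are all hypothetical names chosen for illustration.
Only dlopen, dlsym and dlclose are real (POSIX) calls.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical entry point: translate `size` bytes of guest code located
 * at `code` (guest address `pc`) and return an opaque handle to the
 * resulting TCG instructions. */
typedef void *(*libtcg_translate_fn)(const uint8_t *code, size_t size,
                                     uint64_t pc);

/* One loaded libtcg-$ARCH.so (hypothetical wrapper type). */
typedef struct {
    void *handle;                  /* dlopen() handle, NULL if not loaded */
    libtcg_translate_fn translate; /* resolved entry point, NULL if absent */
} LibTcg;

/* Write "libtcg-<arch>.so" into `out`.
 * Returns 0 on success, -1 if the name does not fit. */
int libtcg_soname(const char *arch, char *out, size_t out_size)
{
    assert(arch != NULL && out != NULL);
    int n = snprintf(out, out_size, "libtcg-%s.so", arch);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/* Load the library for `arch`. Returns 0 on success, -1 on failure;
 * on failure `lib` is left zeroed. */
int libtcg_open(const char *arch, LibTcg *lib)
{
    char name[64];

    memset(lib, 0, sizeof(*lib));
    if (libtcg_soname(arch, name, sizeof(name)) != 0)
        return -1;

    /* RTLD_LOCAL keeps each library's symbols out of the global
     * namespace, so builds for several architectures can coexist. */
    lib->handle = dlopen(name, RTLD_NOW | RTLD_LOCAL);
    if (lib->handle == NULL)
        return -1;

    /* "libtcg_translate" is an assumed symbol name, not an existing one. */
    lib->translate =
        (libtcg_translate_fn)dlsym(lib->handle, "libtcg_translate");
    if (lib->translate == NULL) {
        dlclose(lib->handle);
        memset(lib, 0, sizeof(*lib));
        return -1;
    }
    return 0;
}

/* Unload the library, if it was loaded. */
void libtcg_close(LibTcg *lib)
{
    if (lib->handle != NULL)
        dlclose(lib->handle);
    memset(lib, 0, sizeof(*lib));
}
```

A client would call libtcg_open("arm", &arm) and
libtcg_open("x86_64", &x86) and keep both handles alive. Opening with
RTLD_LOCAL is one way to keep the symbols of the two builds from
clashing; whether that is sufficient depends on how much global state
each library carries.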

As for helpers, it would be nice to have some metadata about
them, for instance the parts of the CPU state they can change. It would
also be nice to have a build configuration which produces a library
containing all the helpers ready to be used, or, even better, a library
as LLVM bitcode, which can then be further processed/analyzed.
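
As an illustration of what such helper metadata could look like, here
is a small sketch. Everything in it is an assumption made for the sake
of the example: the CPU_STATE_* classes, the HelperInfo layout, the
helper names in the table and the functions helper_lookup and
helper_may_write do not exist in QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical coarse-grained classes of CPU state. */
enum {
    CPU_STATE_GPR   = 1u << 0, /* general purpose registers */
    CPU_STATE_FLAGS = 1u << 1, /* condition flags */
    CPU_STATE_FPU   = 1u << 2, /* floating point state */
    CPU_STATE_PC    = 1u << 3  /* program counter */
};

/* Hypothetical per-helper metadata record. */
typedef struct {
    const char *name; /* helper symbol name */
    uint32_t reads;   /* CPU_STATE_* bits the helper may read */
    uint32_t writes;  /* CPU_STATE_* bits the helper may change */
} HelperInfo;

/* Made-up entries; a real table would be generated at build time. */
static const HelperInfo helper_table[] = {
    { "helper_example_divide",  CPU_STATE_GPR, CPU_STATE_GPR | CPU_STATE_FLAGS },
    { "helper_example_fpu_add", CPU_STATE_FPU, CPU_STATE_FPU },
    { "helper_example_syscall", CPU_STATE_GPR, CPU_STATE_GPR | CPU_STATE_PC },
};

/* Return the metadata for `name`, or NULL if the helper is unknown. */
const HelperInfo *helper_lookup(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(helper_table) / sizeof(helper_table[0]); i++) {
        if (strcmp(helper_table[i].name, name) == 0)
            return &helper_table[i];
    }
    return NULL;
}

/* True if the helper may change any state in `mask`. A helper without
 * metadata (NULL) is conservatively assumed to change everything. */
bool helper_may_write(const HelperInfo *info, uint32_t mask)
{
    if (info == NULL)
        return true;
    return (info->writes & mask) != 0;
}
```

A static analysis could then keep its knowledge of the program counter
across a call to helper_example_fpu_add, while discarding everything
after a call to a helper it has no metadata for.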

Here are some relevant parts of my draft implementation, which is part
of rev.ng:

* The interface exposed to users:
  https://polimicg.org/gitlab/revng/qemu/blob/develop/linux-user/ptc.h
* Implementation of the interface functions:
  https://polimicg.org/gitlab/revng/qemu/blob/develop/linux-user/ptc.c
* For the changes introduced elsewhere look for CONFIG_LIBTINYCODE:
  https://polimicg.org/gitlab/search?utf8=%E2%9C%93&search=CONFIG_LIBTINYCODE&group_id=&project_id=83&search_code=true&repository_ref=develop

It's rough but it works (see [rev.ng]). I'm interested to hear your
opinion and willingness to take patches. Unifying the various efforts
in this direction would be good; having upstream support would be
amazing.

--
Alessandro Di Federico
PhD student at Politecnico di Milano

[QIRA] http://qira.me/
[AFL] http://lcamtuf.coredump.cx/afl/ (for the black-box mode)
[rev.ng] https://rev.ng/
[angr] http://angr.io/ (currently using VEX IR, QEMU support planned)
[libqemu] https://github.com/zaddach/libqemu
[S²E] http://s2e.epfl.ch/