From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:55357)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <liorvern@gmail.com>) id 1UgTgv-0005gT-17
	for qemu-devel@nongnu.org; Sun, 26 May 2013 01:40:30 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <liorvern@gmail.com>) id 1UgTgs-00078u-7U
	for qemu-devel@nongnu.org; Sun, 26 May 2013 01:40:24 -0400
Received: from mail-wi0-x234.google.com ([2a00:1450:400c:c05::234]:42866)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <liorvern@gmail.com>) id 1UgTgs-00078j-1O
	for qemu-devel@nongnu.org; Sun, 26 May 2013 01:40:22 -0400
Received: by mail-wi0-f180.google.com with SMTP id hn14so708261wib.13
	for <qemu-devel@nongnu.org>; Sat, 25 May 2013 22:40:20 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <51A10BCA.6000800@suse.de>
References: <CALBwSP0u6MxDH6Adt3XV_iVf7hqvqYFw8nWAUGz12iXdGGu9Cw@mail.gmail.com>
	<51A10BCA.6000800@suse.de>
Date: Sun, 26 May 2013 08:40:20 +0300
Message-ID: <CALBwSP39qnyENkvj4PtySHYnfVc7tkc6wM0nAxDsw5AtiEG=FA@mail.gmail.com>
From: Lior Vernia <liorvern@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] Potential to accelerate QEMU for specific
	architectures
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: =?ISO-8859-1?Q?Andreas_F=E4rber?= <afaerber@suse.de>
Cc: qemu-devel@nongnu.org, =?UTF-8?B?6Zmz6Z+L5Lu7?= <chenwj@iis.sinica.edu.tw>, Richard Henderson <rth@twiddle.net>

Hello,

On Sat, May 25, 2013 at 10:06 PM, Andreas Färber <afaerber@suse.de> wrote:
> Hi,
>
> Am 24.05.2013 21:24, schrieb Lior Vernia:
>> I am running x86 applications on an ARM device using QEMU, and found
>> it too slow for my needs.
>
> Before we start going into technical details, what are you trying to
> achieve on a high level and how did you try to do it?
>
> Are you using qemu-system-x86_64 or qemu-x86_64? The latest v1.5.0?

Sorry, right after I wrote the message it occurred to me I should have
mentioned that I was talking about qemu-system, either x86 or i386. So
far I have only run the Limbo app on a Galaxy SIII with various
images, just to see the capabilities, and was disappointed. Limbo
seems to use QEMU v1.1.0.

If you suspect that it's the JNI wrapping that's causing a lot of the
damage, then we can talk about compiling QEMU for ARM and running it
natively; I just haven't been able to get that to work.

>> This is to be expected, of course, this is
>> not a complaint.
>
> Especially since most people still run on x86 ...
>
>> However, I was wondering whether this could be helped
>> by "overriding" the generic binary translation mechanism and focusing
>> on lower level binary translation just from x86 to ARM.
>>
>> It's clear to me that this isn't a small project, but it might be
>> important enough for me to invest myself in. However, before I jump
>> into it, I wanted to inquire whether this would be worthwhile at all.
>> Does anyone have any estimate as to how big of a gain that could
>> achieve? Or whether a more significant improvement could be achieved
>> by further tweaking that didn't occur to me?

I wanted to add that I've been reading about a Russian startup
that's looking to emulate x86 on ARM at 40% of native speed using
dynamic binary translation (as far as I gather):
http://www.bit-tech.net/news/hardware/2012/10/04/x86-on-arm/1
So this should be possible. And it can't be all that different from
QEMU, can it?

>
> ... the tcg/arm/ code does not get a lot of love, so you might be able
> to squeeze some more performance out of it by implementing optional TCG
> ops or optimizing existing implementations. In theory most TCG ops
> should correspond to a machine instruction (where available); there's a
> TCG-level optimizer to create more efficient code, but it's a tradeoff
> between time for code optimization and execution time.
>
> Needless to say that you should enable -O3 optimization (or something)
> for the core C code and not to enable debug features in configure for
> your performance measurements. :)
>
> Whatever implementation you experiment with, get familiar with our
> Git-based workflow and try to stay close to qemu.git code or otherwise
> you'll create a fork with little chance of getting integrated into the
> code base - meaning both we don't get your speedups and you don't get
> our latest features and bugfixes. One such example was the attempt to
> use LLVM instead of TCG.

Thanks, but we're getting slightly ahead of ourselves here :) I'd
still want to make sure that QEMU is at fault for the performance and,
if that's the case, that there's potential for real improvement before
I start getting my hands dirty.